Occasionally, I have come across sequencing read files (FASTQ) that are malformed near the end, with what appears to be truncated text. This causes problems for the programs used to analyze the data. I'm not sure whether this comes from a problem with the data transfer or something else, but the fix is simple: trim off the incomplete reads at the end and the file is usable again (one read is 4 lines in FASTQ, 2 lines in FASTA). Doing that isn't trivial with giant sequencing files, though, since they are far too large to just open in a text editor. Below is a simple Unix command that trims a set number of lines from the end of a large text file. This is mostly so I can reference it easily later, but maybe it will help others as well.
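To decide how many lines to trim, one option is to count how many lines sit beyond the last complete record. The helper below is a minimal sketch; the name `leftover_lines` is hypothetical, not a standard tool, and it assumes every FASTQ record is exactly 4 lines and every FASTA record exactly 2 lines (i.e. unwrapped sequences).

```shell
# Hypothetical helper: leftover_lines FILE LINES_PER_RECORD
# Prints how many trailing lines do not form a complete record
# (use 4 for FASTQ, 2 for FASTA with one sequence line per record).
# awk's NR also counts a final line that has no trailing newline.
leftover_lines() {
  awk -v per="$2" 'END { print NR % per }' "$1"
}
```

For example, if `leftover_lines reads.fastq 4` (with `reads.fastq` standing in for your file) prints `2`, trimming the last 2 lines removes the partial record. A result of `0` does not prove the file is intact: if the final quality line itself was cut short, the line count still comes out even, and you would trim the whole last record (4 lines) instead.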
Command to trim the last x lines from a file (requires GNU coreutils head, as found on most Linux systems):
$ head -n -<#lines> <inputfile> > <outputfile>
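As a quick sanity check of what the negative count does, here is a small sketch on a throwaway file (GNU head only); `example.txt` and `trimmed.txt` are just illustrative names.

```shell
# Throwaway input: the numbers 1 to 10, one per line
seq 1 10 > example.txt
# GNU head with a negative count prints all but the last 4 lines,
# i.e. drops one 4-line FASTQ record's worth of lines
head -n -4 example.txt > trimmed.txt
# trimmed.txt now contains the lines 1 through 6
```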
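On a related note, you can also trim the first x lines from a file with tail. Note that tail -n +K starts printing at line K rather than skipping K lines, so to drop the first x lines you pass x + 1 (e.g., tail -n +5 drops the first 4 lines):
$ tail -n +<#lines + 1> <inputfile> > <outputfile>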
Adapted from stackoverflow.com/questions/10460919/how-to-delete-first-two-lines-and-last-four-lines-from-a-test-file-with-bash
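Because of that off-by-one, it can be safer to compute the start line from the number of lines you want to drop. A minimal sketch, with `n`, `example.txt`, and `no_header.txt` as illustrative names:

```shell
# Throwaway input: the numbers 1 to 10, one per line
seq 1 10 > example.txt
# Number of lines to drop from the top
n=4
# tail -n +K prints from line K onward, so start at line n + 1
tail -n +"$((n + 1))" example.txt > no_header.txt
# no_header.txt now contains the lines 5 through 10
```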
Edit: I just discovered that the head command above does not work on Apple OS X, because its BSD head does not accept a negative line count. The workaround for Mac users is as follows:
$ tail -r <inputfile> | tail -n +<#lines + 1> | tail -r > <outputfile>
#This reverses the file (tail -r), drops the first x lines of the reversed output (again x + 1, since tail -n +K starts at line K), then reverses it back to the original order, which amounts to trimming the last x lines.
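If you want one command that behaves the same on Linux and macOS, a portable alternative (my own swap-in, not the approach from the linked answer) is to let awk read the file twice: the first pass counts the lines, the second prints all but the last n. The function name `trim_last` is hypothetical, and this assumes the input is a regular file rather than a pipe, since it has to be read twice.

```shell
# Hypothetical helper: trim_last FILE N
# Prints FILE without its last N lines using only POSIX awk,
# so it works with both GNU (Linux) and BSD (macOS) tools.
trim_last() {
  # Pass 1 (NR == FNR): remember the total line count.
  # Pass 2: print a line only if it is not among the last N.
  awk -v n="$2" 'NR == FNR { total = NR; next } FNR <= total - n' "$1" "$1"
}
```

Usage would look like `trim_last reads.fastq 4 > reads.trimmed.fastq` (illustrative file names). Reading the file twice doubles the I/O on a giant file, but awk only keeps a single counter in memory rather than the whole file.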