Hi. How long does it take to sort a 100 MB file?

2015-11-12T02:08:02.474-08:00

2015-02-10T15:23:54.014-08:00

In fact, I have a blog post about this here: http://nathanhaigh.github.io/linux/2014/11/14/Unix-paste/

2015-02-10T15:22:19.563-08:00

You can do away with the two Perl scripts and use the Unix commands "paste" and "tr" instead. It would work like this:

paste - - - - < my.fastq | sort --stable -t $'\t' -k2,2 | tr '\t' '\n'

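The "paste" command puts the 4 lines of each FASTQ record onto a single line as 4 tab-delimited columns. "sort" then orders those lines by the second column (the sequence), and --stable keeps records that share a sequence in their original order. Finally, "tr" converts the tabs back to newlines, restoring the standard FASTQ format. These tools should be much, much faster than your Perl scripts.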
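A tiny worked example of the same pipeline, as a sketch: the two records and the file name tiny.fastq are made up purely for illustration. ACGT sorts before TTGA, so read2 comes out first.

```shell
#!/usr/bin/env bash
# Hypothetical two-record FASTQ file, just for illustration.
printf '%s\n' '@read1' 'TTGA' '+' 'IIII' '@read2' 'ACGT' '+' 'HHHH' > tiny.fastq

# paste: join each group of 4 lines into one tab-separated line
# sort:  order those lines by field 2 (the sequence)
# tr:    turn the tabs back into newlines
paste - - - - < tiny.fastq | sort --stable -t $'\t' -k2,2 | tr '\t' '\n'

# Prints, one per line: @read2 ACGT + HHHH @read1 TTGA + IIII
```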

In addition, your version of "sort" may support parallelisation and a larger memory buffer per process (GNU sort's --parallel and --buffer-size options). For example, on my big-memory machine with 64 cores I could do this:

paste - - - - < my.fastq | sort --parallel 20 --buffer-size 5G --stable -t $'\t' -k2,2 | tr '\t' '\n'
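Since not every "sort" has these options, one possible approach is to check the help text first and only add them when they are advertised. This is a sketch, not a tested recipe: the function name sort_fastq_by_seq and its default values (4 threads, a 1G buffer) are hypothetical placeholders to tune for your machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort FASTQ records read from stdin by sequence.
# Adds --parallel/--buffer-size only if the installed sort mentions --parallel.
# Arguments: $1 = thread count (default 4), $2 = buffer size (default 1G);
# both defaults are placeholder assumptions.
sort_fastq_by_seq() {
  local threads="${1:-4}" buffer="${2:-1G}"
  local sort_opts=(--stable -t $'\t' -k2,2)
  local help_text
  # Capture the help text first, so no early-closing pipe can interfere.
  help_text="$(sort --help 2>&1)"
  if [[ "$help_text" == *--parallel* ]]; then
    sort_opts+=(--parallel "$threads" --buffer-size "$buffer")
  fi
  # Same paste | sort | tr technique as above, reading FASTQ from stdin.
  paste - - - - | sort "${sort_opts[@]}" | tr '\t' '\n'
}
```

Usage, with the values from the comment above: sort_fastq_by_seq 20 5G < my.fastq > sorted.fastq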

Comments on Bridgecrest Bioinformatics: Sort FASTQ file by sequence
