Bridgecrest Bioinformatics: February 2012

I often find a neat or useful one- or two-line Unix command, only to end up looking it up again and again. In this post I'm jotting down a few commands that I find helpful, and I'll keep adding more as I come across them.

Reverse Complement a DNA sequence.
Let's say you have a file named sequence.txt that looks like this:

TCTTTCTCTGT
TGTGTCTCCAtg
tgtctctgtgcatgtctgtg
....

You can reverse complement it like this (the result is wrapped at 80 characters per line and written to output.txt):

tr -d '\n' < sequence.txt | rev | tr 'ACGTacgt' 'TGCAtgca' | fold -w 80 > output.txt
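To make the pipeline easier to reuse, here is a minimal sketch that wraps it in a shell function reading from standard input. The function name revcomp is hypothetical, and the grep -v '^>' step is my own addition for files that carry FASTA-style header lines; drop it if your file holds only sequence. Note that all lines are joined into one sequence first, so a multi-record file would be treated as a single record.

```shell
# Hypothetical helper: reverse complement whatever arrives on stdin.
revcomp() {
  grep -v '^>' |                  # assumption: skip FASTA-style header lines
    tr -d '\n' |                  # join all lines into one sequence
    rev |                         # reverse it
    tr 'ACGTacgt' 'TGCAtgca' |    # complement each base, preserving case
    fold -w 80                    # wrap at 80 characters per line
  echo                            # the newlines were stripped above, so end the last line
}

# The first two lines of the example file:
printf 'TCTTTCTCTGT\nTGTGTCTCCAtg\n' | revcomp
# prints caTGGAGACACAACAGAGAAAGA
```

Bases other than A, C, G and T (N, for example) pass through unchanged, which is usually what you want.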

Remove Windows carriage returns

tr -d '\r' < input.txt > output.txt
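If you want to confirm what the command does before running it on a real file, a small sketch like this can help. The function name strip_cr is hypothetical; od -c is only there to make the otherwise invisible characters visible.

```shell
# Hypothetical helper: strip carriage returns from stdin.
strip_cr() {
  tr -d '\r'
}

# Two CRLF-terminated lines go in; od -c shows that only \n remains.
printf 'one\r\ntwo\r\n' | strip_cr | od -c
```

Keep in mind that this removes every carriage return in the file, not only the ones at line ends.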

Search folder and subfolders for files that contain the keyword "whatever".

find . -type f -print0 | xargs -0 grep whatever
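A plain find . | xargs grep pipeline splits file names on whitespace and also hands directories to grep, so it can misbehave on paths such as "my data/notes.txt". Here is a sketch of a safer variant, assuming your find and xargs support -print0 and -0 (the GNU and BSD versions do). The function name search_tree is hypothetical, and I use grep -l here so that it prints only the names of matching files.

```shell
# Hypothetical helper: list files under a directory that contain a keyword.
# $1 = keyword, $2 = directory to search (defaults to the current one)
search_tree() {
  # -type f   : regular files only, so grep never sees a directory
  # -print0/-0: NUL-separated names, safe for spaces in file names
  # -l        : print the name of each matching file instead of the lines
  find "${2:-.}" -type f -print0 | xargs -0 grep -l -- "$1"
}

# Demo in a scratch directory with a space in one file name.
demo_dir=$(mktemp -d)
printf 'whatever works\n' > "$demo_dir/my notes.txt"
printf 'nothing here\n' > "$demo_dir/other.txt"
search_tree whatever "$demo_dir"
rm -rf "$demo_dir"
```

If your grep supports recursive search, grep -r whatever . is a shorter way to get much the same result.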

Converting file to UTF-8 encoding

If you are piping a file through some Unix commands and get the error "Illegal byte sequence", try running the file through iconv first. The example below assumes the input is ISO-8859-1 (Latin-1); change the -f value if your file uses a different encoding.

iconv -f ISO-8859-1 -t UTF-8 input.txt
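Because iconv writes to standard output, it slots into the front of a pipeline. The sketch below shows that on a tiny input; the function name to_utf8 is hypothetical, and the Latin-1 source encoding is an assumption you would need to check for your own file.

```shell
# Hypothetical helper: convert stdin from Latin-1 to UTF-8.
to_utf8() {
  iconv -f ISO-8859-1 -t UTF-8
}

# 'caf\351' is "café" in Latin-1 (a single byte, 0xE9, for the é).
# After conversion the é takes two bytes, so wc -c counts 6 bytes, not 5.
printf 'caf\351\n' | to_utf8 | wc -c
```

To keep the converted file rather than pipe it onward, redirect the output: iconv -f ISO-8859-1 -t UTF-8 input.txt > output.txt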

Sort lines by frequency

Say you have a list of terms in input.txt, and you want to see which are the most frequent:

sort input.txt | uniq -c | awk '{$1=$1};1' | sort -nrk1,1

This sorts and counts the terms in the list, strips the leading whitespace that uniq -c adds, and then sorts by count from high to low.
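Here is a small worked example of the same pipeline, wrapped in a function that reads from standard input. The function name top_terms and the sample terms are made up for illustration.

```shell
# Hypothetical helper: count the lines on stdin, most frequent first.
top_terms() {
  sort |                 # group identical terms together
    uniq -c |            # collapse each group to "  count term"
    awk '{$1=$1};1' |    # rebuild each line, dropping the leading padding
    sort -nrk1,1         # numeric sort on the count, highest first
}

printf 'gene\nexon\ngene\nintron\ngene\nexon\n' | top_terms
# prints:
# 3 gene
# 2 exon
# 1 intron
```

One caveat: the awk step also collapses runs of spaces inside a term to a single space, so leave it out if the exact spacing of your terms matters.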

Sunday, February 5, 2012