Sunday, February 5, 2012

I often find a neat or useful one- or two-line Unix command, only to end up looking it up again and again. In this post I'll jot down a few commands that I find helpful, and I'll keep adding more as I come across them.

Reverse complement a DNA sequence
Let's say you have a file named sequence.txt that looks like this:


TCTTTCTCTGT
TGTGTCTCCAtg
tgtctctgtgcatgtctgtg
....

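You can reverse complement it like this:

tr -d '\n' < sequence.txt | rev | tr 'ACGTacgt' 'TGCAtgca' | fold -w 80 > output.txt

This joins the lines into one sequence, reverses it, swaps each base for its complement (preserving case), and re-wraps the result at 80 columns.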
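
As a rough sketch, the same pipeline can be wrapped in a small shell function so it is easier to reuse. The name revcomp is made up for illustration, and the extra echo is my own addition: it ends the joined sequence with a newline so that rev sees a complete line. It assumes a plain sequence with no FASTA header line.

```shell
# revcomp: hypothetical helper name, not part of the original one-liner.
# Reads a plain sequence (no FASTA header) on stdin and prints its
# reverse complement wrapped at 80 columns.
revcomp() {
    # Join all lines into one and terminate it with a single newline,
    # then reverse, complement each base (keeping case), and re-wrap.
    { tr -d '\n'; echo; } | rev | tr 'ACGTacgt' 'TGCAtgca' | fold -w 80
}

# Small check with a two-line input.
printf 'AAC\nGg\n' | revcomp    # prints cCGTT
```

With the example file, this would be run as revcomp < sequence.txt > output.txt.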

Remove Windows carriage returns
tr -d '\r' < input.txt > output.txt
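
To confirm the carriage returns are really gone, something like the following compares byte counts before and after. This is only a sketch, and the file names crlf_demo.txt and lf_demo.txt are placeholders.

```shell
# Build a small sample file with Windows (CRLF) line endings.
printf 'one\r\ntwo\r\n' > crlf_demo.txt

# Strip every carriage return.
tr -d '\r' < crlf_demo.txt > lf_demo.txt

# Each of the two lines loses one byte.
wc -c < crlf_demo.txt   # 10
wc -c < lf_demo.txt     # 8
```
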
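
Search a folder and its subfolders for files that contain the keyword "whatever"
find . -type f -print0 | xargs -0 grep whatever
Restricting the search to regular files and passing null-delimited names keeps this working when file names contain spaces.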
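
Here is a small sketch, using a made-up directory tree, of a null-delimited find/xargs search next to grep -r, which covers the same ground on GNU and BSD grep. The directory and file names are hypothetical.

```shell
# Hypothetical demo tree; the directory and file names are made up.
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
printf 'nothing here\n' > "$demo/a.txt"
printf 'whatever you like\n' > "$demo/sub dir/b.txt"

# Null-delimited find/xargs: safe for names containing spaces.
# -l lists only the names of files that contain a match.
find "$demo" -type f -print0 | xargs -0 grep -l whatever

# Recursive grep does the same job in one command.
grep -rl whatever "$demo"
```

Both commands print the path of b.txt inside "sub dir".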

Convert a file to UTF-8 encoding

If you are piping a file through some Unix commands and you get the error "Illegal byte sequence", try running the file through the iconv command first. The -f option should name the file's actual source encoding; ISO-8859-1 is used here as an example.
iconv -f ISO-8859-1 -t UTF-8 input.txt
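
A minimal sketch of what the conversion does, assuming a sample file that really is ISO-8859-1 encoded. The file names latin1_demo.txt and utf8_demo.txt are placeholders.

```shell
# Hypothetical sample: "café" in ISO-8859-1, where é is the single byte 0xE9.
printf 'caf\351\n' > latin1_demo.txt

# iconv writes to stdout, so redirect to keep the converted text.
iconv -f ISO-8859-1 -t UTF-8 latin1_demo.txt > utf8_demo.txt

# In UTF-8 the é takes two bytes (0xC3 0xA9), so the file grows by one byte.
wc -c < latin1_demo.txt   # 5
wc -c < utf8_demo.txt     # 6
```
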

Sort lines by frequency

Say you have a list of terms in input.txt, and you want to see which are the most frequent:
sort input.txt | uniq -c | awk '{$1=$1};1' | sort -nrk1,1
This sorts the terms, counts each distinct one, strips the leading padding that uniq -c adds, and sorts the result by count from high to low.
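
For a concrete picture, here is the pipeline run on a tiny made-up list. The file name terms_demo.txt and its contents are just placeholders.

```shell
# Hypothetical input: six terms, three distinct.
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > terms_demo.txt

# Count each term and list the most frequent first.
sort terms_demo.txt | uniq -c | awk '{$1=$1};1' | sort -nrk1,1
# 3 apple
# 2 banana
# 1 cherry
```
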

2 comments:

  1. Justin, thanks for the shell tidbits, although I am both impressed and disturbed by the shell revcom :-)

    Does "grep -r ." do the same thing as your "find . | xargs grep" ?

  2. Yes, it does. Thanks for pointing that out. Not sure what I was going for with that other way. Maybe that form would be more useful if one was looking for a file name within a folder hierarchy, something like "find . | grep ". Yeah, the reverse complement is a bit of a stretch, but at least it reminds me of the rev and fold commands.
