Reverse complement a DNA sequence
Let's say you have a file named sequence.txt that looks like this:
TCTTTCTCTGT
TGTGTCTCCAtg
tgtctctgtgcatgtctgtg
....
You can reverse complement it like this:
tr -d '\n' < sequence.txt | rev | tr 'ACGTacgt' 'TGCAtgca' | fold -w 80 > output.txt
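To make the steps easier to reuse, here is a minimal sketch of the same pipeline wrapped in a function. The name revcomp is my own invention, and the extra echo is an assumption on my part: it puts back a single trailing newline so that rev always sees a complete line. Like the one-liner, it only handles plain A/C/G/T sequence (no FASTA header lines, no ambiguity codes).

```shell
# revcomp (hypothetical helper name): read a sequence on stdin, possibly
# wrapped over many lines, and write its reverse complement wrapped at 80 columns.
revcomp() {
  # Join all lines into one, then add a single newline back for rev.
  { tr -d '\n'; echo; } |
    rev |                          # reverse the sequence
    tr 'ACGTacgt' 'TGCAtgca' |     # complement each base, preserving case
    fold -w 80                     # re-wrap at 80 characters per line
}

printf 'AAC\nGt\n' | revcomp   # prints aCGTT
```

With the file from above, that would be: revcomp < sequence.txt > output.txt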
Remove Windows carriage return
tr -d '\r' < input.txt > output.txt
Search folder and subfolders for files that contain the keyword "whatever".
find . | xargs grep whatever
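One caveat: the plain find | xargs form splits on whitespace, so file names with spaces can trip it up. Here is a hedged variant that should cope with that. The function name find_keyword is made up for illustration, and -print0 and -0 are extensions found in GNU and BSD find/xargs, so this assumes one of those.

```shell
# find_keyword KEYWORD [DIR] (hypothetical helper): list the files under DIR
# (default: current directory) that contain KEYWORD.
find_keyword() {
  # -type f    : regular files only, so grep is not handed directories
  # -print0/-0 : NUL-separated names, so spaces in file names survive
  # grep -l    : print only the names of matching files
  # --         : treat the keyword as a pattern even if it starts with "-"
  find "${2:-.}" -type f -print0 | xargs -0 grep -l -- "$1"
}

# Small self-contained demo in a scratch directory.
demo=$(mktemp -d)
printf 'whatever\n' > "$demo/my notes.txt"
printf 'nothing here\n' > "$demo/other.txt"
find_keyword whatever "$demo"   # lists only the "my notes.txt" file
rm -r "$demo"
```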
Converting file to UTF-8 encoding
If you are piping a file through some Unix commands, and you get the error "Illegal byte sequence", you might try running your file through the iconv command.
iconv -f ISO-8859-1 -t UTF-8 input.txt
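Note that iconv writes the converted text to standard output, so you either pipe it onward or redirect it to a new file. A small sketch of that is below; the function name to_utf8 is hypothetical, and it assumes the input really is ISO-8859-1 (Latin-1). If your file is in some other encoding, the -f value has to change; iconv -l lists the names your system supports.

```shell
# to_utf8 (hypothetical helper): convert ISO-8859-1 text on stdin to UTF-8 on stdout.
# Assumes the input is Latin-1; adjust -f for other source encodings.
to_utf8() {
  iconv -f ISO-8859-1 -t UTF-8
}

# \351 is the Latin-1 byte for "é"; on its own it is not a valid UTF-8 sequence.
# After conversion it becomes the two bytes 0xC3 0xA9.
printf 'caf\351\n' | to_utf8
```

For a file, that would be: to_utf8 < input.txt > output.txt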
Sort lines by frequency
Say you have a list of terms in input.txt, and you want to see which are the most frequent:
sort input.txt | uniq -c | awk '{$1=$1};1' | sort -nrk1,1
This will sort and count the terms in the list, remove extra whitespace, and sort based on the count from high to low.
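Here is a tiny worked example of that pipeline on made-up input, wrapped in a function I am calling freq (the name and the sample terms are just for illustration).

```shell
# freq (hypothetical helper): count the lines on stdin and list them,
# most frequent first, as "COUNT TERM".
freq() {
  sort |                    # group identical terms together
    uniq -c |               # collapse each group to "  COUNT TERM"
    awk '{$1=$1};1' |       # rebuild each line, dropping the leading padding
    sort -nrk1,1            # numeric, descending, on the count column
}

printf 'b\na\nb\nc\nb\na\n' | freq
# prints:
# 3 b
# 2 a
# 1 c
```

For the file from above, that would be: freq < input.txt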
Justin, thanks for the shell tidbits, although I am both impressed and disturbed by the shell revcom :-)
Does "grep -r ." do the same thing as your "find . | xargs grep" ?
Yes, it does. Thanks for pointing that out. Not sure what I was going for with that other way. Maybe that form would be more useful if one were looking for a file name within a folder hierarchy, something like "find . | grep ". Yeah, the reverse complement is a bit of a stretch, but at least it reminds me of the rev and fold commands.