Now, I'm not the first to notice this, and there is actually a committee called the HUGO Gene Nomenclature Committee (HGNC) to "assign unique gene symbols and names to over 33,500 human loci". Too bad some of these symbols are utterly useless. The symbols may be unique with respect to other gene symbols, but they are far from unique or distinguishing once they sit among ordinary English words and abbreviations.
Here are a few of my (least) favorites; there are numerous other examples:
KIT
Edit (Nov 5, 2012): Here are a few more wonderful examples:
CAT
MAX
ACE
BAD
BID
LARGE
IMPACT
SET
REST
MET
PIGS
SHE
CAMP
PC
NODAL
COIL
CAST
COPE
POLE
CLOCK
ATM
RAN
CAPS
And the worst one ever, drumroll . . . .
T: Yes, there is a gene with the approved symbol of T (I pity the fool). Good luck finding any information about that gene.
Here is a breakdown of the lengths of the gene symbols:
# of Names    Length of Name
        1     1
       31     2
      615     3
     3560     4
     6296     5
     4699     6
     2468     7
     1143     8
      216     9
       18     10
        1     11
        0     12
        1     13
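For anyone who wants to build a breakdown like this themselves, here is a minimal sketch in Python. The function name and the sample list are my own for illustration; the sample is just a handful of symbols quoted in this post, not the full approved list, so its counts will not match the table above.

```python
from collections import Counter


def symbol_length_histogram(symbols):
    """Return {length: count} for an iterable of gene symbols, ordered by length."""
    counts = Counter(len(symbol) for symbol in symbols)
    return dict(sorted(counts.items()))


# Illustrative sample only: symbols quoted in this post, NOT the full HGNC list.
sample_symbols = [
    "T", "PC", "CAT", "MAX", "ACE", "PIGS", "CAMP",
    "NODAL", "CLOCK", "LARGE", "IMPACT",
]

histogram = symbol_length_histogram(sample_symbols)

# Print in the same "count, length" column order as the table.
for length, count in histogram.items():
    print(f"{count}\t{length}")
```

Feeding the function the complete list of approved symbols, however you obtain it, should give a table in the same shape as the one above.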
Why does this matter? There is too much information out there for a single person, or even an army of people, to sit down and wade through. We increasingly need automated methods to help cull and process that information. But when it is a challenging problem just to find the terms we are interested in, we are starting down a difficult road before we even get into the car.
Often an abstract or paper will use the gene symbol rather than the full, laborious gene name, and these "official gene symbols" are too nondescript to be useful in an automated search. As the rate of information about genes outpaces our ability to manually curate it, useful information might be lost or false conclusions may be drawn due to the ambiguity of our naming conventions.
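To make the search problem concrete, here is a rough sketch of the kind of naive whole-word matching an automated search might start with. The function name, the small symbol set, and both example sentences are assumptions of mine for illustration; neither sentence is about genes at all.

```python
import re

# Symbols quoted in this post; a real run would load the full approved list.
SYMBOLS = {"CAT", "MAX", "SET", "REST", "MET", "IMPACT", "LARGE", "T"}


def find_symbol_hits(text, symbols=SYMBOLS, case_sensitive=False):
    """Return the sorted symbols that appear as whole words in text."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    if not case_sensitive:
        # Fold everything to upper case, as a forgiving search would.
        tokens = [token.upper() for token in tokens]
    return sorted(set(tokens) & set(symbols))


# Hypothetical sentences that mention no genes.
abstract = "We met a large set of criteria, and the rest had little impact."
clinical = "The patient received a CAT scan."

hits_insensitive = find_symbol_hits(abstract)
hits_sensitive = find_symbol_hits(abstract, case_sensitive=True)
hits_clinical = find_symbol_hits(clinical, case_sensitive=True)

print(hits_insensitive)  # every one of these is a false positive
print(hits_sensitive)
print(hits_clinical)
```

Matching case-sensitively removes the false hits from the first sentence, but the second sentence shows that it is not a cure: an all-caps abbreviation that happens to share a gene's symbol still matches.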