Author/Editor     Kastrin, Andrej; Hristovski, Dimitar
Title     A fast document classification algorithm for gene symbol disambiguation in the BITOLA literature-based discovery support system
Type     article
Source     In: Suermondt J, Evans SR, Ohno-Machado L, editors. Biomedical and health informatics: from foundations to applications to policy. AMIA 2008 Annual Symposium Proceedings; 2008 Nov 8-12; Washington. Washington: American Medical Informatics Association; 2008.
Year of publication     2008
Pages     pp. 358-62
Language     eng
Abstract     Gene symbol disambiguation is an important problem for biomedical text mining systems. When detecting gene symbols in MEDLINE(R) citations, one of the biggest challenges is that many gene symbols also denote other, more general biomedical concepts (e.g., CT, MR). Our approach is first to classify the citations into genetic and non-genetic domains and then to detect gene symbols only in the genetic domain. We used ontological information provided by Medical Subject Headings (MeSH(R)) for this classification task. The proposed algorithm is fast and can process the full MEDLINE distribution in a few hours. It achieves a predictive accuracy of 0.91. The algorithm is currently implemented in the BITOLA literature-based discovery support system (http://www.mf.uni-lj.si/bitola/).
Descriptors     GENES
MEDLINE
SUBJECT HEADINGS
VOCABULARY, CONTROLLED
NOMENCLATURE
ALGORITHMS
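The two-step approach described in the abstract (classify each citation by its MeSH indexing, then look for gene symbols only in citations classified as genetic) can be illustrated with a minimal sketch. It is not the authors' algorithm. The prefix list GENETIC_TREE_PREFIXES, the min_hits threshold, the tokenizer, and the functions is_genetic_citation and find_gene_symbols are all hypothetical choices made for illustration.

```python
import re

# Hypothetical MeSH tree-number prefixes treated as "genetic" branches.
# These are illustrative assumptions, not the prefixes used in BITOLA.
GENETIC_TREE_PREFIXES = ("G05", "D13")


def is_genetic_citation(tree_numbers, min_hits=1):
    """Step 1 (sketch): label a citation as genetic if at least `min_hits`
    of the tree numbers of its MeSH headings fall under a genetic branch."""
    hits = sum(1 for tn in tree_numbers if tn.startswith(GENETIC_TREE_PREFIXES))
    return hits >= min_hits


def find_gene_symbols(text, gene_lexicon, tree_numbers):
    """Step 2 (sketch): match tokens against a gene-symbol lexicon, but only
    in citations that step 1 classified as genetic."""
    if not is_genetic_citation(tree_numbers):
        return []  # non-genetic domain: ambiguous symbols such as CT are skipped
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [tok for tok in tokens if tok in gene_lexicon]


if __name__ == "__main__":
    lexicon = {"BRCA1", "CT", "MR"}
    print(find_gene_symbols("CT and MR imaging of the brain", lexicon, ["E01.370"]))
    print(find_gene_symbols("BRCA1 mutation in tumours", lexicon, ["G05.365"]))
```

In this sketch the first call returns an empty list because the citation is not indexed under a (hypothetical) genetic branch, so "CT" and "MR" are never treated as gene symbols. The second call returns ["BRCA1"]. Gating on MeSH indexing rather than on the text itself is what keeps the per-citation cost low.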