Author/Editor     Kastrin, Andrej; Hristovski, Dimitar
Title     A fast document classification algorithm for gene symbol disambiguation in the BITOLA literature-based discovery support system
Type     article
Source     In: Supermondt J, Evans SR, Ohno-Machado L, editors. Biomedical and health informatics: from foundations to applications to policy. AMIA 2008 annual symposium proceedings; 2008 Nov 8-12; Washington. Washington: American medical informatics association,
Publication year     2008
Volume     p. 358-62
Language     eng
Abstract     Gene symbol disambiguation is an important problem for biomedical text mining systems. When detecting gene symbols in MEDLINE(R) citations, one of the biggest challenges is that many gene symbols also denote other, more general biomedical concepts (e.g., CT, MR). Our approach to this problem is first to classify the citations into genetic and non-genetic domains and then to detect gene symbols only in the genetic domain. We used ontological information provided by Medical Subject Headings (MeSH(R)) for this classification task. The proposed algorithm is fast and is able to process the full MEDLINE distribution in a few hours. It achieves a predictive accuracy of 0.91. The algorithm is currently implemented in the BITOLA literature-based discovery support system (http://www.mf.uni-lj.si/bitola/).
Descriptors     GENES
MEDLINE
SUBJECT HEADINGS
VOCABULARY, CONTROLLED
NOMENCLATURE
ALGORITHMS