Author/Editor     Kastrin, Andrej; Rindflesch, Thomas C.; Hristovski, Dimitar
Title     Link prediction on a network of co-occurring MeSH terms
Type     article
Vol. and No.     Vol. 55, No. 4
Publication year     2016
Pages     pp. 340-346
ISSN     0026-1270 - Methods of information in medicine
Language     eng
Abstract     Objectives: Literature-based discovery (LBD) is a text mining methodology for automatically generating research hypotheses from existing knowledge. We model the LBD process as a classification problem on a graph of MeSH terms. We employ unsupervised and supervised link prediction methods to predict previously unknown connections between biomedical concepts. Methods: We evaluated the effectiveness of link prediction through a series of experiments using a MeSH network that contains the history of link formation between biomedical concepts. We performed link prediction using proximity measures: common neighbors (CN), Jaccard coefficient (JC), Adamic/Adar index (AA), and preferential attachment (PA). Our approach relies on the assumption that similar nodes are more likely to establish a link in the future. Results: In the unsupervised approach, the AA measure achieved the best performance in terms of area under the ROC curve (AUC=0.76), followed by CN, JC, and PA. In the supervised approach, we evaluated whether the proximity measures can be combined into a model of link formation that uses all four as predictors. We applied various classifiers, including decision trees, k-nearest neighbors, logistic regression, multilayer perceptron, naïve Bayes, and random forests. The random forest classifier achieved the best performance (AUC=0.87). Conclusions: The link prediction approach proved to be effective for LBD. Supervised statistical learning approaches clearly outperformed the unsupervised approach to link prediction.
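The four proximity measures named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of CN, JC, AA, and PA on an undirected graph, which may differ in detail from the authors' implementation. The toy graph, the node labels `A` to `E`, and all function names are assumptions made for illustration; they are not data or code from the article. The `auc` helper uses the pairwise-ranking definition of the area under the ROC curve.

```python
import math

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbors(adj, u, v):
    """CN: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """JC: shared neighbors divided by the size of the neighborhood union."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """AA: shared neighbors weighted by 1 / log(degree), so hubs count less."""
    return sum(1.0 / math.log(len(adj[w]))
               for w in adj[u] & adj[v] if len(adj[w]) > 1)

def preferential_attachment(adj, u, v):
    """PA: product of the two node degrees."""
    return len(adj[u]) * len(adj[v])

def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical toy network; nodes stand in for MeSH terms.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"),
             ("B", "D"), ("C", "D"), ("D", "E")]
toy_adj = build_adjacency(toy_edges)

if __name__ == "__main__":
    # Score two currently unlinked pairs with each measure.
    for pair in [("A", "D"), ("A", "E")]:
        print(pair,
              common_neighbors(toy_adj, *pair),
              round(jaccard(toy_adj, *pair), 3),
              round(adamic_adar(toy_adj, *pair), 3),
              preferential_attachment(toy_adj, *pair))
```

In this toy graph the unlinked pair (`A`, `D`) shares two neighbors, so it scores higher than (`A`, `E`) on every measure (for example CN is 2 versus 0 and PA is 6 versus 2). In a supervised setting such as the one the abstract describes, the four scores for each candidate pair would serve as the feature vector passed to a classifier.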
Keywords     network analysis
prediction
literature