Author/Editor | Kastrin, Andrej; Rindflesch, Thomas C.; Hristovski, Dimitar | |
Title | Link prediction on a network of co-occurring MeSH terms | |
Type | article | |
Vol. and No. | Vol. 55, no. 4 | |
Publication year | 2016 | |
Pages | pp. 340-346 | |
ISSN | 0026-1270 (Methods of Information in Medicine) | |
Language | eng | |
Abstract | Objectives: Literature-based discovery (LBD) is a text mining methodology for automatically generating research hypotheses from existing knowledge. We frame LBD as a classification problem on a graph of MeSH terms and apply unsupervised and supervised link prediction methods to predict previously unknown connections between biomedical concepts. Methods: We evaluate the effectiveness of link prediction in a series of experiments on a MeSH network that records the history of link formation between biomedical concepts. We performed link prediction using proximity measures: common neighbors (CN), the Jaccard coefficient (JC), the Adamic/Adar index (AA), and preferential attachment (PA). Our approach relies on the assumption that similar nodes are more likely to form a link in the future. Results: In the unsupervised approach, the AA measure achieved the best performance in terms of area under the ROC curve (AUC = 0.76), followed by CN, JC, and PA. In the supervised approach, we evaluated whether the four proximity measures can be combined into a single model of link formation. We applied various classifiers, including decision trees, k-nearest neighbors, logistic regression, a multilayer perceptron, naïve Bayes, and random forests. The random forest classifier achieved the best performance (AUC = 0.87). Conclusions: The link prediction approach proved effective for LBD. Supervised statistical learning approaches clearly outperform an unsupervised approach to link prediction. | |
Keywords | network analysis; prediction; literature | |
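The unsupervised evaluation described in the abstract can be illustrated with a minimal sketch. It assumes the usual textbook definitions of CN, JC, AA, and PA on an undirected graph, a time split in which pairs unlinked in an earlier snapshot are scored and those that link later count as positives, and AUC computed as the probability that a random positive pair outscores a random negative pair. The toy graph (`PAST_EDGES`, `FUTURE_EDGES`) and all function names are hypothetical placeholders, not the authors' data or code.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: labels stand in for MeSH terms and are NOT from the study.
PAST_EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
              ("C", "D"), ("D", "E"), ("E", "F")]
# Pairs assumed to become linked in a later snapshot (positives).
FUTURE_EDGES = {frozenset(("A", "D")), frozenset(("C", "E"))}


def build_graph(edges):
    """Undirected graph as a dict of adjacency sets."""
    g = defaultdict(set)
    for u, v in edges:
        g[u].add(v)
        g[v].add(u)
    return dict(g)


def common_neighbors(g, x, y):
    """CN: number of neighbours shared by x and y."""
    return len(g[x] & g[y])


def jaccard(g, x, y):
    """JC: shared neighbours divided by the union of both neighbourhoods."""
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0


def adamic_adar(g, x, y):
    """AA: sum of 1 / log(degree) over shared neighbours."""
    # A neighbour shared by two distinct nodes has degree >= 2, so log > 0;
    # the guard only protects against degenerate input.
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y] if len(g[z]) > 1)


def preferential_attachment(g, x, y):
    """PA: product of the two degrees."""
    return len(g[x]) * len(g[y])


def auc(pos, neg):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly,
    with ties counted as one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(g, future_edges, score):
    """Score every unlinked pair in the past snapshot and compute the AUC."""
    pos, neg = [], []
    for x, y in combinations(sorted(g), 2):
        if y in g[x]:
            continue  # already linked, so not a prediction target
        s = score(g, x, y)
        (pos if frozenset((x, y)) in future_edges else neg).append(s)
    return auc(pos, neg)


def features(g, x, y):
    """All four measures as one feature vector for a supervised classifier."""
    return [f(g, x, y) for f in (common_neighbors, jaccard,
                                 adamic_adar, preferential_attachment)]


if __name__ == "__main__":
    graph = build_graph(PAST_EDGES)
    for name, fn in [("CN", common_neighbors), ("JC", jaccard),
                     ("AA", adamic_adar), ("PA", preferential_attachment)]:
        print(name, round(evaluate(graph, FUTURE_EDGES, fn), 3))
```

On a graph this small the ranking of the measures is arbitrary and says nothing about the study's results. For the supervised setting, one plausible arrangement, assumed here rather than taken from the paper, is to compute `features` for each candidate pair and pass the resulting vectors, labelled by whether the link later formed, to a classifier such as scikit-learn's `RandomForestClassifier`. The abstract does not specify the authors' feature pipeline or hyperparameters.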