Author/Editor     Blagus, Rok; Lusa, Lara
Title     Impact of class-imbalance on multi-class high-dimensional class prediction
Type     article
Source     Metodol Zv Ljubl
Vol. and No.     Vol. 9, No. 1
Publication year     2012
Volume     pp. 25-45
Language     eng
Abstract     The goal of multi-class supervised classification is to develop a rule that accurately predicts the class membership of new samples when the number of classes is larger than two. In this paper we consider high-dimensional class-imbalanced data: the number of variables greatly exceeds the number of samples and the number of samples in each class is not equal. We focus on Friedman's one-versus-one approach for three-class problems and show how its class probabilities depend on the class probabilities from the binary classification sub-problems. We further explore its performance using diagonal linear discriminant analysis (DLDA) as a base classifier and compare its performance with multi-class DLDA, using simulated and real data. Our results show that class imbalance has a significant effect on the classification results: the classification is biased towards the majority class, as in two-class problems, and the problem is magnified when the number of variables is large. The amount of bias also depends jointly on the magnitude of the differences between the classes and on the sample size: the bias diminishes when the difference between the classes is larger or the sample size is increased. Variable selection also plays an important role in the class-imbalance problem, and the most effective strategy depends on the type of differences that exist between classes. DLDA seems to be among the classifiers least sensitive to class imbalance, and its use is also recommended for multi-class problems. Whenever possible, experiments should be planned using balanced data in order to avoid the class-imbalance problem.
Descriptors     PROBABILITY
DISCRIMINANT ANALYSIS
COMPUTER SIMULATION
GENE EXPRESSION