Informative gene selection and the direct classification of tumors based on relative simplicity
Background
Selecting a parsimonious set of informative genes to build a classifier with high generalization performance is the most important task in the analysis of tumor microarray expression data. Many existing gene-pair evaluation methods use only one strategy, either vertical comparison or horizontal comparison, and therefore cannot highlight the diverse patterns of gene pairs, while individual-gene-ranking methods ignore redundancy and synergy among genes.
Results
Here we propose a novel score measure named relative simplicity (RS). We evaluate gene pairs by integrating vertical comparison with horizontal comparison, and then build an RS-based direct classifier (RS-based DC) on a set of informative genes capable of binary discrimination, using a paired-votes strategy. Nine multi-class gene expression datasets involving human cancers were used to validate the performance of the new method. Compared with nine reference models, RS-based DC achieved the highest average independent test accuracy (91.40 %), the best generalization performance, and the smallest average number of informative genes (20.56). Compared with four reference feature selection methods, RS also achieved the highest average test accuracy with three classifiers (Naïve Bayes, k-Nearest Neighbor and Support Vector Machine), and only RS improved the performance of SVM.
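The abstract does not define RS itself, so the following is only an illustrative sketch of the two comparison ideas under an assumed reading: "vertical" as comparing two genes within the same sample (in the spirit of top-scoring-pair methods) and "horizontal" as comparing one gene across samples of different classes. The function names, the scoring formulas, the tie-breaking rule, and the toy data are all hypothetical and are not the paper's RS measure or DC classifier.

```python
from itertools import product

def vertical_score(gi, gj, labels):
    """Assumed 'vertical' comparison: how differently the within-sample
    ordering gi < gj holds in class 0 versus class 1 (TSP-style)."""
    def rate(cls):
        idx = [k for k, y in enumerate(labels) if y == cls]
        return sum(gi[k] < gj[k] for k in idx) / len(idx)
    return abs(rate(0) - rate(1))

def horizontal_score(g, labels):
    """Assumed 'horizontal' comparison: how well one gene's values separate
    the two classes across samples (an AUC-style statistic in [0.5, 1])."""
    a = [v for v, y in zip(g, labels) if y == 0]
    b = [v for v, y in zip(g, labels) if y == 1]
    wins = sum((x < y) + 0.5 * (x == y) for x, y in product(a, b))
    auc = wins / (len(a) * len(b))
    return max(auc, 1 - auc)

def paired_vote(sample, pairs):
    """Each pair (i, j, cls_if_less) votes cls_if_less when
    sample[i] < sample[j], otherwise the other class.
    A tied vote falls back to class 0 (an arbitrary choice)."""
    votes = [cls if sample[i] < sample[j] else 1 - cls for i, j, cls in pairs]
    return int(sum(votes) * 2 > len(votes))

# Toy data (made up): six samples, two genes, binary labels.
labels = [0, 0, 0, 1, 1, 1]
g1 = [1, 2, 1, 5, 6, 5]
g2 = [4, 5, 4, 2, 3, 2]

v = vertical_score(g1, g2, labels)      # ordering g1 < g2 flips between classes
h = horizontal_score(g1, labels)        # g1 alone separates the classes
pred_a = paired_vote([1, 4], [(0, 1, 0)])
pred_b = paired_vote([5, 2], [(0, 1, 0)])
```

In this toy setting both scores reach their maximum, because the ordering of the two genes reverses completely between the classes and `g1` alone also separates them. A measure that integrates both views could rank pairs that score well on either, which is the motivation the abstract gives for combining them.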
Conclusions
Diverse patterns of gene pairs can be highlighted more fully by integrating the vertical comparison strategy with the horizontal comparison strategy. The DC core classifier can effectively control over-fitting. The RS-based feature selection method combined with the DC classifier can lead to more robust selection of informative genes and more robust classification accuracy.
Springer