Informative Gene Selection and Direct Classification of Tumor Based on Chi‐Square Test of Pairwise Gene Interactions
H Zhang, L Li, C Luo, C Sun, Y Chen… - BioMed research …, 2014 - Wiley Online Library
BioMed research international, 2014•Wiley Online Library
In efforts to discover disease mechanisms and improve clinical diagnosis of tumors, it is useful to mine profiles for informative genes with definite biological meanings and to build robust classifiers with high precision. In this study, we developed a new method for tumor‐gene selection, the Chi‐square test‐based integrated rank gene and direct classifier (χ2‐IRG‐DC). First, we obtained the weighted integrated rank of gene importance from chi‐square tests of single and pairwise gene interactions. Then, we sequentially introduced the ranked genes and removed redundant genes by using leave‐one‐out cross‐validation of the chi‐square test‐based Direct Classifier (χ2‐DC) within the training set to obtain informative genes. Finally, we determined the accuracy of independent test data by utilizing the genes obtained above with χ2‐DC. Furthermore, we analyzed the robustness of χ2‐IRG‐DC by comparing the generalization performance of different models, the efficiency of different feature‐selection methods, and the accuracy of different classifiers. An independent test of ten multiclass tumor gene‐expression datasets showed that χ2‐IRG‐DC could efficiently control overfitting and had higher generalization performance. The informative genes selected by χ2‐IRG‐DC could dramatically improve the independent test precision of other classifiers; meanwhile, the informative genes selected by other feature selection methods also had good performance in χ2‐DC.
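The ranking step described above (chi‐square tests on single genes and on gene pairs, combined into one weighted rank) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the discretization rule, the weighting scheme, or how pairwise scores are assigned to individual genes, so the median split in `binarize`, the `pair_weight` parameter, and the "best pair score per gene" rule are all assumptions made for illustration, as are the function names.

```python
from collections import Counter
from itertools import combinations


def chi_square_stat(feature, labels):
    """Pearson chi-square statistic for a discrete feature vs. class labels."""
    n = len(labels)
    row_totals = Counter(feature)
    col_totals = Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for f, row_n in row_totals.items():
        for c, col_n in col_totals.items():
            expected = row_n * col_n / n
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


def binarize(values):
    """Split expression values at the median (assumed rule, not from the paper)."""
    s = sorted(values)
    mid = len(s) // 2
    median = s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2
    return [int(v > median) for v in values]


def rank_genes(expr, labels, pair_weight=0.5):
    """Rank genes by a weighted mix of single-gene and pairwise chi-square scores.

    expr maps gene name -> list of expression values (one per sample).
    pair_weight is a hypothetical parameter; the paper's weighting may differ.
    """
    disc = {g: binarize(v) for g, v in expr.items()}
    single = {g: chi_square_stat(d, labels) for g, d in disc.items()}

    # Assumption: each gene is credited with the best score of any pair it joins.
    pair_best = {g: 0.0 for g in disc}
    for a, b in combinations(disc, 2):
        joint = list(zip(disc[a], disc[b]))  # joint state of the gene pair
        score = chi_square_stat(joint, labels)
        pair_best[a] = max(pair_best[a], score)
        pair_best[b] = max(pair_best[b], score)

    combined = {
        g: (1 - pair_weight) * single[g] + pair_weight * pair_best[g]
        for g in disc
    }
    return sorted(combined, key=combined.get, reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    expr = {
        "g1": [1, 2, 3, 7, 8, 9],  # separates the two classes
        "g2": [1, 9, 2, 8, 3, 7],  # mostly unrelated to the classes
    }
    print(rank_genes(expr, labels))
```

In this toy input, `g1` ranks ahead of `g2` because its single-gene statistic is larger. The sequential gene introduction and leave‐one‐out cross‐validation steps described in the abstract are not covered by this sketch.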