Mutual information-based supervised attribute clustering for microarray sample classification
P Maji - IEEE Transactions on Knowledge and Data Engineering, 2010 - ieeexplore.ieee.org
Microarray technology is an important biotechnological tool that makes it possible to record the expression levels of thousands of genes simultaneously across a number of different samples. An important application of microarray gene expression data in functional genomics is the classification of samples according to their gene expression profiles. Among the large number of genes present in gene expression data, only a small fraction is effective for performing a given diagnostic test. Hence, one of the major tasks with gene expression data is to find groups of coregulated genes whose collective expression is strongly associated with the sample categories or response variables. To this end, a new supervised attribute clustering algorithm is proposed to find such groups of genes. It directly incorporates the information of sample categories into the attribute clustering process. A new quantitative measure, based on mutual information, is introduced that incorporates the information of sample categories to measure the similarity between attributes. The proposed algorithm uses this measure to assess the similarity between attributes, whereby redundancy among the attributes is removed. The clusters are then refined incrementally based on sample categories. The performance of the proposed algorithm is compared with that of existing supervised and unsupervised gene clustering and gene selection algorithms, based on the class separability index and the predictive accuracy of the naive Bayes classifier, the K-nearest neighbor rule, and the support vector machine, on three cancer and two arthritis microarray data sets. The biological significance of the generated clusters is interpreted using the Gene Ontology. An important finding is that the proposed supervised attribute clustering algorithm is effective for identifying biologically significant gene clusters with excellent predictive capability.
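The abstract does not define the paper's similarity measure, so it is not reproduced here. As a rough illustration of the underlying idea only, the sketch below computes the standard empirical mutual information between a discretized gene expression vector and the sample class labels, which is one common way to score how informative a gene is about the sample categories. The function names, the mean-threshold discretization, and the toy data are all assumptions made for this example and do not come from the paper.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of x
    py = Counter(ys)            # marginal counts of y
    pxy = Counter(zip(xs, ys))  # joint counts of (x, y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def discretize(values):
    """Binarize expression values at their mean (an assumed, simplistic choice)."""
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]


# Toy data (hypothetical): six samples, two classes, two genes.
labels = [0, 0, 0, 1, 1, 1]
gene_a = [0.10, 0.20, 0.15, 0.90, 0.80, 0.95]  # separates the classes cleanly
gene_b = [0.50, 0.10, 0.90, 0.40, 0.20, 0.80]  # largely unrelated to the classes

relevance_a = mutual_information(discretize(gene_a), labels)
relevance_b = mutual_information(discretize(gene_b), labels)
print(f"gene_a relevance: {relevance_a:.3f} bits")
print(f"gene_b relevance: {relevance_b:.3f} bits")
```

In this toy setting, the gene whose discretized expression tracks the class labels receives the higher score. A supervised clustering procedure of the kind described above would additionally need a class-aware similarity between pairs of genes, which this sketch does not attempt.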