
A new parsimonious method for classifying Cancer Tissue-of-Origin Based on DNA Methylation 450K data

Published 3 Jan 2021 in q-bio.TO | (2101.00570v1)

Abstract: DNA methylation is a well-studied epigenetic modification that regulates gene transcription in eukaryotes. Its alterations have been recognized as a significant component of cancer development. In this study, we use the DNA methylation 450K data from The Cancer Genome Atlas to evaluate the efficacy of DNA methylation data for classifying 30 cancer types. We propose a new method for gene selection in high-dimensional data (over 450 thousand). Variance filtering is first applied for dimension reduction, and recursive feature elimination (RFE) is then used for feature selection. We address the problem of selecting a small subset of genes from a large number of methylated sites, and our parsimonious model is shown to be efficient, achieving an accuracy over 91% and outperforming other studies that use DNA microarray and RNA-seq data. The performance of 20 models, which are based on 4 estimators (Random Forest, Decision Tree, Extra Tree and Support Vector Machine) and 5 classifiers (k-Nearest Neighbours, Support Vector Machine, XGBoost, LightGBM and Multi-Layer Perceptron), is compared, and the robustness of the RFE algorithm is examined. Results suggest that the combined model of Extra Tree plus CatBoost classifier offers the best performance in cancer identification, with an overall validation accuracy of 91%, 92.3%, 93.3% and 93.5% for 20, 30, 40 and 50 features, respectively. The biological functions of the 50 selected genes in cancer development are also explored through enrichment analysis; the results show that 12 of our top 16 features have already been identified as cancer-specific, and we propose additional genes to be tested in future studies. Therefore, our method may be utilized as an auxiliary diagnostic method to determine the actual clinicopathological status of a specific cancer.
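
The two-stage selection described in the abstract (variance filtering, then RFE, then a classifier on the surviving features) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic beta-like values generated by a hypothetical `make_toy_data` helper, the importance score is a simple between-class variance standing in for the tree or SVM estimators named above, and a small k-NN is used as the final classifier. All function names and parameters are assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def make_toy_data(n_per_class=20, seed=0):
    """Synthetic stand-in for methylation values: 3 classes, 30 features in [0, 1]."""
    rng = random.Random(seed)
    levels = [0.1, 0.5, 0.9]
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            # Features 0-2 carry the class signal; the rest are uninformative.
            informative = [levels[(label + s) % 3] + rng.gauss(0, 0.05) for s in range(3)]
            noisy = [0.5 + rng.gauss(0, 0.15) for _ in range(7)]       # high variance
            flat = [0.5 + rng.uniform(-0.05, 0.05) for _ in range(20)]  # low variance
            X.append([min(1.0, max(0.0, v)) for v in informative + noisy + flat])
            y.append(label)
    return X, y


def top_variance_features(X, keep):
    """Stage 1: indices of the `keep` columns with the largest variance."""
    variances = [pvariance(col) for col in zip(*X)]
    ranked = sorted(range(len(variances)), key=variances.__getitem__, reverse=True)
    return sorted(ranked[:keep])


def between_class_variance(X, y, features):
    """Toy importance (an assumption): variance of the per-class means of each feature."""
    classes = sorted(set(y))
    return {
        j: pvariance([mean(row[j] for row, lab in zip(X, y) if lab == c) for c in classes])
        for j in features
    }


def rfe(X, y, features, n_select, importance_fn=between_class_variance, step=1):
    """Stage 2: re-score the surviving features, drop the `step` weakest, repeat."""
    features = list(features)
    while len(features) > n_select:
        scores = importance_fn(X, y, features)
        n_drop = min(step, len(features) - n_select)
        weakest = set(sorted(features, key=scores.__getitem__)[:n_drop])
        features = [j for j in features if j not in weakest]
    return features


def knn_predict(train_X, train_y, row, features, k=3):
    """Majority vote among the k nearest training rows on the selected features."""
    def dist(other):
        return sum((row[j] - other[j]) ** 2 for j in features)
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


X, y = make_toy_data()
filtered = top_variance_features(X, keep=10)   # 30 -> 10 features
selected = rfe(X, y, filtered, n_select=3)     # 10 -> 3 features

# Hold out every fourth sample for validation.
test_idx = set(range(0, len(X), 4))
train_X = [X[i] for i in range(len(X)) if i not in test_idx]
train_y = [y[i] for i in range(len(X)) if i not in test_idx]
hits = sum(knn_predict(train_X, train_y, X[i], selected) == y[i] for i in test_idx)
accuracy = hits / len(test_idx)
print(filtered, selected, accuracy)
```

Because the stand-in score here is computed per feature, re-scoring on each round does not change the ranking; with a model-based importance (for example, tree feature importances passed as `importance_fn`), the refit after each elimination is what distinguishes RFE from a one-shot ranking.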
