
Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner (2404.04547v1)

Published 6 Apr 2024 in cs.NE, cs.AI, and cs.LG

Abstract: Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search and poor generalization across diverse datasets. This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data. The EODE methodology combines an intelligent grey wolf optimization algorithm for selective feature space reduction, guided random injection modeling for ensemble diversity enhancement, and subset model optimization for synergistic classifier combinations. Extensive experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types. Results demonstrated that EODE obtained significantly improved screening accuracy over individual and conventionally aggregated models. The integrated optimization of advanced feature selection, directed specialized modeling, and cooperative classifier ensembles helps address key challenges in current nature-inspired approaches. This provides an effective framework for robust and generalized ensemble learning with gene expression biomarkers. The EODE source code is openly available on GitHub at https://github.com/wangxb96/EODE.
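The abstract describes a three-stage pipeline: grey wolf optimization to shrink the gene (feature) space, guided injection of diverse base models, and optimization of the classifier subset that forms the final ensemble. The sketch below is a minimal, hypothetical Python illustration of that flow, not the authors' released implementation: the sigmoid-based binary grey wolf update, the KNN fitness function, the small heterogeneous classifier pool (standing in for the diversity-injection stage), and the greedy subset selection are all assumptions made here for illustration.

```python
# Minimal EODE-style sketch (illustrative only; the official code is linked above).
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def fitness(mask, X_tr, y_tr, X_va, y_va, alpha=0.99):
    """Reward validation accuracy and lightly penalize the number of selected genes."""
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr[:, mask], y_tr)
    acc = accuracy_score(y_va, clf.predict(X_va[:, mask]))
    return alpha * acc + (1.0 - alpha) * (1.0 - mask.mean())


def binary_gwo(X_tr, y_tr, X_va, y_va, n_wolves=20, n_iter=50, seed=0):
    """Binary grey wolf optimization over 0/1 gene masks.

    Each wolf is pulled toward the three current leaders (alpha, beta, delta);
    the continuous GWO step is squashed through a sigmoid to give flip
    probabilities (a common binarization, assumed here rather than taken
    from the paper).
    """
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    pos = rng.random((n_wolves, d)) < 0.1  # sparse initial masks
    scores = np.array([fitness(p, X_tr, y_tr, X_va, y_va) for p in pos])
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter          # exploration -> exploitation
        leaders = pos[np.argsort(scores)[-3:]]  # three best wolves
        for i in range(n_wolves):
            step = np.zeros(d)
            for leader in leaders:
                A = 2.0 * a * rng.random(d) - a
                C = 2.0 * rng.random(d)
                D = np.abs(C * leader - pos[i])
                step += leader - A * D
            prob = 1.0 / (1.0 + np.exp(-10.0 * (step / 3.0 - 0.5)))
            pos[i] = rng.random(d) < prob
            scores[i] = fitness(pos[i], X_tr, y_tr, X_va, y_va)
    return pos[scores.argmax()]  # best gene mask found


def greedy_ensemble(X_tr, y_tr, X_va, y_va):
    """Train a heterogeneous pool and greedily keep members whose majority vote
    improves validation accuracy (a stand-in for the paper's subset model
    optimization; assumes non-negative integer-coded class labels)."""
    pool = [KNeighborsClassifier(5), DecisionTreeClassifier(random_state=0),
            GaussianNB(), SVC(kernel="linear")]
    pool = [m.fit(X_tr, y_tr) for m in pool]
    pool.sort(key=lambda m: -accuracy_score(y_va, m.predict(X_va)))
    chosen, best = [], -1.0
    for m in pool:
        votes = np.stack([c.predict(X_va) for c in chosen + [m]])
        maj = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
        acc = accuracy_score(y_va, maj)
        if acc > best:
            chosen, best = chosen + [m], acc
    return chosen
```

On a concrete dataset, one would split (X, y) into training and validation folds, call binary_gwo to obtain a gene mask, build the ensemble with greedy_ensemble on the masked features, and combine the retained classifiers by majority vote at test time.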

