Semantic-Preserving Feature Partitioning for Multi-View Ensemble Learning (2401.06251v1)

Published 11 Jan 2024 in cs.LG, cs.IT, and math.IT

Abstract: In machine learning, the exponential growth of data and the associated "curse of dimensionality" pose significant challenges, particularly with expansive yet sparse datasets. Addressing these challenges, multi-view ensemble learning (MEL) has emerged as a transformative approach, with feature partitioning (FP) playing a pivotal role in constructing artificial views for MEL. Our study introduces the Semantic-Preserving Feature Partitioning (SPFP) algorithm, a novel method grounded in information theory. The SPFP algorithm effectively partitions datasets into multiple semantically consistent views, enhancing the MEL process. Through extensive experiments on eight real-world datasets, ranging from high-dimensional datasets with few instances to low-dimensional datasets with many instances, our method demonstrates notable efficacy. It maintains model accuracy while significantly improving uncertainty measures in scenarios where high generalization performance is achievable, and conversely retains uncertainty metrics while enhancing accuracy where high generalization accuracy is less attainable. An effect size analysis further reveals that the SPFP algorithm outperforms benchmark models with large effect sizes and reduces computational demands through effective dimensionality reduction. The substantial effect sizes observed in most experiments underscore the algorithm's significant improvements in model performance.
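To make the idea of information-theoretic feature partitioning concrete, the sketch below is a minimal, illustrative take on the general approach the abstract describes: score each feature by its mutual information with the labels, then deal features across views so that every view retains a mix of informative features. This is a generic sketch, not the authors' SPFP algorithm; the binning estimator, the round-robin assignment rule, and the function names are all assumptions for illustration.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram-based MI estimate between a continuous feature x
    and discrete class labels y (a crude but standard plug-in estimator)."""
    # Discretize x into equal-width bins; digitize yields indices 0..bins-1.
    edges = np.histogram_bin_edges(x, bins=bins)[1:-1]
    x_binned = np.digitize(x, edges)
    classes = {c: i for i, c in enumerate(np.unique(y))}
    joint = np.zeros((bins, len(classes)))
    for xb, yv in zip(x_binned, y):
        joint[xb, classes[yv]] += 1
    joint /= joint.sum()                       # joint distribution p(x, y)
    px = joint.sum(axis=1, keepdims=True)      # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)      # marginal p(y)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def partition_features(X, y, n_views=3):
    """Rank features by relevance to y, then assign them round-robin
    so each view gets a comparable share of informative features."""
    relevance = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    order = np.argsort(relevance)[::-1]        # most informative first
    views = [[] for _ in range(n_views)]
    for rank, j in enumerate(order):
        views[rank % n_views].append(int(j))   # deal features across views
    return views
```

Each resulting view can then train its own base learner, with the ensemble combining their predictions; SPFP's contribution, per the abstract, lies in making the partition semantically consistent rather than merely balanced.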
