StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics (2005.01575v9)

Published 4 May 2020 in cs.LG, cs.HC, and stat.ML

Abstract: In ML, ensemble methods such as bagging, boosting, and stacking are widely established approaches that regularly achieve top-notch predictive performance. Stacking (also called "stacked generalization") is an ensemble method that combines heterogeneous base models, arranged in at least one layer, and then employs another metamodel to summarize the predictions of those models. Although it can be a highly effective approach for increasing the predictive performance of ML, generating a stack of models from scratch can be a cumbersome trial-and-error process. This challenge stems from the enormous space of available solutions: different sets of data instances and features that could be used for training, several algorithms to choose from, and instantiations of these algorithms using diverse parameters (i.e., models) that perform differently according to various metrics. In this work, we present a knowledge generation model that supports ensemble learning with the use of visualization, and a visual analytics system for stacked generalization. Our system, StackGenVis, assists users in dynamically adapting performance metrics, managing data instances, selecting the most important features for a given data set, choosing a set of top-performing and diverse algorithms, and measuring predictive performance. As a result, our proposed tool helps users decide between distinct models and reduce the complexity of the resulting stack by removing overpromising and underperforming models. The applicability and effectiveness of StackGenVis are demonstrated with two use cases: a real-world healthcare data set and a collection of data related to sentiment/stance detection in texts. Finally, the tool has been evaluated through interviews with three ML experts.
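To make the stacked generalization idea concrete, below is a minimal sketch using scikit-learn's StackingClassifier: one layer of heterogeneous base models whose out-of-fold predictions feed a logistic-regression metamodel, scored with several metrics to show how model rankings can shift depending on the metric. This is an illustrative assumption, not the StackGenVis system itself (which is a visual analytics tool); the chosen base models, metamodel, and the built-in breast-cancer dataset are stand-ins for exposition only.

```python
# Minimal stacking sketch (illustrative only; not the StackGenVis tool).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in dataset; the paper's use cases (healthcare, stance detection) differ.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# One layer of heterogeneous base models; their cross-validated predictions
# become the training features of the metamodel.
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("svc", SVC(probability=True, random_state=0)),
]
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold predictions are used to fit the metamodel
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# Models can rank differently under different metrics, which is the
# motivation for letting users adapt and weight metrics interactively.
print("accuracy:", accuracy_score(y_test, y_pred))
print("F1      :", f1_score(y_test, y_pred))
print("MCC     :", matthews_corrcoef(y_test, y_pred))
```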
