StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics (2005.01575v9)
Abstract: In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely established approaches that regularly achieve top-notch predictive performance. Stacking (also called "stacked generalization") is an ensemble method that combines heterogeneous base models, arranged in at least one layer, and then employs another metamodel to summarize the predictions of those models. Although it can be a highly effective approach for increasing the predictive performance of ML, generating a stack of models from scratch can be a cumbersome trial-and-error process. This challenge stems from the enormous space of available solutions, with different sets of data instances and features that could be used for training, several algorithms to choose from, and instantiations of these algorithms using diverse parameters (i.e., models) that perform differently according to various metrics. In this work, we present a knowledge generation model, which supports ensemble learning with the use of visualization, and a visual analytics system for stacked generalization. Our system, StackGenVis, assists users in dynamically adapting performance metrics, managing data instances, selecting the most important features for a given data set, choosing a set of top-performing and diverse algorithms, and measuring the predictive performance. Consequently, our proposed tool helps users to decide between distinct models and to reduce the complexity of the resulting stack by removing overpromising and underperforming models. The applicability and effectiveness of StackGenVis are demonstrated with two use cases: a real-world healthcare data set and a collection of data related to sentiment/stance detection in texts. Finally, the tool has been evaluated through interviews with three ML experts.
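To make the stacking idea described in the abstract concrete, the following is a minimal sketch in plain Python. It is not the StackGenVis implementation. Everything in it is an assumption chosen for illustration: the synthetic two-feature data set, the three base models (two decision stumps and a 1-nearest-neighbour classifier), the lookup-table metamodel, and all function names. The base models are heterogeneous and form a single layer, and the metamodel is trained on their out-of-fold predictions rather than on the raw features.

```python
import random
from collections import Counter, defaultdict


def make_stump(feature):
    """Return a trainer for a one-feature threshold classifier (decision stump)."""
    def fit(X, y):
        best = None
        for t in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                preds = [int(sign * (row[feature] - t) >= 0) for row in X]
                hits = sum(p == label for p, label in zip(preds, y))
                if best is None or hits > best[0]:
                    best = (hits, t, sign)
        _, t, sign = best
        return lambda row: int(sign * (row[feature] - t) >= 0)
    return fit


def fit_1nn(X, y):
    """Return a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, other)) for other in X]
        return y[dists.index(min(dists))]
    return predict


def fit_stack(X, y, base_trainers, k=5):
    """Train one layer of base models plus a metamodel on out-of-fold predictions."""
    n = len(X)
    meta_X = [None] * n
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        models = [trainer([X[i] for i in train_idx], [y[i] for i in train_idx])
                  for trainer in base_trainers]
        for i in held_out:
            # Each instance is predicted by models that never saw it in training.
            meta_X[i] = tuple(m(X[i]) for m in models)

    # Hypothetical metamodel: majority label for each pattern of base predictions.
    votes = defaultdict(Counter)
    for pattern, label in zip(meta_X, y):
        votes[pattern][label] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}

    # Refit the base models on all training data for use at prediction time.
    final_models = [trainer(X, y) for trainer in base_trainers]

    def predict(row):
        pattern = tuple(m(row) for m in final_models)
        if pattern in table:
            return table[pattern]
        # Unseen pattern: fall back to a plain majority vote of the base models.
        return Counter(pattern).most_common(1)[0][0]
    return predict


def sample(rng, n):
    """Synthetic data (an assumption): label is 1 when the two features sum to > 0."""
    X, y = [], []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        X.append((a, b))
        y.append(int(a + b > 0))
    return X, y


rng = random.Random(0)
X_train, y_train = sample(rng, 200)
X_test, y_test = sample(rng, 100)

stack = fit_stack(X_train, y_train, [make_stump(0), make_stump(1), fit_1nn])
accuracy = sum(stack(r) == l for r, l in zip(X_test, y_test)) / len(X_test)
print(f"stack accuracy on held-out data: {accuracy:.2f}")
```

Training the metamodel on out-of-fold predictions keeps it from simply learning to trust base models that have memorized the training data. The choices the sketch hard-codes (which instances, which features, which algorithms and parameters, and which metric decides what is "best") make up the search space the abstract describes.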