What is different between these datasets? (2403.05652v2)
Abstract: The performance of machine learning models relies heavily on the quality of input data, yet real-world applications often face significant data-related challenges. A common issue arises when curating training data or deploying models: two datasets from the same domain may exhibit differing distributions. While many techniques exist for detecting such distribution shifts, there is a lack of comprehensive methods to explain these differences in a human-understandable way beyond opaque quantitative metrics. To bridge this gap, we propose a versatile toolbox of interpretable methods for comparing datasets. Using a variety of case studies, we demonstrate the effectiveness of our approach across diverse data modalities, including tabular data, text, images, and time-series signals, in both low- and high-dimensional settings. These methods complement existing techniques by providing actionable and interpretable insights to better understand and address distribution shifts.
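The abstract distinguishes *detecting* a distribution shift from *explaining* it. As a point of reference for the detection side, below is a minimal sketch of a standard classifier two-sample test on synthetic tabular data. This is not the paper's toolbox; the data, model choice, and parameters are illustrative assumptions. Held-out accuracy well above chance for a classifier trained to tell the two datasets apart signals a shift, and the model's feature importances give a rough, interpretable first hint about which features drive it.

```python
# Minimal sketch (assumption, not the paper's method): a classifier
# two-sample test for distribution shift between two tabular datasets,
# with feature importances as a coarse, interpretable attribution.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Two synthetic datasets: B shifts feature 0 and rescales feature 2.
n, d = 2000, 5
X_a = rng.normal(size=(n, d))
X_b = rng.normal(size=(n, d))
X_b[:, 0] += 1.0   # mean shift in feature 0
X_b[:, 2] *= 2.0   # variance change in feature 2

# Label each sample by its dataset of origin and train a domain classifier.
X = np.vstack([X_a, X_b])
y = np.concatenate([np.zeros(n), np.ones(n)])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Held-out accuracy far above 0.5 means the distributions are separable...
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"domain-classifier accuracy: {acc:.3f} (0.5 = indistinguishable)")

# ...and the importances point to *where* they differ.
for i, imp in enumerate(clf.feature_importances_):
    print(f"feature {i}: importance {imp:.3f}")
```

On this toy data, the shifted feature 0 and rescaled feature 2 should dominate the importances; on real data, the same accuracy-plus-attribution pattern gives a quick first read before turning to richer, modality-specific explanation methods of the kind the paper proposes.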