Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning (2311.10246v2)
Abstract: Nonparametric learning is a fundamental concept in machine learning that aims to capture complex patterns and relationships in data without making strong assumptions about the underlying data distribution. Owing to its simplicity and familiarity, one of the best-known algorithms under this paradigm is the $k$-nearest neighbors ($k$-NN) algorithm. Motivated by the use of machine learning in safety-critical applications, in this work we shed new light on the traditional nearest neighbors algorithm from the perspective of information theory and propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model. Data point weights and feature contributions can be determined by calculating the conditional entropy of adding a feature, without explicit model training. This yields detailed data point influence weights with perfect attribution and can be used to query counterfactuals. Instead of a traditional distance measure, which must be scaled and contextualized, we use a novel formulation of $\textit{surprisal}$ (the amount of information required to explain the difference between the observed and expected result). Finally, our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection, while also attaining competitive results for regression across a statistically significant number of datasets.
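The core idea of weighting neighbors by surprisal rather than raw distance can be illustrated with a minimal sketch. This is not the paper's exact formulation: here surprisal is modeled, for illustration only, as $-\ln p(d)$ under an exponential distance model whose scale is the mean neighbor distance, and the function name `surprisal_knn_predict` is hypothetical.

```python
import math
from collections import defaultdict

def surprisal_knn_predict(train, query, k=3):
    """Classify `query` by weighting each of the k nearest training
    points by the reciprocal of its surprisal.

    `train` is a list of ((features...), label) pairs. Surprisal is
    modeled here as -ln p(d) under an exponential distance model with
    scale equal to the mean neighbor distance -- an illustrative
    stand-in for the paper's formulation, not its actual definition.
    """
    # Distance of every training point to the query.
    dists = sorted((math.dist(f, query), label) for f, label in train)
    neighbors = dists[:k]

    # Expected distance under the illustrative exponential model.
    scale = max(sum(d for d, _ in neighbors) / k, 1e-12)

    votes = defaultdict(float)
    for d, label in neighbors:
        # -ln exp(-d/scale) = d/scale: surprisal in nats; nearby
        # (unsurprising) neighbors get large weights.
        s = d / scale + 1e-12
        votes[label] += 1.0 / s
    return max(votes, key=votes.get)
```

Because surprisal is an information quantity (in nats) rather than a raw distance, the weights are already on a common, interpretable scale across features and datasets, which is what removes the need to scale and contextualize the distance measure.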