Data Selection: A General Principle for Building Small Interpretable Models (2210.03921v3)

Published 8 Oct 2022 in cs.LG

Abstract: We present convincing empirical evidence for an effective and general strategy for building accurate small models. Such models are attractive for interpretability and also find use in resource-constrained environments. The strategy is to learn the training distribution and sample accordingly from the provided training data. The distribution-learning algorithm is not a contribution of this work; our contribution is a rigorous demonstration of the broad utility of this strategy in various practical settings. We apply it to the tasks of (1) building cluster explanation trees, (2) prototype-based classification, and (3) classification using Random Forests, and show that it improves the accuracy of decades-old weak traditional baselines to be competitive with specialized modern techniques. The strategy is also versatile with respect to the notion of model size. In the first two tasks, model size is taken to be the number of leaves in the tree and the number of prototypes, respectively. In the final task involving Random Forests, the strategy is shown to be effective even when model size comprises more than one factor: the number of trees and their maximum depth. Positive results on multiple datasets are presented and shown to be statistically significant.
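The core idea in the abstract (learn a distribution over the provided training data, sample from it, then fit a small model under a fixed size budget) can be illustrated with a minimal sketch. This is not the paper's algorithm: here the sampling distribution is a hypothetical per-class weighting tuned by crude random search, whereas the paper relies on a dedicated distribution-learning method. The dataset, the model-size budget (five depth-3 trees), and all parameter values below are illustrative assumptions.

```python
# Minimal sketch (not the paper's algorithm): search for a sampling
# distribution over the training data that maximizes validation accuracy
# of a small model (here, a few shallow random-forest trees).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)           # stand-in dataset
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0, stratify=y)

def sample_indices(weights, n):
    """Draw n training indices according to per-example weights."""
    p = weights / weights.sum()
    return rng.choice(len(weights), size=n, replace=True, p=p)

best_acc, best_model = -np.inf, None
for _ in range(30):                                   # crude random search
    # Hypothetical parameterization of the training distribution:
    # one sampling weight per class.
    class_w = rng.uniform(0.1, 1.0, size=len(np.unique(y_tr)))
    weights = class_w[y_tr]
    idx = sample_indices(weights, n=len(y_tr))

    # "Small" model: the size budget (few trees, limited depth) stays fixed.
    model = RandomForestClassifier(n_estimators=5, max_depth=3, random_state=0)
    model.fit(X_tr[idx], y_tr[idx])
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_acc, best_model = acc, model

print(f"best validation accuracy with 5 depth-3 trees: {best_acc:.3f}")
```

In this toy setup, the data-selection step is the search over class weights; the small-model constraint is held fixed throughout, mirroring the abstract's framing in which model size can be a multi-factor budget such as the number of trees and their maximum depth.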

