Upper Bound of Bayesian Generalization Error in Partial Concept Bottleneck Model (CBM): Partial CBM outperforms naive CBM (2403.09206v1)

Published 14 Mar 2024 in stat.ML, cs.AI, cs.LG, math.ST, and stat.TH

Abstract: The Concept Bottleneck Model (CBM) is a method for explaining neural networks. In CBM, concepts that correspond to the reasons for the outputs are inserted into the last intermediate layer as observed values. This allows the relationship between outputs and concepts to be interpreted in a manner similar to linear regression. However, this interpretation requires observing all concepts and decreases the generalization performance of the network. Partial CBM (PCBM), which uses partially observed concepts, has been devised to resolve these difficulties. Although numerical experiments suggest that the generalization performance of PCBMs is almost as high as that of the original neural networks, the theoretical behavior of their generalization error has not yet been clarified, since PCBM is a singular statistical model. In this paper, we reveal the Bayesian generalization error in PCBM with a three-layered and linear architecture. The result indicates that the structure of partially observed concepts decreases the Bayesian generalization error compared with that of CBM (fully observed concepts).
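To make the setting concrete, the sketch below (not from the paper; all class names, dimensions, and the loss weighting are illustrative assumptions) shows a three-layered linear partial concept bottleneck model in PyTorch. Only the first few bottleneck units are supervised with observed concept labels, while the remaining units are unobserved and trained only through the task loss.

```python
import torch
import torch.nn as nn

class PartialCBM(nn.Module):
    """Three-layered linear model: input -> bottleneck -> output.

    The first `n_observed` bottleneck units are treated as observed
    concepts (supervised); the remaining units are unobserved, as in a
    partial CBM. A naive (full) CBM corresponds to
    n_observed == bottleneck_dim.
    """

    def __init__(self, in_dim, bottleneck_dim, out_dim, n_observed):
        super().__init__()
        assert n_observed <= bottleneck_dim
        self.n_observed = n_observed
        self.to_bottleneck = nn.Linear(in_dim, bottleneck_dim)  # first layer
        self.to_output = nn.Linear(bottleneck_dim, out_dim)     # second layer

    def forward(self, x):
        h = self.to_bottleneck(x)            # linear units, no activation
        concepts = h[:, : self.n_observed]   # observed part of the bottleneck
        return self.to_output(h), concepts


# Hypothetical training objective: task loss on the output plus a
# supervision term on the observed concepts only.
model = PartialCBM(in_dim=10, bottleneck_dim=5, out_dim=3, n_observed=2)
mse = nn.MSELoss()

x = torch.randn(32, 10)
y = torch.randn(32, 3)       # task targets
c_obs = torch.randn(32, 2)   # labels for the observed concepts

y_hat, c_hat = model(x)
loss = mse(y_hat, y) + 1.0 * mse(c_hat, c_obs)
loss.backward()
```

This point-estimation sketch only illustrates the architecture and the partial concept supervision; the paper itself analyzes the Bayesian generalization error of this model class, not a particular training procedure.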

