Bayesian Neural Networks with Domain Knowledge Priors (2402.13410v1)

Published 20 Feb 2024 in cs.LG and stat.ML

Abstract: Bayesian neural networks (BNNs) have recently gained popularity due to their ability to quantify model uncertainty. However, specifying a prior for BNNs that captures relevant domain knowledge is often extremely challenging. In this work, we propose a framework for integrating general forms of domain knowledge (i.e., any knowledge that can be represented by a loss function) into a BNN prior through variational inference, while enabling computationally efficient posterior inference and sampling. Specifically, our approach results in a prior over neural network weights that assigns high probability mass to models that better align with our domain knowledge, leading to posterior samples that also exhibit this behavior. We show that BNNs using our proposed domain knowledge priors outperform those with standard priors (e.g., isotropic Gaussian, Gaussian process), successfully incorporating diverse types of prior information such as fairness, physics rules, and healthcare knowledge and achieving better predictive performance. We also present techniques for transferring the learned priors across different model architectures, demonstrating their broad utility across various settings.

Summary

  • The paper integrates domain knowledge, expressed as a loss function, into BNN priors, yielding posteriors that better reflect that knowledge and achieve stronger predictive performance.
  • It employs variational inference to align weight distributions with domain-specific constraints for more reliable predictions.
  • Empirical results on diverse tasks, including DecoyMNIST, demonstrate enhanced performance and adaptability across architectures.

Bayesian Neural Networks with Domain Knowledge Priors

This paper introduces a framework for enhancing Bayesian Neural Networks (BNNs) by integrating domain knowledge into their priors. The authors address the challenge of leveraging prior knowledge, expressed as a loss function, to improve the performance and reliability of BNNs. Through variational inference, the proposed approach learns a prior distribution over neural network weights that aligns closely with the available domain knowledge, yielding more informative posterior samples.

BNNs have gained traction due to their capacity to capture model uncertainty, which is crucial for decision-making in sensitive domains such as healthcare and criminal justice. However, selecting a prior that reflects domain-specific insights remains difficult, especially given the high dimensionality of neural network weights. In practice, uninformative priors such as isotropic Gaussians are used by default, yet these fail to exploit available domain knowledge.

The framework proposed in this paper addresses this gap by formulating domain knowledge as a loss function, denoted ϕ. This loss function evaluates how well a model adheres to the given domain knowledge. For example, in physics-informed models, ϕ could measure the degree to which a model's predictions violate physical laws such as energy conservation. The prior over network weights is learned such that it assigns higher probability mass to weight configurations whose corresponding models attain low values of ϕ.
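To make the idea concrete, here is a minimal sketch of learning a prior that favors low-ϕ weights. It assumes a PyTorch setup with a tiny functional MLP, a toy conservation-style rule standing in for ϕ, and a factorized Gaussian prior trained with the reparameterization trick; none of these details come from the paper's actual implementation.

```python
import torch

def mlp_forward(w, x, d_in=2, hidden=16, d_out=2):
    """Tiny MLP whose weights are read from the flat vector `w`,
    so gradients flow from the outputs back to the prior parameters."""
    i = 0
    W1 = w[i:i + d_in * hidden].view(hidden, d_in); i += d_in * hidden
    b1 = w[i:i + hidden]; i += hidden
    W2 = w[i:i + hidden * d_out].view(d_out, hidden); i += hidden * d_out
    b2 = w[i:i + d_out]
    h = torch.tanh(x @ W1.T + b1)
    return h @ W2.T + b2

def phi(w, x):
    """Illustrative domain-knowledge loss: predictions should sum to 1
    across outputs (a stand-in for, e.g., a conservation rule)."""
    pred = mlp_forward(w, x)
    return (pred.sum(dim=-1) - 1.0).pow(2).mean()

n_params = 2 * 16 + 16 + 16 * 2 + 2            # matches mlp_forward above
prior_mu = torch.zeros(n_params, requires_grad=True)
prior_log_sigma = torch.full((n_params,), -2.0, requires_grad=True)
opt = torch.optim.Adam([prior_mu, prior_log_sigma], lr=1e-2)
x_unlabeled = torch.randn(256, 2)              # inputs on which the rule is checked

for step in range(500):
    eps = torch.randn(n_params)                       # reparameterization trick
    w = prior_mu + prior_log_sigma.exp() * eps        # one weight sample from the prior
    # Expected domain-knowledge loss under the prior, plus a small penalty that
    # keeps the prior close to a standard Gaussian (a KL-style regularizer).
    kl = 0.5 * (prior_mu.pow(2) + (2 * prior_log_sigma).exp()
                - 2 * prior_log_sigma - 1).sum()
    loss = phi(w, x_unlabeled) + 1e-4 * kl
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The learned (prior_mu, prior_log_sigma) can then serve as the prior for standard variational posterior inference on labeled data.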

Empirical evaluations demonstrate that BNNs incorporating these domain knowledge-informed priors outperform those with conventional priors across various datasets. The framework was tested on tasks with diverse types of prior information, such as feature importance in image classification, fairness in decision-making, clinical rules in healthcare interventions, and physical laws in dynamical systems. For instance, on the DecoyMNIST task, the proposed method improved accuracy by encoding the knowledge that spurious background features should be ignored.
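For the feature-importance setting, one plausible form of ϕ is an input-gradient (saliency) penalty on known spurious regions. The sketch below assumes such a mask (e.g., the decoy corner patches in DecoyMNIST) and illustrates only the kind of loss involved, not the authors' exact definition.

```python
import torch

def saliency_phi(model, x, spurious_mask):
    """Penalize input-gradient saliency on pixels known to be spurious:
    low values mean the model's predictions do not depend on those pixels."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    score = logits.logsumexp(dim=-1).sum()        # class-agnostic prediction score
    grads, = torch.autograd.grad(score, x, create_graph=True)
    return (grads * spurious_mask).pow(2).mean()
```

A loss of this form could play the role of ϕ when learning the prior, or be added directly to a training objective as a regularizer.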

Additionally, the paper explores mechanisms for transferring learned informative priors across different BNN architectures. Techniques based on maximum mean discrepancy (MMD) and moment matching are proposed, allowing learned priors to be efficiently adapted to new architectures, thereby increasing their practical utility without direct access to the original domain knowledge loss function.
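As a rough illustration of the MMD-based transfer idea, the sketch below fits a factorized Gaussian prior for a target architecture by matching, on a set of probe inputs, functions drawn from it against functions drawn from the source prior. The `sample_source_functions` interface, kernel bandwidth, and optimizer settings are assumptions for illustration, not the paper's procedure.

```python
import torch

def gaussian_mmd(a, b, bandwidth=1.0):
    """Biased estimate of squared MMD between two sample sets of shape (n, d)."""
    def k(u, v):
        return torch.exp(-torch.cdist(u, v).pow(2) / (2 * bandwidth ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def transfer_prior(sample_source_functions, target_forward, n_params, x_probe,
                   n_draws=16, steps=500, lr=1e-2):
    """Fit a factorized Gaussian prior for the target architecture so that
    functions drawn from it match source-prior functions on probe inputs."""
    mu = torch.zeros(n_params, requires_grad=True)
    log_sigma = torch.full((n_params,), -2.0, requires_grad=True)
    opt = torch.optim.Adam([mu, log_sigma], lr=lr)
    for _ in range(steps):
        ws = mu + log_sigma.exp() * torch.randn(n_draws, n_params)
        # Each row is one sampled function's flattened outputs on the probe inputs.
        target_out = torch.stack([target_forward(w, x_probe).flatten() for w in ws])
        source_out = sample_source_functions(x_probe, n_draws)   # (n_draws, n_probe * d_out)
        loss = gaussian_mmd(target_out, source_out)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu, log_sigma
```

Matching function outputs rather than weights is what makes the transfer architecture-agnostic: the two networks never need to share a parameterization, only a set of probe inputs.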

These informative priors have practical implications for deploying BNNs in real-world applications where domain knowledge is crucial for informed decision-making. On the theoretical side, the work enhances the representational power of Bayesian neural networks by systematically leveraging structured prior information.

In conclusion, the paper advances BNN methodology by embedding domain knowledge into prior distributions, improving predictive performance and enabling transfer of learned priors across network architectures. As the field progresses, automating the specification of domain knowledge loss functions and extending the framework to more complex models and domains could prove valuable for BNN applications.
