Catalytic Priors: Using Synthetic Data to Specify Prior Distributions in Bayesian Analysis (2208.14123v2)

Published 30 Aug 2022 in stat.ME

Abstract: Catalytic prior distributions provide general, easy-to-use, and interpretable specifications of prior distributions for Bayesian analysis. They are particularly beneficial when the observed data are inadequate to stably estimate a complex target model. A catalytic prior distribution is constructed by augmenting the observed data with synthetic data that are sampled from the predictive distribution of a simpler model estimated from the observed data. We illustrate the usefulness of the catalytic prior approach using an example from labor economics. In the example, the resulting Bayesian inference reflects many important aspects of the observed data, and the estimation accuracy and predictive performance of the inference based on the catalytic prior are superior or comparable to those of other commonly used prior distributions. We further explore the connection between the catalytic prior approach and a few popular regularization methods. We expect the catalytic prior approach to be useful in many applications.
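The construction described in the abstract can be sketched in code. This is a minimal, illustrative sketch only, not the paper's implementation: it assumes a logistic regression target model, an intercept-only "simpler" model, synthetic covariates resampled from the observed design matrix, and a downweighting constant `tau` for the synthetic data; all variable names and numerical choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: small n relative to a logistic target model.
n, p = 20, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

# Step 1: fit a simpler model to the observed data
# (here: intercept-only, i.e. a single predictive probability).
p0 = y.mean()

# Step 2: draw M synthetic observations from the simple model's
# predictive distribution; covariates are resampled from observed X.
M = 200
X_syn = X[rng.integers(0, n, size=M)]
y_syn = rng.binomial(1, p0, size=M)

# Step 3: fit the target model on the augmented data, with the
# synthetic points downweighted so their total weight is tau.
tau = 4.0
X_aug = np.vstack([X, X_syn])
y_aug = np.concatenate([y, y_syn])
w = np.concatenate([np.ones(n), np.full(M, tau / M)])

# Weighted logistic MLE via Newton-Raphson; this corresponds to the
# posterior mode under the catalytic prior.
beta = np.zeros(p)
for _ in range(50):
    mu = 1 / (1 + np.exp(-(X_aug @ beta)))
    grad = X_aug.T @ (w * (y_aug - mu))
    H = (X_aug * (w * mu * (1 - mu))[:, None]).T @ X_aug
    beta = beta + np.linalg.solve(H, grad)

print(np.round(beta, 2))
```

Because the synthetic responses come from a deliberately simple model, they pull the target-model fit toward that simpler fit, stabilizing estimation when n is small; tau controls the strength of that pull, much like a regularization parameter.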
