Estimation Beyond Data Reweighting: Kernel Method of Moments (2305.10898v2)

Published 18 May 2023 in cs.LG and stat.ML

Abstract: Moment restrictions and their conditional counterparts emerge in many areas of machine learning and statistics, ranging from causal inference to reinforcement learning. Estimators for these tasks, generally called methods of moments, include the prominent generalized method of moments (GMM), which has recently gained attention in causal inference. GMM is a special case of the broader family of empirical likelihood estimators, which approximate the population distribution by minimizing a $\varphi$-divergence to the empirical distribution. However, the use of $\varphi$-divergences effectively limits the candidate distributions to reweightings of the data samples. We lift this long-standing limitation and provide a method of moments that goes beyond data reweighting. This is achieved by defining an empirical likelihood estimator based on the maximum mean discrepancy (MMD), which we term the kernel method of moments (KMM). We provide a variant of our estimator for conditional moment restrictions and show that it is asymptotically first-order optimal for such problems. Finally, we show that our method achieves competitive performance on several conditional moment restriction tasks.
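To make the reweighting limitation concrete, here is a minimal plain-Python sketch. It is not the paper's KMM estimator: it only compares the KL divergence (one member of the $\varphi$-divergence family) with the squared MMD under a Gaussian kernel, for two candidate distributions relative to a toy empirical distribution. The helper names (`gaussian_kernel`, `mmd_squared`, `kl_divergence`), the bandwidth of 1.0 and the data points are illustrative assumptions.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line; the bandwidth is an illustrative choice."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def _expected_kernel(xs, wx, ys, wy, bandwidth):
    """E[k(X, Y)] for independent X ~ sum_i wx[i] * delta(xs[i]) and Y ~ sum_j wy[j] * delta(ys[j])."""
    return sum(
        a * b * gaussian_kernel(x, y, bandwidth)
        for x, a in zip(xs, wx)
        for y, b in zip(ys, wy)
    )


def mmd_squared(xs, wx, ys, wy, bandwidth=1.0):
    """Squared MMD between two weighted point-mass distributions P and Q:
    E k(X, X') + E k(Y, Y') - 2 E k(X, Y)."""
    return (
        _expected_kernel(xs, wx, xs, wx, bandwidth)
        + _expected_kernel(ys, wy, ys, wy, bandwidth)
        - 2.0 * _expected_kernel(xs, wx, ys, wy, bandwidth)
    )


def kl_divergence(qs, wq, ps, wp):
    """KL(Q || P) for discrete distributions. Returns inf whenever Q puts mass on a
    point where P has none, i.e. whenever Q is not a reweighting of P's samples."""
    p_mass = {}
    for x, w in zip(ps, wp):
        p_mass[x] = p_mass.get(x, 0.0) + w
    total = 0.0
    for y, w in zip(qs, wq):
        if w == 0.0:
            continue  # zero-weight points contribute nothing
        if p_mass.get(y, 0.0) == 0.0:
            return math.inf  # Q is not absolutely continuous w.r.t. P
        total += w * math.log(w / p_mass[y])
    return total


# Toy "data": the empirical distribution puts weight 1/3 on each sample.
data = [0.0, 1.0, 2.0]
uniform = [1.0 / 3.0] * 3

# Candidate A: a reweighting of the observed samples.
reweighted = [0.5, 0.3, 0.2]
# Candidate B: uniform weights on slightly shifted points (not a reweighting).
shifted = [x + 0.1 for x in data]

kl_reweighted = kl_divergence(data, reweighted, data, uniform)
kl_shifted = kl_divergence(shifted, uniform, data, uniform)
mmd_reweighted = mmd_squared(data, uniform, data, reweighted)
mmd_shifted = mmd_squared(data, uniform, shifted, uniform)

print(f"KL     reweighted: {kl_reweighted:.4f}  shifted: {kl_shifted}")
print(f"MMD^2  reweighted: {mmd_reweighted:.4f}  shifted: {mmd_shifted:.4f}")
```

Candidate B differs from the data only by a small shift, yet KL assigns it infinite divergence, so a KL-based estimator can never select it; the squared MMD stays finite for both candidates and grows with the size of the shift. The actual KMM estimator additionally imposes the moment restrictions and optimizes over candidate distributions, which this sketch does not attempt.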
