Geometry-Aware Normalizing Wasserstein Flows for Optimal Causal Inference (2311.18826v4)
Abstract: This paper presents an approach to causal inference that integrates continuous normalizing flows (CNFs) with parametric submodels, making them geometrically sensitive and improving upon traditional Targeted Maximum Likelihood Estimation (TMLE). Our method uses CNFs to refine TMLE, approaching the Cramér-Rao bound as the flow transports a predefined distribution $p_0$ to a data-driven distribution $p_1$. We further embed Wasserstein gradient flows within Fokker-Planck equations, imposing a geometric structure rooted in optimal transport theory that improves the robustness of CNFs. Our approach addresses the disparity between sample and population distributions, a critical source of bias in parameter estimation. Leveraging optimal transport and Wasserstein gradient flows, we develop causal inference methods with minimal variance in finite-sample settings, outperforming traditional estimators such as TMLE and AIPW. The framework, centered on Wasserstein gradient flows, minimizes the variance of the efficient influence function under the distribution $p_t$. Preliminary experiments show lower mean-squared errors than standard flows, demonstrating the potential of geometry-aware normalizing Wasserstein flows for statistical modeling and inference.
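To make the pipeline concrete, the following display sketches the three ingredients named in the abstract; the notation ($v_\theta$ for the CNF velocity field, $F$ for the driving functional, $\phi$ for the efficient influence function) is assumed here for illustration and is not taken verbatim from the paper:

\[
\partial_t p_t(x) + \nabla\!\cdot\!\big(p_t(x)\, v_\theta(x,t)\big) = 0
\qquad \text{(CNF transport of } p_0 \text{ toward } p_1 \text{ via the continuity equation)},
\]
\[
\partial_t p_t = \nabla\!\cdot\!\Big(p_t\, \nabla \tfrac{\delta F}{\delta p}[p_t]\Big)
\qquad \text{(Wasserstein gradient flow of } F \text{, written in Fokker--Planck form)},
\]
\[
F(p_t) = \operatorname{Var}_{p_t}\!\big[\phi(O; p_t)\big] = \mathbb{E}_{p_t}\!\big[\phi(O; p_t)^2\big],
\qquad \mathbb{E}_{p_t}\!\big[\phi(O; p_t)\big] = 0,
\]

where $\phi(O; p_t)$ denotes the efficient influence function of the target causal parameter under $p_t$; choosing $F$ as this variance and following its Wasserstein gradient flow corresponds to the variance-minimization criterion stated above.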