Double Cross-fit Doubly Robust Estimators: Beyond Series Regression (2403.15175v2)
Abstract: Doubly robust estimators with cross-fitting have gained popularity in causal inference due to their favorable structure-agnostic error guarantees. However, when additional structure, such as Hölder smoothness, is available, then more accurate "double cross-fit doubly robust" (DCDR) estimators can be constructed by splitting the training data and undersmoothing nuisance function estimators on independent samples. We study a DCDR estimator of the Expected Conditional Covariance, a functional of interest in causal inference and conditional independence testing, and derive a series of increasingly powerful results with progressively stronger assumptions. We first provide a structure-agnostic error analysis for the DCDR estimator with no assumptions on the nuisance functions or their estimators. Then, assuming the nuisance functions are Hölder smooth, but without assuming knowledge of the true smoothness level or the covariate density, we establish that DCDR estimators with several linear smoothers are semiparametric efficient under minimal conditions and achieve fast convergence rates in the non-$\sqrt{n}$ regime. When the covariate density and smoothnesses are known, we propose a minimax rate-optimal DCDR estimator based on undersmoothed kernel regression. Moreover, we show that an undersmoothed DCDR estimator satisfies a slower-than-$\sqrt{n}$ central limit theorem, and that inference is possible even in the non-$\sqrt{n}$ regime. Finally, we support our theoretical results with simulations, providing intuition for double cross-fitting and undersmoothing, demonstrating where our estimator achieves semiparametric efficiency while the usual "single cross-fit" estimator fails, and illustrating asymptotic normality for the undersmoothed DCDR estimator.
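To make the double cross-fitting idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the Expected Conditional Covariance is written as $E[\mathrm{Cov}(A, Y \mid X)] = E[(A - \pi(X))(Y - \mu(X))]$ with $\pi(x) = E[A \mid X = x]$ and $\mu(x) = E[Y \mid X = x]$ (the symbols $A$, $Y$, $X$, $\pi$, $\mu$ are notation assumed here), uses a one-dimensional covariate, and takes k-nearest-neighbor regression with a small `k` as a stand-in for an undersmoothed linear smoother. The function names `knn_regress` and `dcdr_ecc` are hypothetical.

```python
import numpy as np


def knn_regress(x_train, y_train, x_eval, k):
    """k-NN regression for a 1-D covariate (a simple linear smoother).

    Assumes k <= len(x_train). A small k gives a low-bias, high-variance
    (undersmoothed) fit.
    """
    dist = np.abs(x_eval[:, None] - x_train[None, :])
    # Indices of the k nearest training points for each evaluation point.
    idx = np.argpartition(dist, k - 1, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)


def dcdr_ecc(x, a, y, k=5, seed=0):
    """Sketch of a double cross-fit estimate of E[Cov(A, Y | X)].

    The sample is split into three disjoint folds: one trains the
    estimate of pi, a second trains the estimate of mu, and the third
    is used only to average the product of residuals.
    """
    rng = np.random.default_rng(seed)
    f_pi, f_mu, f_est = np.array_split(rng.permutation(len(x)), 3)
    pi_hat = knn_regress(x[f_pi], a[f_pi], x[f_est], k)
    mu_hat = knn_regress(x[f_mu], y[f_mu], x[f_est], k)
    return float(np.mean((a[f_est] - pi_hat) * (y[f_est] - mu_hat)))
```

Because the two nuisance estimates are trained on independent folds, their estimation errors are independent given the evaluation points, so the noise in the two fits does not contribute to the bias of the averaged product; this is the intuition behind tolerating undersmoothed, high-variance fits. A single cross-fit version would instead train both nuisance estimates on the same fold.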