
Unveiling the Potential of Robustness in Selecting Conditional Average Treatment Effect Estimators (2402.18392v2)

Published 28 Feb 2024 in cs.LG, cs.AI, econ.EM, and stat.ML

Abstract: The growing demand for personalized decision-making has led to a surge of interest in estimating the Conditional Average Treatment Effect (CATE). Various types of CATE estimators have been developed with advancements in machine learning and causal inference. However, selecting the desirable CATE estimator through a conventional model validation procedure remains impractical due to the absence of counterfactual outcomes in observational data. Existing approaches for CATE estimator selection, such as plug-in and pseudo-outcome metrics, face two challenges. First, they must determine the metric form and the underlying machine learning models for fitting nuisance parameters (e.g., outcome function, propensity function, and plug-in learner). Second, they lack a specific focus on selecting a robust CATE estimator. To address these challenges, this paper introduces a Distributionally Robust Metric (DRM) for CATE estimator selection. The proposed DRM is nuisance-free, eliminating the need to fit models for nuisance parameters, and it effectively prioritizes the selection of a distributionally robust CATE estimator. The experimental results validate the effectiveness of the DRM method in selecting CATE estimators that are robust to the distribution shift incurred by covariate shift and hidden confounders.
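The abstract describes the DRM only at a high level, so the sketch below is not the paper's method; it merely illustrates the general idea of ranking candidate CATE estimators by a worst-case (distributionally robust) validation criterion. It relies on the standard dual form of KL-constrained DRO, sup over Q with KL(Q||P) <= rho of E_Q[loss] = inf over lambda > 0 of lambda * rho + lambda * log E_P[exp(loss / lambda)], and the `proxy_effect` input, the squared-error loss, and the radius `rho` are illustrative assumptions rather than the authors' construction.

```python
# Minimal sketch (NOT the paper's DRM): rank candidate CATE estimators by a
# worst-case validation loss over a KL-divergence ambiguity ball, using the
# closed-form dual of KL-constrained distributionally robust optimization.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp


def kl_robust_value(losses: np.ndarray, rho: float) -> float:
    """Worst-case mean of per-sample losses over distributions within KL radius rho."""
    n = len(losses)

    def dual(lam: float) -> float:
        # lam * rho + lam * log( mean( exp(losses / lam) ) ), computed stably.
        return lam * rho + lam * (logsumexp(losses / lam) - np.log(n))

    res = minimize_scalar(dual, bounds=(1e-6, 1e6), method="bounded")
    return float(res.fun)


def select_robust_estimator(candidate_cate: dict, proxy_effect: np.ndarray,
                            rho: float = 0.5) -> str:
    """Pick the candidate (name -> predicted CATE on a validation set) whose
    distributionally robust squared error against a proxy effect is smallest.
    `proxy_effect` is a placeholder (e.g., matching-based pseudo-effects)."""
    scores = {
        name: kl_robust_value((tau_hat - proxy_effect) ** 2, rho)
        for name, tau_hat in candidate_cate.items()
    }
    return min(scores, key=scores.get)
```

In this kind of scheme the ranking is sensitive to the ambiguity radius rho: larger values favor estimators whose validation error stays small under reweighting of the sample (a crude proxy for covariate shift), rather than estimators that are only good on average.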

Authors (5)
  1. Yiyan Huang
  2. Cheuk Hang Leung
  3. Siyi Wang
  4. Yijun Li
  5. Qi Wu
