DESCN: Deep Entire Space Cross Networks for Individual Treatment Effect Estimation (2207.09920v3)
Abstract: Causal inference has wide applications in areas such as E-commerce and precision medicine, and its performance relies heavily on accurate estimation of the Individual Treatment Effect (ITE). Conventionally, ITE is predicted by modeling the treated and control response functions separately in their individual sample spaces. In practice, however, this approach usually encounters two issues: divergent distributions between the treated and control groups due to treatment bias, and significant imbalance in their sample sizes. This paper proposes Deep Entire Space Cross Networks (DESCN) to model treatment effects from an end-to-end perspective. DESCN captures the integrated information of the treatment propensity, the response, and the hidden treatment effect through a cross network in a multi-task learning manner. Our method jointly learns the treatment and response functions in the entire sample space to avoid treatment bias, and employs an intermediate pseudo treatment effect prediction network to relieve sample imbalance. Extensive experiments are conducted on a synthetic dataset and a large-scale production dataset from the E-commerce voucher distribution business. The results indicate that DESCN can successfully enhance the accuracy of ITE estimation and improve the uplift ranking performance. The source code and a sample of the production dataset are released to facilitate future research in the community; to the best of our knowledge, this sample is the first large-scale public biased treatment dataset for causal inference.
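To make the conventional baseline concrete, the following is a minimal sketch of the two-model approach the abstract describes: one response model fitted on treated samples, one on control samples, with the ITE taken as the difference of their predictions. It is not DESCN. The linear models, the synthetic data, and all names (`fit_linear`, `predict_linear`, `two_model_ite`) are illustrative assumptions, not taken from the paper or its released code.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply a fitted coefficient vector (intercept first) to new rows."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return Xb @ coef

def two_model_ite(X, t, y):
    """Fit separate response models on treated and control rows,
    then estimate the ITE as the difference of their predictions."""
    coef_treated = fit_linear(X[t == 1], y[t == 1])
    coef_control = fit_linear(X[t == 0], y[t == 0])
    return predict_linear(coef_treated, X) - predict_linear(coef_control, X)

# Hypothetical synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))

# Treatment assignment depends on the first covariate, so the treated and
# control groups differ in distribution and in size (treatment bias and
# sample imbalance, the two issues named in the abstract).
propensity = 1.0 / (1.0 + np.exp(-(X[:, 0] - 1.5)))
t = (rng.uniform(size=n) < propensity).astype(int)

# Ground-truth effect varies with the second covariate.
true_ite = 1.0 + 0.5 * X[:, 1]
y = X[:, 0] + t * true_ite + rng.normal(scale=0.1, size=n)

ite_hat = two_model_ite(X, t, y)
print("treated fraction:", t.mean())
print("mean absolute ITE error:", np.mean(np.abs(ite_hat - true_ite)))
```

In this sketch each model sees only its own group, so the smaller treated group supplies fewer samples to its model and the two models are trained on different covariate distributions. Here the linear models match the data-generating process, which keeps the error small; with misspecified or flexible models the two issues can matter considerably more.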
- Kailiang Zhong
- Fengtong Xiao
- Yan Ren
- Yaorong Liang
- Wenqing Yao
- Xiaofeng Yang
- Ling Cen