M$^3$TN: Multi-gate Mixture-of-Experts based Multi-valued Treatment Network for Uplift Modeling (2401.14426v1)
Abstract: Uplift modeling is a technique used to predict the effect of a treatment (e.g., discounts) on an individual's response. Although several methods have been proposed for multi-valued treatments, they are extensions of binary-treatment methods and retain some limitations. First, existing methods calculate uplift from predicted responses, which may not guarantee a consistent uplift distribution between treatment and control groups and may cause cumulative errors for multi-valued treatments. Second, the number of model parameters grows with the number of prediction heads, which reduces efficiency. To address these issues, we propose a novel Multi-gate Mixture-of-Experts based Multi-valued Treatment Network (M$^3$TN). M$^3$TN consists of two components: 1) a feature representation module with Multi-gate Mixture-of-Experts to improve efficiency; 2) a reparameterization module that models uplift explicitly to improve effectiveness. We also conduct extensive experiments to demonstrate the effectiveness and efficiency of M$^3$TN.
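The two components named in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the abstract gives no layer sizes, gate counts, or loss, so everything below is an assumption made for illustration. In particular, the class `TinyMMoEUplift`, the helper `predicted_response`, the choice of one gate per output head, and the reading of "modeling uplift explicitly" as "predict a control response plus one uplift per treatment, and recover the treated response as their sum" are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class TinyMMoEUplift:
    """Hypothetical sketch: shared experts, one gate per output head."""

    def __init__(self, n_features, n_experts, n_treatments, hidden=8):
        # Shared experts: each maps the features to a hidden representation.
        self.W_experts = rng.normal(scale=0.1, size=(n_experts, n_features, hidden))
        # Assumption: one gate for the control head plus one per treatment.
        n_gates = 1 + n_treatments
        self.W_gates = rng.normal(scale=0.1, size=(n_gates, n_features, n_experts))
        # One linear output layer per gate.
        self.w_out = rng.normal(scale=0.1, size=(n_gates, hidden))

    def forward(self, x):
        # Expert outputs: (batch, n_experts, hidden).
        h = np.tanh(np.einsum("bf,efh->beh", x, self.W_experts))
        # Gate weights over experts: (batch, n_gates, n_experts).
        g = softmax(np.einsum("bf,gfe->bge", x, self.W_gates), axis=-1)
        # Gate-specific mixture of expert outputs: (batch, n_gates, hidden).
        mixed = np.einsum("bge,beh->bgh", g, h)
        out = np.einsum("bgh,gh->bg", mixed, self.w_out)
        control = out[:, 0]   # predicted response under control
        uplift = out[:, 1:]   # explicit uplift for each treatment value
        return control, uplift


def predicted_response(control, uplift, treatment):
    # treatment: integer array, 0 = control, k >= 1 = treatment k.
    # Assumed reparameterization: response = control + uplift of the
    # assigned treatment (zero uplift for the control group).
    padded = np.concatenate([np.zeros((len(control), 1)), uplift], axis=1)
    return control + padded[np.arange(len(control)), treatment]
```

Under these assumptions the experts are shared across all treatment values, so adding a treatment adds only one gate and one output vector, and the uplift is read directly from the network rather than computed as a difference of two predicted responses.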