Zero-shot causal learning (2301.12292v4)
Abstract: Predicting how different interventions will causally affect a specific individual is important in a variety of domains such as personalized medicine, public policy, and online marketing. There are a large number of methods to predict the effect of an existing intervention based on historical data from individuals who received it. However, in many settings it is important to predict the effects of novel interventions (e.g., a newly invented drug), which these methods do not address. Here, we consider zero-shot causal learning: predicting the personalized effects of a novel intervention. We propose CaML, a causal meta-learning framework which formulates the personalized prediction of each intervention's effect as a task. CaML trains a single meta-model across thousands of tasks, each constructed by sampling an intervention, its recipients, and its nonrecipients. By leveraging both intervention information (e.g., a drug's attributes) and individual features (e.g., a patient's history), CaML is able to predict the personalized effects of novel interventions that do not exist at the time of training. Experimental results on real-world datasets in large-scale medical claims and cell-line perturbations demonstrate the effectiveness of our approach. Most strikingly, CaML's zero-shot predictions outperform even strong baselines trained directly on data from the test interventions.
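The task construction described in the abstract (sample an intervention, then its recipients and nonrecipients) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Task` container, the `sample_task` function, the record layout, and the toy data are all hypothetical, and treating "nonrecipients" as anyone who received a different intervention is an assumption the abstract does not spell out.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Task:
    """One meta-learning task: a single intervention and two sampled groups."""
    intervention_id: str
    intervention_features: list[float]  # e.g. a drug's attributes
    recipients: list[dict]              # individuals who received the intervention
    nonrecipients: list[dict]           # individuals who did not


def sample_task(records, intervention_features, rng, n_per_group=2):
    """Sample an intervention, then sample its recipients and nonrecipients.

    records: list of dicts with keys 'features', 'intervention', 'outcome'
             (a hypothetical layout chosen for this sketch).
    intervention_features: dict mapping intervention id -> feature vector.
    """
    # Sort the ids so the choice is reproducible for a given seed.
    chosen = rng.choice(sorted(intervention_features))
    treated = [r for r in records if r["intervention"] == chosen]
    # Assumption: a nonrecipient is anyone who received a different intervention.
    control = [r for r in records if r["intervention"] != chosen]
    return Task(
        intervention_id=chosen,
        intervention_features=intervention_features[chosen],
        recipients=rng.sample(treated, min(n_per_group, len(treated))),
        nonrecipients=rng.sample(control, min(n_per_group, len(control))),
    )


# Toy data, invented purely for illustration.
toy_interventions = {"drug_a": [1.0, 0.0], "drug_b": [0.0, 1.0]}
toy_records = [
    {"features": [0.1, 1.0], "intervention": "drug_a", "outcome": 1.0},
    {"features": [0.4, 0.2], "intervention": "drug_a", "outcome": 0.0},
    {"features": [0.9, 0.3], "intervention": "drug_a", "outcome": 1.0},
    {"features": [0.2, 0.8], "intervention": "drug_b", "outcome": 0.0},
    {"features": [0.7, 0.5], "intervention": "drug_b", "outcome": 1.0},
    {"features": [0.3, 0.6], "intervention": "drug_b", "outcome": 0.0},
]

task = sample_task(toy_records, toy_interventions, random.Random(0))
print(task.intervention_id, len(task.recipients), len(task.nonrecipients))
```

A meta-model trained over many such tasks would take both `intervention_features` and each individual's `features` as input, which is what would let it score an intervention that never appeared in training.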