Identifying Representations for Intervention Extrapolation (2310.04295v2)

Published 6 Oct 2023 in cs.LG, cs.AI, and stat.ML

Abstract: The premise of identifiable and causal representation learning is to improve the current representation learning paradigm in terms of generalizability or robustness. Despite recent progress in questions of identifiability, more theoretical results demonstrating concrete advantages of these methods for downstream tasks are needed. In this paper, we consider the task of intervention extrapolation: predicting how interventions affect an outcome, even when those interventions are not observed at training time, and show that identifiable representations can provide an effective solution to this task even if the interventions affect the outcome non-linearly. Our setup includes an outcome Y, observed features X, which are generated as a non-linear transformation of latent features Z, and exogenous action variables A, which influence Z. The objective of intervention extrapolation is to predict how interventions on A that lie outside the training support of A affect Y. Here, extrapolation becomes possible if the effect of A on Z is linear and the residual when regressing Z on A has full support. As Z is latent, we combine the task of intervention extrapolation with identifiable representation learning, which we call Rep4Ex: we aim to map the observed features X into a subspace that allows for non-linear extrapolation in A. We show that the hidden representation is identifiable up to an affine transformation in Z-space, which is sufficient for intervention extrapolation. The identifiability is characterized by a novel constraint describing the linearity assumption of A on Z. Based on this insight, we propose a method that enforces the linear invariance constraint and can be combined with any type of autoencoder. We validate our theoretical findings through synthetic experiments and show that our approach succeeds in predicting the effects of unseen interventions.
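To make the extrapolation mechanism concrete, here is a minimal numerical sketch of the control-function-style idea the abstract describes: if Z = MA + V with a full-support residual V, then E[Y | do(A = a*)] = E_V[ℓ(Ma* + V)], which can be estimated even for a* outside the training support of A. The sketch assumes an affinely identified representation is already available (the true latent Z is used as a stand-in), and assumes a simple additive-noise outcome Y = ℓ(Z) + ε with no hidden confounding; the data-generating process, parameter values, and function names below are illustrative choices, not the paper's exact experimental setup.

```python
# Minimal sketch of intervention extrapolation under Z = M A + V,
# assuming an (affinely) identified representation -- here the true Z.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
n, d_a, d_z = 2000, 1, 2

# Training actions have bounded support; test interventions lie outside it.
A = rng.uniform(-1.0, 1.0, size=(n, d_a))
M = np.array([[1.0], [0.5]])               # linear effect of A on Z (illustrative)
V = rng.normal(size=(n, d_z))              # full-support residual
Z = A @ M.T + V                            # Z = M A + V

def ell(z):                                # nonlinear effect of Z on Y (illustrative)
    return np.sin(z[:, 0]) + 0.5 * z[:, 1] ** 2

Y = ell(Z) + 0.1 * rng.normal(size=n)

# Step 1: OLS regression of the representation on A recovers M and residuals V.
M_hat, *_ = np.linalg.lstsq(A, Z, rcond=None)   # shape (d_a, d_z), estimates M^T
V_hat = Z - A @ M_hat

# Step 2: nonlinear regression of Y on the representation.
reg = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.5).fit(Z, Y)

# Step 3: extrapolate to an unseen intervention a* outside [-1, 1] by
# averaging over the empirical residuals: mean_i ell_hat(M_hat^T a* + V_i).
a_star = np.array([[2.0]])
pred = reg.predict(a_star @ M_hat + V_hat).mean()
truth = ell(a_star @ M.T + V).mean()
print(f"predicted E[Y|do(A=2.0)] = {pred:.3f}, ground truth ~ {truth:.3f}")
```

In the paper's actual pipeline, the representation would instead be the output of an autoencoder trained with the proposed linear invariance constraint (enforcing that E[φ(X) | A] is linear in A), which identifies φ(X) up to an affine transformation of Z; the sketch above skips that identification step.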
