Efficient Conditionally Invariant Representation Learning (2212.08645v2)
Abstract: We introduce the Conditional Independence Regression CovariancE (CIRCE), a measure of conditional independence for multivariate continuous-valued variables. CIRCE applies as a regularizer in settings where we wish to learn neural features $\varphi(X)$ of data $X$ to estimate a target $Y$, while being conditionally independent of a distractor $Z$ given $Y$. Both $Z$ and $Y$ are assumed to be continuous-valued but relatively low dimensional, whereas $X$ and its features may be complex and high dimensional. Relevant settings include domain-invariant learning, fairness, and causal learning. The procedure requires just a single ridge regression from $Y$ to kernelized features of $Z$, which can be done in advance. It is then only necessary to enforce independence of $\varphi(X)$ from residuals of this regression, which is possible with attractive estimation properties and consistency guarantees. By contrast, earlier measures of conditional feature dependence require multiple regressions for each step of feature learning, resulting in more severe bias and variance, and greater computational cost. When sufficiently rich features are used, we establish that CIRCE is zero if and only if $\varphi(X) \perp\!\!\!\perp Z \mid Y$. In experiments, we show superior performance to previous methods on challenging benchmarks, including learning conditionally invariant image features.
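The procedure described above can be sketched numerically. The following is a minimal, illustrative CIRCE-style penalty, not the paper's exact estimator: it uses random Fourier features as a finite-dimensional stand-in for the kernel feature maps of $Y$ and $Z$, and a squared Frobenius norm of the empirical cross-covariance between $\varphi(X)$ and the regression residuals as the penalty. All function names, feature counts, and regularization values are illustrative assumptions.

```python
import numpy as np

def rff_features(V, n_features=100, sigma=1.0, seed=0):
    """Random Fourier features approximating an RBF kernel on V."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(V.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(V @ W + b)

def circe_penalty(phi_X, Y, Z, lam=1e-3):
    """Toy CIRCE-style regularizer: penalize covariance between learned
    features phi_X and the residuals of a ridge regression from
    (kernelized) Y onto kernelized features of Z."""
    n = Y.shape[0]
    Psi = rff_features(Z, seed=0)      # feature map for Z
    Phi_Y = rff_features(Y, seed=1)    # feature map for Y (regressors)
    # Ridge regression Phi_Y -> Psi; this does not involve phi_X,
    # so it can be computed once, in advance of feature learning.
    A = Phi_Y.T @ Phi_Y + lam * np.eye(Phi_Y.shape[1])
    W_hat = np.linalg.solve(A, Phi_Y.T @ Psi)
    R = Psi - Phi_Y @ W_hat            # residuals: psi(Z) - E[psi(Z) | Y]
    phi_c = phi_X - phi_X.mean(axis=0) # center the learned features
    C = phi_c.T @ R / n                # empirical cross-covariance
    return float(np.sum(C ** 2))      # squared Frobenius norm
```

In a training loop, this penalty would be added (with a weight) to the prediction loss for $\varphi$; features that depend on $Z$ only through $Y$ leave the residuals uncorrelated with $\varphi(X)$ and incur a small penalty, while features that leak information about $Z$ beyond $Y$ incur a large one.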
Authors: Roman Pogodin, Namrata Deka, Yazhe Li, Danica J. Sutherland, Victor Veitch, Arthur Gretton