Causal Structure and Representation Learning
- Causal Structure and Representation Learning is a field that merges statistical causal discovery with nonlinear latent modeling to extract interpretable high-level features.
- Techniques like grouping observational variables, interventional supervision, and multi-environment exchangeability tackle the inherent identifiability challenges.
- Empirical studies using grouped models show robust latent recovery and graph estimation, demonstrating practical scalability in complex domains like genomics.
Causal structure and representation learning is a research field at the intersection of statistical causal discovery and nonlinear latent variable modeling, seeking to extract high-level, interpretable features from raw data that support meaningful reasoning about causal relationships. Unlike classical representation learning, whose goal is typically to find lower-dimensional or invariant descriptors supporting predictive tasks, causal representation learning (CRL) asks not only for latent features but also for an explicit model of their mutual cause-effect interactions. This joint problem is severely ill-posed: provable identifiability typically requires additional constraints, supervision, or special statistical structure. Recent developments have established practical identifiability conditions in a wide variety of settings, including self-supervised grouping, weak and interventional supervision, diffusion models, and non-i.i.d. multi-environment data, among others.
1. Identifiability Challenges in Causal Representation Learning
The principal obstacle in CRL is identifiability: for observed data $x = g(z)$ generated via a nonlinear transformation $g$ from unobserved causal latents $z$, and a causal graph among the components $z_i$, one rarely has the statistical power to identify both $g$ and the graph from the distribution of $x$ alone (Morioka et al., 2023). In classical ICA and nonlinear ICA, identifiability is possible only under strong assumptions on the sources or mixing, such as non-Gaussianity or non-stationarity. Causal discovery, even given the latents, often suffers from equivalence classes if only observational data are available. In CRL, both problems compound, requiring designed regularization, grouping, or supervision to isolate a unique solution.
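To make the classical non-identifiability concrete, here is a minimal NumPy illustration (all names are illustrative): with isotropic Gaussian sources, a mixing matrix and its composition with any rotation induce exactly the same observational distribution, so no method can tell them apart from samples of the observations alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent standard-Gaussian sources: the classical worst case for ICA.
z = rng.standard_normal((100_000, 2))

# Two candidate mixings: A, and A composed with a 45-degree rotation R.
A = np.array([[2.0, 0.5],
              [0.3, 1.0]])
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x1 = z @ A.T          # x = A z
x2 = z @ (A @ R).T    # x = (A R) z: a different model, same distribution

# Both datasets follow the same Gaussian law with covariance A A^T,
# since cov(x2) = A R R^T A^T = A A^T; the demixing is unidentifiable.
print(np.cov(x1.T))
print(np.cov(x2.T))
```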
Advances characterize conditions under which identifiability can be guaranteed:
- Grouping of Observational Variables: By assuming the observed variables can be partitioned into groups, each depending on a disjoint subset of the latent causal vector, the mixing map is block-diagonal; non-Gaussian pairwise potentials (excluding the linear-Gaussian case) and block-wise “connectivity” between latent groups permit identifiability of the demixing transforms up to permutations and invertible scalar transformations (Morioka et al., 2023).
- Interventional Supervision: Paired samples before and after random, unknown atomic interventions suffice to render both the latent structural causal model (SCM) and the observational mapping identifiable up to rescaling and permutation, assuming every latent variable can be intervened upon (Brehmer et al., 2022); see the toy sketch after this list.
- Weak or Imperfect Interventions: Soft interventions or support-independence constraints still yield identifiability, often to block-sparse or block-affine indeterminacy (Ahuja et al., 2022).
- Multi-environment Exchangeability: When data are collected under non-i.i.d., exchangeable mechanisms, mechanism variability or source variability in different environments can break symmetry, yielding unique recovery of mixing functions and causal graphs (Reizinger et al., 2024).
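As a toy illustration of the weakly supervised setting of Brehmer et al. (2022), the sketch below generates pairs of observations before and after a single unknown atomic intervention; the two-variable SCM, the mixing, and all names are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)

def mix(z):
    """Illustrative triangular (hence invertible) nonlinear mixing g."""
    return np.array([z[0] + np.tanh(z[1]), z[1]])

def sample_pair():
    """One (x, x_tilde) pair sharing noise, differing by one atomic intervention."""
    e = rng.standard_normal(2)                 # shared exogenous noise
    z = np.array([e[0], 0.8 * e[0] + e[1]])    # toy SCM: z1 -> z2
    i = rng.integers(2)                        # unknown intervention target
    z_tilde = z.copy()
    z_tilde[i] = rng.standard_normal()         # perfect atomic intervention
    if i == 0:                                 # downstream effect recomputed
        z_tilde[1] = 0.8 * z_tilde[0] + e[1]
    return mix(z), mix(z_tilde)

pairs = [sample_pair() for _ in range(1000)]   # weak supervision: pairs only
```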
2. Grouped Observations and the G-CaRL Framework
A central recent advance is identifiability by grouping of observational variables (Morioka et al., 2023). The observed data are partitioned into blocks $x^{(1)}, \dots, x^{(M)}$, each being a nonlinear function $x^{(m)} = f^{(m)}(z^{(m)})$ of a disjoint group of latent causal variables $z^{(m)}$.
Model:
- $x = (x^{(1)}, \dots, x^{(M)})$, with $x^{(m)} = f^{(m)}(z^{(m)})$ for invertible nonlinear mixings $f^{(m)}$
- $z = (z^{(1)}, \dots, z^{(M)})$, $z^{(m)} \in \mathbb{R}^{d_m}$, each coordinate $z^{(m)}_i$ a latent causal variable
- Each latent group interacts internally and with other groups via Markov pairwise potentials, schematically $\log p(z) = \sum_m \phi_m(z^{(m)}) + \sum_{m \neq m'} \psi_{m,m'}(z^{(m)}, z^{(m')}) - \log Z$. The inter-group links encode causal relationships.
Grouping assumption: for each observed variable the group index is known, and the mixing is block-diagonal: no coordinate of $x^{(m)}$ depends on any $z^{(m')}$ with $m' \neq m$. Provided each latent variable connects to at least one latent in another group, and the pairwise potential is sufficiently non-Gaussian and asymmetric, the invertible maps $f^{(m)}$ are uniquely identified (up to coordinate permutation and scalar reparameterization).
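A minimal generator for this grouped setting, assuming two groups and toy invertible nonlinearities (all names hypothetical): each observed block depends only on its own latent group, while the latent groups are statistically coupled, here through a simple non-Gaussian structural dependence standing in for the pairwise potentials.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Latent groups with an inter-group dependence (group 1 -> group 2) and
# non-Gaussian (Laplace) noise standing in for the pairwise potentials.
z1 = rng.laplace(size=(n, 2))
z2 = np.tanh(z1 @ np.array([[1.0, -0.5], [0.3, 0.8]])) + rng.laplace(size=(n, 2))

def f1(z):
    """Toy invertible (triangular) nonlinearity for block 1."""
    return np.stack([z[:, 0] + z[:, 1] ** 3, z[:, 1]], axis=-1)

def f2(z):
    """Toy invertible (triangular) nonlinearity for block 2."""
    return np.stack([np.sinh(z[:, 0]), z[:, 1] + 0.2 * z[:, 0]], axis=-1)

# Block-diagonal mixing: each observed block depends only on its own latent group.
x1, x2 = f1(z1), f2(z2)
```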
Proof sketch: Equate joint log densities under two candidate generative models; cross-differentiate across groups and employ rank arguments to show that the only invertible transformations consistent with both grouped models are coordinate-wise.
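In simplified notation, the key step can be rendered as follows; this is a schematic reconstruction of the style of argument, not the paper's exact derivation.

```latex
% Suppose (f, p) and (\tilde f, \tilde p) induce the same observational law.
% Then h = \tilde f^{-1} \circ f is block-wise, since both mixings are
% block-diagonal, and the change of variables gives
\[
  \log p(z) = \log \tilde p\bigl(h(z)\bigr)
            + \sum_{m} \log \left|\det J_{h^{(m)}}\bigl(z^{(m)}\bigr)\right| .
\]
% Differentiating once in z_i^{(m)} and once in z_j^{(m')} with m \neq m'
% eliminates the Jacobian terms (each depends on a single group):
\[
  \frac{\partial^2 \log p(z)}{\partial z_i^{(m)} \,\partial z_j^{(m')}}
  = \sum_{k,l}
    \left.\frac{\partial^2 \log \tilde p(u)}{\partial u_k \,\partial u_l}\right|_{u=h(z)}
    \frac{\partial h_k}{\partial z_i^{(m)}}\,
    \frac{\partial h_l}{\partial z_j^{(m')}} .
\]
% Rank conditions on the non-Gaussian pairwise potentials then force the
% Jacobian of h to be a scaled permutation, i.e. h acts coordinate-wise.
```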
G-CaRL estimation:
- A binary classifier distinguishes true grouped samples ($y = 1$) from negatives formed by independently shuffling group blocks across samples ($y = 0$); see the condensed training sketch after this list
- The discriminant is parameterized via group-wise feature extractors with flexible nonlinearities, combined through inter-group pairwise terms
- Loss: standard logistic regression cross-entropy between positives and negatives
- At optimum, the density-ratio argument forces each group feature extractor to invert the corresponding mixing $f^{(m)}$, yielding identifiability as above.
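A condensed PyTorch sketch of this density-ratio estimation loop, restricted to two groups and omitting the unary (group-wise) terms; architecture, data, and hyperparameters are illustrative assumptions rather than the reference implementation.

```python
import torch
import torch.nn as nn

class GroupedDiscriminator(nn.Module):
    """Logistic discriminant: per-group feature extractors + pairwise term."""
    def __init__(self, dims=(2, 2), feat=2, hidden=64):
        super().__init__()
        self.encoders = nn.ModuleList([
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, feat))
            for d in dims
        ])
        # Pairwise weight matrix between the two groups; after training its
        # magnitude serves as an estimate of the inter-group coupling.
        self.W = nn.Parameter(0.1 * torch.randn(feat, feat))

    def forward(self, x1, x2):
        h1, h2 = self.encoders[0](x1), self.encoders[1](x2)
        return torch.einsum('bi,ij,bj->b', h1, self.W, h2)

torch.manual_seed(0)
disc = GroupedDiscriminator()
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
M = torch.randn(2, 2)                        # fixed illustrative coupling

for step in range(1000):
    x1 = torch.randn(256, 2)                             # stand-in block x^(1)
    x2 = torch.tanh(x1 @ M) + 0.1 * torch.randn(256, 2)  # coupled block x^(2)
    perm = torch.randperm(256)
    logits_pos = disc(x1, x2)                # aligned blocks       -> label 1
    logits_neg = disc(x1, x2[perm])          # shuffled block x^(2) -> label 0
    loss = bce(logits_pos, torch.ones(256)) + bce(logits_neg, torch.zeros(256))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The construction mirrors noise-contrastive density-ratio estimation: at the optimum the logit approximates the log ratio between the aligned joint density and the block-shuffled product density, which is what drives the feature extractors toward inverses of the group-wise mixings.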
Consistency theorem: With universal function approximators for the group feature extractors and pairwise terms, global minimization yields recovery up to coordinate permutation and scalar reparameterization. The learned inter-group weights estimate the true causal graph up to scale and transpose (subject to mild additional constraints).
Empirical results: Simulations and real single-cell data show G-CaRL achieving high Pearson correlation for latent recovery and high F1 scores for graph recovery, outperforming CausalVAE, two-stage VAE+NOTEARS, and related baselines.
3. Robustness to Confounders, Cycles, and Extensions
Unlike most previous CRL methods that require temporal or interventional supervision, the grouped approach is robust to latent confounders and cycles:
- Intra-group confounders do not affect inter-group pairwise terms, so only inter-group causal relationships are learned; the method automatically screens off pure intra-group signals.
- The Markov pairwise potential model permits arbitrary directed cycles; acyclicity is not imposed.
- Estimation of inter-group graph parameters is possible via the learned pairwise weights (a read-out sketch follows this list).
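As a hypothetical read-out (continuing the two-group sketch in Section 2; the aggregation and threshold are illustrative, not the paper's exact procedure):

```python
import torch

# W stands in for the trained pairwise weights disc.W from the earlier sketch.
W = torch.randn(2, 2)
tau = 0.05                                   # illustrative decision threshold
edge_score = W.abs().mean().item()           # aggregate coupling magnitude
print("edge between group 1 and group 2:", edge_score > tau)
```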
Limitations:
- Requires known, non-overlapping groups for the observed variables; handling overlapping or partially specified groups is a current research direction.
- Only inter-group causal structure is identified; intra-group structure remains unidentified.
- The recovered graph is ambiguous up to block-wise scale and transpose unless further prior knowledge of the latent potentials is supplied.
Potential extensions: overlapping or partially overlapping groups, group-dependent potentials, embedding schemes where the observed dimension exceeds the latent dimension (injective manifold embeddings), and integration of weak supervision or module-wise interventions.
4. Comparison to Related Causal Representation Learning Frameworks
Other frameworks in the literature address complementary identifiability strategies:
- Interventional Causal Representation Learning leverages perfect or soft interventions, which can geometrically align the support of latent variables, yielding disentanglement up to permutation and scaling, or block-affine mixing (Ahuja et al., 2022).
- Weakly supervised paired interventions (atomic, unknown targets) achieve identifiability with variational autoencoders whose solution map is invertible and triangular (Brehmer et al., 2022).
- Supervised GAN-style models with SCM priors (DEAR) integrate graph-structured latent priors and enforce supervised/unsupervised factor alignment (Shen et al., 2020).
- Recent work identifies the duality between source- and mechanism-variability under exchangeable, non-i.i.d. data, showing that sufficient variability in only one of the two is needed for global identifiability (Reizinger et al., 2024).
- Disentangled Causal VAEs (DCVAE) directly parameterize causal graphs via normalizing flows in the encoder, supporting do-interventions and clean factor traversal (Fan et al., 2023); the do-operation itself is illustrated numerically after the summary below.
A common theme is the need to break classical non-identifiability by either grouping, environmental variability, interventional signals, or architectural constraints.
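To ground the do-intervention terminology used in this comparison, here is a minimal numerical sketch of a two-variable SCM (entirely illustrative, not any cited paper's implementation): the do-operation replaces one mechanism while leaving the others intact, shifting the downstream distribution.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(n, do_z1=None):
    """Ancestral sampling from a toy SCM z1 -> z2, with an optional do(z1)."""
    z1 = np.full(n, do_z1) if do_z1 is not None else rng.standard_normal(n)
    z2 = 0.8 * z1 + rng.standard_normal(n)   # mechanism for z2 is left intact
    return z1, z2

_, z2_obs = sample(100_000)                  # observational regime
_, z2_do = sample(100_000, do_z1=2.0)        # interventional regime: do(z1 = 2)
print(z2_obs.mean(), z2_do.mean())           # approx. 0.0 versus approx. 1.6
```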
5. Practical Implementation, Scaling, and Domain-Specific Performance
The G-CaRL approach is both theoretically consistent and empirically scalable:
- Feature extractors and nonlinearities are implemented as universal neural networks, trained by minimizing the standard logistic (cross-entropy) loss.
- Negative samples are constructed by randomly shuffling blocks across the batch, enabling self-supervised training without explicit labels.
- The method is robust to cycles and latent confounders, showing strong performance when tested on gene-regulatory datasets and synthetic DAGs, and outperforming two-stage and prior methods in both latent recovery (Pearson r) and graph F1.
- Scaling is feasible provided the number of blocks and block sizes are manageable; computational complexity scales with the sum of block sizes and the number of pairwise terms.
- Real-world deployment requires careful pre-specification of grouping, and some domain knowledge (e.g., in bioinformatics, grouping by gene modules).
6. Outlook and Theoretical Advances
Grouped observation-based CRL addresses a critical practical bottleneck: how to learn both features and their causal interactions without temporal, interventional, or supervised data (Morioka et al., 2023). The block-diagonal mixing and inter-group Markov potentials provide a statistically efficient, self-supervised path to identifiability.
Future directions include handling overlapping or partial groupings, more expressive model classes, mixed datatypes, and integration with multi-modal signals. The limitations in intra-group identifiability may be overcome by incorporating additional side information or extending the pairwise modeling to intra-block terms. More broadly, the union of architectural (grouping, normalizing flows), statistical (exchangeability, multi-environment), and algorithmic (self-supervised density-ratio learning) principles continues to drive progress in scalable causal structure and representation learning.