Causal Linearization via Perturbation Responses (CLIPR)
- The paper introduces CLIPR, a framework that uses linear response theory and surrogate observables to extract direct causal links in high-dimensional systems.
- It employs matrix linearization and ensemble perturbation trials to estimate susceptibilities, with identifiability ensured by full-rank conditions.
- The methodology generalizes to nonlinear and chaotic regimes, finding applications in genomics, climatology, neuroscience, and even quantum field theory.
Causal Linearization via Perturbation Responses (CLIPR) is a methodological framework for quantifying and predicting the causal impact of perturbations in dynamical systems, especially where the true forcing is inaccessible and only surrogate observables can be measured. Its formalism encompasses linear response theory, surrogate causality, matrix linearization under drift assumptions, and advanced algorithmic expansions in nonlinear and high-dimensional contexts. CLIPR delivers rigorous tools for extracting direct causal links, ranking the predictive ability of observables, and reconstructing target system responses from partial or indirect data.
1. Linear Response Foundations and Surrogate Causality
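The central tenet of CLIPR is rooted in linear response theory, which considers the evolution of a system $\dot{x} = F(x)$ perturbed by a small forcing $\epsilon f(t) X(x)$. The linear response of an observable $\Psi$ at time $t$ is given by the susceptibility function, defined as the first-order response to an impulsive forcing $f(t) = \delta(t)$:
$$G_{\Psi,X}(t) = \left.\frac{\partial \langle \Psi \rangle_{\epsilon}(t)}{\partial \epsilon}\right|_{\epsilon = 0},$$
where $X$ indexes the spatial-temporal structure of the applied perturbation, and $\langle \cdot \rangle_{\epsilon}$ denotes expectation under the perturbed measure. The change in the mean of $\Psi$ for a time-modulated forcing $f(t)$ is a causal convolution:
$$\delta\langle \Psi \rangle(t) = \epsilon \int_{-\infty}^{t} G_{\Psi,X}(t - s)\, f(s)\, ds.$$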
This susceptibility can be estimated experimentally via ensemble averages over perturbation trajectories, enabling practical access in high-dimensional and nonlinear systems (Tomasini et al., 2020).
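As a minimal sketch of ensemble estimation, the example below uses a one-dimensional Ornstein–Uhlenbeck process as a stand-in system, chosen here only because its impulse response $e^{-a t}$ is known analytically. The function name, parameter values, and the use of shared noise between the perturbed and reference ensembles are illustrative assumptions, not details taken from the cited work.

```python
import numpy as np

def estimate_susceptibility(a=1.0, sigma=0.5, eps=0.1, dt=0.01,
                            n_steps=300, n_ens=200, seed=0):
    """Ensemble estimate of the impulse response of dx = -a x dt + sigma dW."""
    rng = np.random.default_rng(seed)
    # Draw initial states from the stationary distribution of the unperturbed process.
    x0 = rng.normal(0.0, sigma / np.sqrt(2.0 * a), size=n_ens)
    x_ref = x0.copy()
    x_pert = x0 + eps                 # impulse forcing eps * delta(t) applied at t = 0
    response = np.empty(n_steps)
    for n in range(n_steps):
        # Difference of ensemble means, normalized by the perturbation size.
        response[n] = (x_pert.mean() - x_ref.mean()) / eps
        # Both ensembles see the same noise, which removes sampling variance here.
        noise = sigma * np.sqrt(dt) * rng.normal(size=n_ens)
        x_ref = x_ref + (-a * x_ref) * dt + noise
        x_pert = x_pert + (-a * x_pert) * dt + noise
    t = dt * np.arange(n_steps)
    return t, response

t, G_hat = estimate_susceptibility()
G_exact = np.exp(-1.0 * t)            # analytic impulse response for a = 1
print("max abs error:", np.max(np.abs(G_hat - G_exact)))
```

In a nonlinear system the shared-noise trick no longer cancels fluctuations exactly, so larger ensembles and a check that the estimate is insensitive to the perturbation size would be needed.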
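Surrogate causality emerges when the true external forcing $f$ is unobserved or ill-defined, necessitating substitution of the response of a measurable observable $\Psi_1$ as proxy input for reconstructing the response of a target observable $\Psi_2$. The causal kernel $H_{2,1}(t)$, derived from the ratio of susceptibilities in frequency space, $\tilde{H}_{2,1}(\omega) = \chi_{\Psi_2}(\omega) / \chi_{\Psi_1}(\omega)$ with $\chi_{\Psi}(\omega)$ the Fourier transform of the susceptibility of $\Psi$, is used to reconstruct
$$\delta\langle \Psi_2 \rangle(t) = \int_{-\infty}^{\infty} H_{2,1}(t - s)\, \delta\langle \Psi_1 \rangle(s)\, ds,$$
with strict causality imposed for prognostic validity ($H_{2,1}(t) = 0$ for $t < 0$).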
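The frequency-space construction can be sketched numerically as follows. This is a toy example under stated assumptions: both susceptibilities are synthetic, signals are treated as periodic so that FFT products are exact circular convolutions, and the names `G1`, `G2`, `h_true`, and `circ_conv` are hypothetical.

```python
import numpy as np

def circ_conv(a, b):
    """Circular convolution via the FFT (signals are treated as periodic)."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

n = 256
t = np.arange(n)

# Synthetic impulse responses of predictor and target to the same hidden forcing.
G1 = np.exp(-0.1 * t)                 # predictor susceptibility
h_true = np.zeros(n)
h_true[3], h_true[8] = 0.7, 0.3       # causal kernel with weight at lags 3 and 8
G2 = circ_conv(h_true, G1)            # target susceptibility

# Surrogate kernel from the ratio of susceptibilities in frequency space.
H = np.fft.ifft(np.fft.fft(G2) / np.fft.fft(G1)).real

# Reconstruct the target response to a new forcing from the predictor response alone.
f_new = np.zeros(n)
f_new[:20] = 1.0
d_psi1 = circ_conv(G1, f_new)
d_psi2_true = circ_conv(G2, f_new)
d_psi2_pred = circ_conv(H, d_psi1)

print("kernel error:", np.max(np.abs(H - h_true)))
print("reconstruction error:", np.max(np.abs(d_psi2_pred - d_psi2_true)))
```

The division is safe here because the transform of a decaying exponential has no zeros; with estimated susceptibilities, near-zero values of the predictor transform would amplify noise and some regularization of the ratio would likely be required.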
2. Matrix Linearization and Identifiability
In models such as Latent Causal Diffusion (LCD) for single-cell gene expression, CLIPR is applied to SDEs with perturbative shifts in the drift term:
$$dx_t = \left[ f(x_t) + c_k \right] dt + \sigma\, dW_t.$$
Assuming a linear drift $f(x) = A x$ and perturbations as additive shifts $c_k$, the stationary mean under perturbation $k$ satisfies $A \mu_k + c_k = 0$, i.e. $\mu_k = -A^{-1} c_k$. The CLIPR estimator for the direct causal effect matrix $A$ uses measurements of initial drift responses $C = [c_1, \dots, c_K]$ and limit equilibria $M = [\mu_1, \dots, \mu_K]$ for multiple perturbations:
$$\hat{A} = -C M^{+},$$
where $M^{+}$ denotes the Moore–Penrose pseudoinverse, optionally regularized for stability (Lorch et al., 20 Jan 2026). Identifiability is guaranteed if the perturbation shifts $c_k$ span the state space, ensuring $M$ is full rank. This framework yields sparse, interpretable causal matrices robust to measurement noise and capable of generalization to unseen perturbations.
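A noise-free sketch of the pseudoinverse estimator is given below. It assumes a linear drift $A x$ with additive shifts, so that each stationary mean solves $A \mu_k = -c_k$; the ground-truth matrix, the dimensions, and all variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 5, 8                            # state dimension, number of perturbations

# Hypothetical sparse ground truth; the diagonal shift makes A strictly
# diagonally dominant with negative diagonal, hence stable and invertible.
B = rng.normal(size=(d, d)) * (rng.random((d, d)) < 0.3)
A = B - (np.abs(B).sum(axis=1).max() + 1.0) * np.eye(d)

C = rng.normal(size=(d, K))            # columns: additive drift shifts c_k
M = -np.linalg.solve(A, C)             # columns: stationary means mu_k = -A^{-1} c_k

# Estimator: solve A M = -C for A with the Moore-Penrose pseudoinverse.
A_hat = -C @ np.linalg.pinv(M)

print("rank of M:", np.linalg.matrix_rank(M))
print("max abs error:", np.max(np.abs(A_hat - A)))
```

Recovery is exact in this setting because the $K \ge d$ shifts span the state space, so $M$ has full row rank and $M M^{+}$ is the identity. With fewer or collinear perturbations only the component of $A$ acting on the span of the measured equilibria is determined.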
3. Algorithmic Protocols and Surrogate Model Construction
Implementing CLIPR in both stochastic nonlinear systems and high-dimensional experimental datasets involves the following procedural steps:
- Test Forcings Selection: Design local and/or global forcings $X_k$, typically as spatially localized or integrative patterns.
- Susceptibility Estimation: Generate ensemble trials (using impulses or pseudo-perturbations) to empirically estimate the susceptibilities $G_{\Psi, X_k}(t)$ for candidate predictors and targets.
- Predictor Selection: Define a set of surrogate observables (local variables, aggregated fields, latent state features).
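- Kernel Construction: Compute the frequency-domain susceptibilities $\chi(\omega)$, invert the predictor susceptibilities to obtain the kernel transforms $\tilde{H}(\omega)$, and transform back to the time domain to obtain the causal kernels $H(t)$.
- Causality Ranking: Evaluate the Predictability Index (PI) for each kernel or predictor subset. PI quantifies non-causal leakage; a smaller PI indicates higher surrogate prognostic utility.
- Response Reconstruction: For any new forcing, predict the target response as the convolution of the causal kernels with the predictor responses, $\delta\langle \Psi_{\mathrm{target}} \rangle(t) = \sum_i \int_0^{\infty} H_i(s)\, \delta\langle \Psi_i \rangle(t - s)\, ds$.
Optionally, sparsity-promoting regularization (e.g., an $\ell_1$ penalty) can be used for optimal predictor subset selection (Tomasini et al., 2020).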
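The ranking step can be illustrated with a toy comparison of two candidate predictors. The leakage measure used here, the fraction of absolute kernel weight at negative lags, is a hypothetical stand-in and is not the Predictability Index defined in the cited work. The responses are synthetic, periodic signals, and every name in the block is an assumption.

```python
import numpy as np

def kernel_from_responses(G_target, G_pred):
    """Kernel mapping predictor response to target response (periodic signals)."""
    return np.fft.ifft(np.fft.fft(G_target) / np.fft.fft(G_pred)).real

def negative_lag_fraction(H):
    """Share of absolute kernel weight at negative lags (second half of the FFT grid)."""
    n = len(H)
    return np.abs(H[n // 2:]).sum() / np.abs(H).sum()

n = 256
base = np.exp(-0.1 * np.arange(n))

# The target responds 5 steps after the forcing arrives.
G_target = np.roll(base, 5)
candidates = {
    "upstream": base,                  # responds before the target
    "downstream": np.roll(base, 10),   # responds after the target
}

leakage = {name: negative_lag_fraction(kernel_from_responses(G_target, G))
           for name, G in candidates.items()}
ranking = sorted(leakage, key=leakage.get)   # least leakage first
print(ranking, leakage)
```

The upstream candidate yields a kernel concentrated at a positive lag and can be used prognostically, while the downstream candidate needs future values of the predictor, which appears as weight at negative lags.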
For nonlinear or chaotic regimes, CLIPR employs surrogate models—sparse regression (SINDy), nested NNs, or reservoir computing—and simulates virtual perturbations through fitted dynamics to estimate susceptibilities (Chibbaro et al., 9 Sep 2025). Direct perturbation experiments provide higher precision, but virtual approaches remain valid under stationarity and smallness of intervention.
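A minimal sketch of the virtual-perturbation idea follows. It assumes a two-variable linear stochastic system and substitutes a thresholded least-squares fit for the surrogate models named above; it is not an implementation of SINDy, nested NNs, or reservoir computing. The matrix `A_true`, the threshold, and the function name are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.0],
                   [0.2, 0.8]])        # hypothetical dynamics: x0 drives x1, not vice versa

# Observed trajectory of the unperturbed system x_{k+1} = A x_k + noise.
n = 5000
X = np.zeros((n, 2))
for k in range(n - 1):
    X[k + 1] = A_true @ X[k] + 0.1 * rng.normal(size=2)

# Fit a linear surrogate by least squares, then zero small coefficients
# (a single thresholding pass, in the spirit of sparse regression).
coef, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
A_fit = coef.T
A_fit[np.abs(A_fit) < 0.05] = 0.0

def virtual_impulse_response(A, source, n_steps=30):
    """Response of every variable to a unit kick on `source` in the fitted model."""
    x = np.zeros(A.shape[0])
    x[source] = 1.0
    out = []
    for _ in range(n_steps):
        out.append(x.copy())
        x = A @ x
    return np.array(out)

G_from_0 = virtual_impulse_response(A_fit, source=0)
G_from_1 = virtual_impulse_response(A_fit, source=1)
print(A_fit)
```

Here the kick on the second variable never reaches the first in the fitted model, which mirrors the one-way coupling of the generating system. For a nonlinear surrogate the same procedure would compare perturbed and unperturbed simulations rather than propagate a single matrix.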
4. Practical Performance and Empirical Findings
Empirical evaluation of CLIPR demonstrates:
- Near-exact causal edge recovery (AUROC ≈ 0.9) in simulated linear systems with sufficient perturbation diversity (Lorch et al., 20 Jan 2026).
- Module clustering and pathway enrichment in Perturb-seq data, revealing gene–gene regulatory structure beyond differential expression analysis.
- Superior disambiguation of direct versus indirect effects, with a substantial increase (lift ≈ 4–5× for top predicted links) in observed downstream DE upon perturbation.
- Robustness to holding out unperturbed genes: CLIPR generalizes causal predictions even to targets not seen in the training set.
- In spatially extended chaotic systems (Lorenz ’96), PI reveals the gradient of causal influence propagation; nearest-neighbor variables serve as the most predictive surrogates with negligible non-causality (R ≈ 0), while distant variables show rapid PI decay. Adding ensemble predictors (local plus global observables) recovers deeply non-local causal forecasting (Tomasini et al., 2020).
Computational cost and sample complexity scale favorably in linear regimes for ridge estimation, and sparse regression and NN models are well-controlled in moderate dimensions (Chibbaro et al., 9 Sep 2025).
5. Advanced Generalizations and Theoretical Extensions
CLIPR extends to causal variational principles and fragmentation schemes encountered in quantum field theoretic and continuum limits. In causal fermion systems, perturbation theory proceeds by expanding universal measures under weight/diffeomorphism actions, diagonalizing fluctuations across degenerate subspaces, and reconstructing linearized jets via Green’s operators. Convex combinations ("fragmentation") of measures allow simultaneous tracking of subsystem means and fluctuations, enabling perturbative analysis of microscopic mixing and synchronization between fragments (Finster, 2017).
The algorithmic expansion is as follows:
- Choose a critical measure $\rho$ and a jet space for the response operator.
- Allow fragmentation into $L$ subsystems: $\tilde{\rho} = \tfrac{1}{L} \sum_{a=1}^{L} (F_a)_*(f_a \rho)$.
- Expand local jets, split into mean and fluctuation components.
- Use Green’s operators to solve inhomogeneities iteratively.
- Reconstruct perturbed measures and explore consequences for gauge and gravity theories, as well as particle–field interactions.
6. Limitations, Validity Conditions, and Recommendations
CLIPR’s validity rests on several critical assumptions:
- Stationarity and mixing of the underlying system; detrending and normalization are essential preprocessing steps.
- Smallness of perturbation magnitude ($\epsilon \ll 1$) to ensure the linear response approximation.
- Full rank of perturbation-induced responses for identifiability in matrix estimation.
- Applicability of surrogate models is restricted to interpolation within the observed data domain; extrapolation risks bias unless the dynamics are globally stable.
- In chaotic systems, long-time susceptibility estimation incurs exponential variance growth, limiting forecast horizons.
Regularization (ridge, sparsity) is recommended in high-dimensional parameter spaces to mitigate overfitting. Fragmentation analysis requires careful diagonalization on degenerate subspaces and error control beyond leading order.
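As one possible form of the ridge recommendation for the matrix estimator, the sketch below solves the penalized least-squares problem $\min_A \lVert A M + C \rVert_F^2 + \lambda \lVert A \rVert_F^2$, where the columns of $C$ are drift shifts and the columns of $M$ are measured equilibria. This formulation, the noise level, and the value of `lam` are assumptions made for illustration and are not taken from the cited papers.

```python
import numpy as np

def ridge_drift_estimate(C, M, lam):
    """Closed-form minimizer A = -C M^T (M M^T + lam I)^{-1}."""
    d = M.shape[0]
    gram = M @ M.T + lam * np.eye(d)
    # gram is symmetric, so solving for A^T and transposing gives A.
    return -np.linalg.solve(gram, M @ C.T).T

rng = np.random.default_rng(2)
d, K = 6, 6
A = -2.0 * np.eye(d) + 0.2 * rng.normal(size=(d, d))   # hypothetical ground truth
C = rng.normal(size=(d, K))                            # drift shifts
M_noisy = -np.linalg.solve(A, C) + 0.05 * rng.normal(size=(d, K))  # noisy equilibria

A_pinv = -C @ np.linalg.pinv(M_noisy)
A_ridge = ridge_drift_estimate(C, M_noisy, lam=0.1)

print("Frobenius norm, pseudoinverse:", np.linalg.norm(A_pinv))
print("Frobenius norm, ridge:        ", np.linalg.norm(A_ridge))
```

The ridge solution always has a Frobenius norm no larger than the pseudoinverse solution, since each inverse singular value $1/\sigma$ is replaced by $\sigma / (\sigma^2 + \lambda)$. Whether this shrinkage reduces the estimation error depends on the noise level and on the choice of $\lambda$, which would normally be set by cross-validation.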
7. Domain-Specific Applications
CLIPR finds broad utility in fields including:
- Single-cell perturbation genomics: Extraction of gene–gene causal effect matrices, causal module identification, prediction of transcriptome-wide perturbation responses (Lorch et al., 20 Jan 2026).
- Physical and climatological systems: Surrogate-based causal forecasting in spatially extended chaotic models; quantification of propagative signal causality with localized observables (Tomasini et al., 2020).
- Neuroscience and temporal networks: Machine-learning surrogates for causal graph estimation in stochastic nonlinear integration, outperforming Granger causality in regimes with hidden common drivers or strong nonlinearity (Chibbaro et al., 9 Sep 2025).
- Quantum field theory and variational principles: Formal expansions for linearized field equations, fragmentation-based synchronization, and continuum-limit correspondence with classical Dirac–Maxwell perturbation theory (Finster, 2017).
CLIPR thus constitutes a systematic protocol for leveraging partial observations and learned surrogates in the inference of causal dynamics, ranking predictors by their causal informativeness, and achieving reliable prediction and mechanistic insight in complex, high-dimensional systems.