LEACE: Least-squares Concept Erasure

Updated 12 June 2026

LEACE is a closed‐form method that subtracts class means from data representations to completely remove linearly extractable concept information.
It computes per-class and global means to establish an affine transformation that minimizes distortion while ensuring no linear classifier can recover the erased concept.
LEACE is applied in both vision and language models to boost fairness and interpretability by limiting linear predictability of sensitive attributes.

LEAst-squares Concept Erasure (LEACE) is a closed-form method for removing all linearly extractable information about specified concepts from vector representations while minimally perturbing the data under a chosen quadratic norm. Its primary applications are in fairness, interpretability, and controlled feature suppression in both vision and language models. LEACE provides provable guarantees: no linear function of the edited representation can predict the target concept better than a trivial constant predictor, and the distortion induced by the erasure is minimal in a well-defined sense (2502.02820, Belrose et al., 2023, Ravfogel et al., 2022).

1. Mathematical Formulation and Solution

LEACE operates on a data matrix $X \in \mathbb{R}^{n \times d}$ with rows $x_i$, where each example $x_i$ is associated with a concept label $z_i$, the $i$-th row of a one-hot matrix $Z \in \{0,1\}^{n \times k}$ for $k$ classes. The goal is to construct an affine transformation $f$ such that

$X' = f(X) = X - Z A + \mathbf{1} b^\top$, i.e. $x_i' = x_i - A^\top z_i + b$ for each example,

where $A \in \mathbb{R}^{k \times d}$ and $b \in \mathbb{R}^d$ are chosen to minimize the mean squared error between $X'$ and $X$, subject to the constraint that the cross-covariance between $X'$ and $Z$ vanishes:

$\mathrm{Cov}(X', Z) = 0$

This constraint enforces that the class-conditional means of $X'$ are all identical (set to the global mean).

Solving the constrained minimization yields a unique, closed-form solution:

For each class $c$,

$A_c = \mu_c, \qquad b = \mu$

where $A_c$ is row $c$ of $A$, $\mu_c$ is the class-$c$ mean, and $\mu$ is the global mean (2502.02820, Belrose et al., 2023).

This may be equivalently expressed as a projection:

$X' = X - P_Z (X - \mathbf{1}\mu^\top)$

where $P_Z = Z (Z^\top Z)^{-1} Z^\top$ projects onto the label span.

The result is a deterministic transformation, easily implemented and stable, which surgically removes all affine signal about the concept.
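The class-mean form of the transformation can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: the function name `erase_class_means`, the synthetic data, and the use of integer class indices instead of a one-hot matrix are all illustrative assumptions. It applies $x_i' = x_i - \mu_c + \mu$, then verifies that the cross-covariance with the labels vanishes and that the projection form gives the same result.

```python
import numpy as np

def erase_class_means(X, labels, k):
    """Map each row x_i to x_i - mu_c + mu, where c is the class of example i."""
    Z = np.eye(k)[labels]                        # one-hot labels, shape (n, k)
    counts = Z.sum(axis=0)                       # examples per class (assumed nonzero)
    class_means = (Z.T @ X) / counts[:, None]    # row c is mu_c, shape (k, d)
    global_mean = X.mean(axis=0)                 # mu, shape (d,)
    return X - Z @ class_means + global_mean

# Synthetic data (illustrative): Gaussian noise plus a class-dependent shift.
rng = np.random.default_rng(0)
n, d, k = 600, 5, 3
labels = np.arange(n) % k                        # every class is populated
X = rng.normal(size=(n, d)) + 2.0 * rng.normal(size=(k, d))[labels]
X_erased = erase_class_means(X, labels, k)

# Constraint check: the cross-covariance between X' and Z should vanish.
Z = np.eye(k)[labels]
cross_cov = (X_erased - X_erased.mean(axis=0)).T @ (Z - Z.mean(axis=0)) / n
max_cross_cov = np.abs(cross_cov).max()

# Projection form: X' = X - P_Z (X - 1 mu^T), with P_Z = Z (Z^T Z)^{-1} Z^T.
P_Z = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
X_projected = X - P_Z @ (X - X.mean(axis=0))
forms_agree = np.allclose(X_erased, X_projected)
```

Forming $P_Z$ explicitly costs $O(n^2)$ memory and is done here only to confirm the equivalence on a small sample; the class-mean form is the one suited to large datasets.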

2. Algorithmic Implementation and Complexity

The LEACE procedure exploits the one-hot structure of $Z$ for efficient computation:

Compute per-class means: For each class $c$, compute the mean vector $\mu_c$ over all $x_i$ with $Z_{ic} = 1$.
Compute global mean: Calculate $\mu$ as the mean over all examples.
Transform representations: For each example $x_i$, subtract the corresponding class mean $\mu_{c_i}$ (where $c_i$ is the class of example $i$) and add $\mu$.

Pseudocode Summary

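for each class c = 1, ..., k:
    mu_c <- mean of all x_i with Z_ic = 1
mu <- mean of all x_i
for each example i = 1, ..., n:
    x'_i <- x_i - mu_{c_i} + mu    (c_i is the class of example i)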

Complexity: $O(nd)$; suitable for datasets with millions of samples and thousands of dimensions.
Numerical notes: Empirical means are estimated in a single pass with possible regularization for rare classes. Generalization to soft labels is immediate via the more general covariance formula (2502.02820).
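The procedure can be sketched in NumPy with separate fit and apply steps. This is a hedged illustration: `fit_means`, `apply_erasure`, and the `pseudo_count` shrinkage toward the global mean are hypothetical names and choices introduced here. The source mentions regularization for rare classes without specifying its form.

```python
import numpy as np

def fit_means(X, labels, k, pseudo_count=0.0):
    """Estimate class means and the global mean in one pass over the rows of X.

    pseudo_count > 0 shrinks each class mean toward the global mean, which
    stabilizes rare classes (an illustrative choice, not taken from the text).
    """
    n, d = X.shape
    sums = np.zeros((k, d))
    np.add.at(sums, labels, X)                   # unbuffered per-class sums
    counts = np.bincount(labels, minlength=k).astype(float)
    global_mean = sums.sum(axis=0) / n
    denom = counts + pseudo_count
    denom[denom == 0] = 1.0                      # empty classes: avoid 0/0
    class_means = (sums + pseudo_count * global_mean) / denom[:, None]
    return class_means, global_mean

def apply_erasure(X, labels, class_means, global_mean):
    """Subtract each example's class mean and add back the global mean."""
    return X - class_means[labels] + global_mean

rng = np.random.default_rng(1)
n, d, k = 1000, 8, 4
labels = rng.permutation(np.arange(n) % k)       # every class is populated
X = rng.normal(size=(n, d)) + rng.normal(size=(k, d))[labels]

class_means, global_mean = fit_means(X, labels, k)
X_erased = apply_erasure(X, labels, class_means, global_mean)

# Largest deviation of any erased class mean from the global mean.
residual = max(
    np.abs(X_erased[labels == c].mean(axis=0) - global_mean).max()
    for c in range(k)
)

# With shrinkage the erasure is only approximate on the fitting data.
shrunk_means, _ = fit_means(X, labels, k, pseudo_count=5.0)
```

With `pseudo_count=0` the erased class means coincide with the global mean up to floating-point error. Any positive value trades exact erasure on the fitting data for more stable estimates.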

3. Theoretical Guarantees and Optimality

LEACE guarantees that all linear (first-order, affine) information about the specified concept labels is erased:

Linear prediction impossibility: After transformation, no linear classifier (or, more generally, any classifier that relies solely on linear features) can predict the concept better than a trivial constant predictor (Belrose et al., 2023, 2502.02820).
Minimal distortion: The transformation is the orthogonal projection with respect to the chosen quadratic norm (e.g., Frobenius, Mahalanobis) that most closely preserves the original data.
Universality: LEACE optimizes all quadratic ($\|v\|_M^2 = v^\top M v$ with $M$ positive semidefinite) norms simultaneously. In the whitened basis, it projects out the “concept directions” corresponding to the cross-covariance between $X$ and $Z$ (Belrose et al., 2023).
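The whitened-basis description can be made concrete with a short NumPy sketch. It assumes the eraser has the form $x' = x - W^{+} P\, W (x - \mu)$, where $W = \Sigma_{XX}^{-1/2}$ is a whitening matrix and $P$ is the orthogonal projector onto the column space of $W \Sigma_{XZ}$. This formula is not spelled out in the text above and is stated here as an assumption. The function names and the threshold `eps` are hypothetical.

```python
import numpy as np

def fit_whitened_eraser(X, Z, eps=1e-10):
    """Build a d x d eraser that projects out concept directions after whitening."""
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc, Zc = X - mu, Z - Z.mean(axis=0)
    sigma_xx = Xc.T @ Xc / n                     # feature covariance
    sigma_xz = Xc.T @ Zc / n                     # feature/label cross-covariance
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > eps                           # drop numerically null directions
    V, lam = evecs[:, keep], evals[keep]
    W = (V / np.sqrt(lam)) @ V.T                 # whitening: Sigma_xx^{-1/2}
    W_pinv = (V * np.sqrt(lam)) @ V.T            # unwhitening: Sigma_xx^{1/2}
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > eps]                            # basis of the concept directions
    P_concept = U @ U.T                          # orthogonal projector onto them
    eraser = np.eye(d) - W_pinv @ P_concept @ W
    return eraser, mu

def apply_eraser(X, eraser, mu):
    """Apply x' = mu + eraser (x - mu) to every row; no labels are needed here."""
    return (X - mu) @ eraser.T + mu

# Synthetic correlated features with a class-dependent shift (illustrative).
rng = np.random.default_rng(0)
n, d, k = 2000, 6, 3
labels = np.arange(n) % k
Z = np.eye(k)[labels]
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d)) + 1.5 * rng.normal(size=(k, d))[labels]

eraser, mu = fit_whitened_eraser(X, Z)
X_erased = apply_eraser(X, eraser, mu)

cross_cov = (X_erased - X_erased.mean(axis=0)).T @ (Z - Z.mean(axis=0)) / n
max_cross_cov = np.abs(cross_cov).max()
removed_rank = np.linalg.matrix_rank(np.eye(d) - eraser)
```

Under this assumed form the edit has rank at most $k - 1$ for one-hot labels, because the centered label matrix has rank $k - 1$. Unlike the class-mean form, it can be applied to examples whose labels are unknown.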

4. Practical Applications and Empirical Results

LEACE has been evaluated in both vision and language domains:

Image Data: On CIFAR-10, SVHN, and synthetic datasets, LEACE consistently increases the Minimum Description Length (MDL) and final cross-entropy, indicating a systematically harder task for neural networks after linear feature removal. For example, on CIFAR-10 with a 2-layer ReLU MLP, LEACE increases MDL from about 7.14 to about 8.49 bits (2502.02820).
Language Embeddings: Applied to removing gender from BERT embeddings, LEACE reduces linear probe accuracy for gender to chance with minimal impact on main-task accuracy (drop from 79.3% to 77.3%). Similar effects are observed for mitigating part-of-speech reliance in LLMs (Belrose et al., 2023).
Fairness and Interpretability: Results indicate improved fairness as measured by TPR-GAP and mitigated correlation between main-task performance and protected attributes (Belrose et al., 2023, Ravfogel et al., 2022).

Domain	Task	Effect of LEACE
Vision	CIFAR-10 classification	Slower learning, higher MDL
Language	BERT gender removal	Gender probe → chance, small accuracy drop
Language	POS erasure (LLMs)	Higher perplexity, less leakage

5. Relationship to Adversarial and Quadratic Erasure Methods

LEACE is distinguished from iterative or adversarial erasure protocols (e.g., INLP, adversarial training), as well as recently developed quadratic concept erasure methods:

Adversarial methods: Typically require iterative minimax optimization, suffer from instability or lack closed-form solutions, and provide less interpretability of the erased subspace (Belrose et al., 2023, Ravfogel et al., 2022).
INLP (Iterative Null-space Projection): Projects out a sequence of classifier weight vectors, often requiring higher-dimensional removal and potentially overshooting, with greater semantic damage.
Quadratic erasure (QLEACE): Extends LEACE by aligning class-conditional covariances in addition to means, thus erasing second-order (quadratic) signals. However, QLEACE can inadvertently inject higher-order information (third or higher moments), which expressive models may exploit, sometimes even accelerating learning (“backfiring”). Approximate and gradient-based variants (ALF-QLEACE) address some side-effects, functioning more like data augmentation in certain regimes (2502.02820).

6. Limitations and Practical Considerations

LEACE removes only linear dependencies between the representation and the concept:

Residual higher-order leakage: Nonlinear classifiers, or sufficiently deep/wide networks, may eventually recover information via higher-order moments left intact by LEACE, though much more slowly.
Semantic preservation: Empirical results demonstrate that LEACE typically preserves unrelated semantic structure (e.g., SimLex word similarity), while fully erasing the targeted concept in the linear sense (Belrose et al., 2023).
Numerical stability: In certain low-rank or singular situations, regularization of mean estimation or projection blending (e.g., with SAL projectors) may be required to avoid norm inflation (Belrose et al., 2023).
Irreversibility for nonlinear leakage: There is no practical closed-form for removing nonlinear signal in the general case, absent strong generative assumptions.
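Residual higher-order leakage can be illustrated on toy data. In the sketch below, two classes share a mean but differ in variance. After class-mean erasure a least-squares probe on the raw features explains essentially none of the label variance, while the same probe on squared features still explains a substantial share. The data, the helper `r_squared`, and the choice of $R^2$ as the leakage measure are illustrative assumptions.

```python
import numpy as np

def r_squared(features, y):
    """Fraction of the variance of y explained by a least-squares affine probe."""
    A = np.column_stack([features, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    return 1.0 - residual.var() / y.var()

# Two balanced classes with the same mean but different scales.
rng = np.random.default_rng(0)
n, d = 2000, 2
labels = np.arange(n) % 2
scale = np.where(labels == 0, 0.5, 2.0)
X = rng.normal(size=(n, d)) * scale[:, None]

# Class-mean erasure: x_i - mu_c + mu.
class_means = np.stack([X[labels == c].mean(axis=0) for c in (0, 1)])
X_erased = X - class_means[labels] + X.mean(axis=0)

y = 2.0 * labels - 1.0                           # labels as -1 / +1 targets
linear_r2 = r_squared(X_erased, y)               # probe on linear features
quadratic_r2 = r_squared(X_erased ** 2, y)       # probe on squared features
```

The linear probe is reduced to the constant predictor, but the second moment of the features still carries the concept. This is the kind of signal a nonlinear model can pick up.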

7. Extensions, Norm Generality, and Layerwise Use

LEACE generalizes to arbitrary PSD norms, enabling it to optimize for preservation under Mahalanobis or task-specific quadratic metrics. The method is also deployable across every layer of a neural network (“concept scrubbing”), with efficient per-layer estimation and online statistics accumulation (Belrose et al., 2023). For scenarios where concept labels are known at inference, an oracle variant projects out the predicted component per-sample via OLS, corresponding to the Hilbert-space orthogonal projection.

Practical code and layerwise procedures are openly available and require only standard linear algebra operations (whitening, centering, matrix multiplication), allowing drop-in use in mainstream deep learning pipelines (Belrose et al., 2023).
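Online accumulation of per-layer statistics can be sketched as a small running-sum tracker. This is a hypothetical illustration, not the released code: the class `RunningClassMeans`, the layer names, and the random stand-in activations are assumptions. It covers only the label-aware (oracle) statistics and does not show re-running the network after each erased layer.

```python
import numpy as np

class RunningClassMeans:
    """Accumulates per-class sums and counts over mini-batches for one layer."""

    def __init__(self, num_classes, dim):
        self.sums = np.zeros((num_classes, dim))
        self.counts = np.zeros(num_classes)

    def update(self, activations, labels):
        np.add.at(self.sums, labels, activations)            # per-class sums
        self.counts += np.bincount(labels, minlength=len(self.counts))

    def means(self):
        class_means = self.sums / np.maximum(self.counts, 1.0)[:, None]
        global_mean = self.sums.sum(axis=0) / self.counts.sum()
        return class_means, global_mean

    def erase(self, activations, labels):
        class_means, global_mean = self.means()
        return activations - class_means[labels] + global_mean

# Stand-in activations for two hypothetical layers of different widths.
rng = np.random.default_rng(0)
k, n = 3, 900
layer_dims = {"layer_0": 16, "layer_1": 8}
labels = np.arange(n) % k
activations = {
    name: rng.normal(size=(n, dim)) + rng.normal(size=(k, dim))[labels]
    for name, dim in layer_dims.items()
}

# One tracker per layer, updated batch by batch.
trackers = {name: RunningClassMeans(k, dim) for name, dim in layer_dims.items()}
for start in range(0, n, 100):
    batch = slice(start, start + 100)
    for name, tracker in trackers.items():
        tracker.update(activations[name][batch], labels[batch])

streamed_means, streamed_global = trackers["layer_1"].means()
erased = trackers["layer_1"].erase(activations["layer_1"], labels)
```

Only sums and counts are stored, so memory per layer is $O(kd)$ regardless of how many batches are processed.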

References:

"Slowing Learning by Erasing Simple Features" (2502.02820)
"LEACE: Perfect linear concept erasure in closed form" (Belrose et al., 2023)
"Linear Adversarial Concept Erasure" (Ravfogel et al., 2022)
