Modeling feature covariances in synthetic SAE evaluations

Develop a principled method for modeling realistic covariances among feature activations in synthetic datasets used to evaluate Sparse Autoencoders (SAEs), one that avoids arbitrary modeling assumptions while reflecting the dependencies observed in real neural networks.

Background

The paper evaluates Sparse Autoencoders (SAEs) using a synthetic setup that assumes independent feature activations to provide controlled conditions with known ground-truth features. While SAEs already fail to recover most ground-truth features under this simplified assumption, the authors note that real neural networks likely exhibit correlated feature activations.
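The independence assumption described above can be sketched as follows. The dimensions, sparsity level, and generative details (Bernoulli firing with exponential magnitudes, random unit feature directions) are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): n_features ground-truth
# features embedded in a d-dimensional activation space.
n_samples, n_features, d = 10_000, 64, 32

# Each feature fires independently with a small probability (sparsity).
p_active = 0.05
active = rng.random((n_samples, n_features)) < p_active     # independent Bernoulli mask
magnitudes = rng.exponential(1.0, (n_samples, n_features))  # strengths when firing

# Random ground-truth feature directions; the observed activations are a
# sparse linear combination of them.
directions = rng.normal(size=(n_features, d))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
X = (active * magnitudes) @ directions                      # synthetic "hidden states"
```

Because each column of `active` is sampled independently, every pairwise covariance between ground-truth features is zero by construction, which is exactly the simplification the authors call out.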

The authors acknowledge that incorporating realistic dependencies could, in principle, affect evaluation outcomes, but they highlight the difficulty of introducing covariances without arbitrary modeling choices. This identifies a concrete methodological gap in constructing synthetic datasets that faithfully reflect real-world feature dependencies.
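One commonly used way to induce dependencies while keeping per-feature firing rates fixed, offered here purely as an illustration of the design space and not as the authors' method, is a Gaussian copula: sample correlated latent Gaussians and threshold each coordinate at its marginal quantile. The block-correlation structure, the value of `rho`, and the sparsity below are arbitrary choices; picking them is precisely the kind of unprincipled modeling decision the authors flag:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
n_samples, n_features = 10_000, 64
p_active = 0.05

# Hypothetical correlation structure: features correlated within blocks of 8.
# Any positive-definite correlation matrix could be substituted here.
rho, block = 0.6, 8
Sigma = np.eye(n_features)
for start in range(0, n_features, block):
    Sigma[start:start + block, start:start + block] = rho
np.fill_diagonal(Sigma, 1.0)

# Sample correlated latents, then threshold each marginal at the Gaussian
# quantile so every feature still fires with probability p_active.
latent = rng.multivariate_normal(np.zeros(n_features), Sigma, size=n_samples)
threshold = NormalDist().inv_cdf(1 - p_active)
active = latent > threshold  # correlated Bernoulli mask with fixed marginals
```

The copula decouples the marginal sparsity from the dependence structure, but it does not answer the harder question the authors raise: which correlation matrix `Sigma` would faithfully reflect the dependencies in a real network's features.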

References

"Moreover, it remains unclear how to appropriately model these covariances in a synthetic setup without making arbitrary assumptions."

Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?  (2602.14111 - Korznikov et al., 15 Feb 2026) in Section 6 (Limitations)