Modeling feature covariances in synthetic SAE evaluations
Determine a principled method for modeling realistic covariances among feature activations in the synthetic datasets used to evaluate Sparse Autoencoders (SAEs), one that avoids arbitrary assumptions while still reflecting the dependencies observed in real neural networks.
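To make the difficulty concrete, here is a minimal sketch of one common way to inject dependencies into synthetic sparse features: a Gaussian copula, where a latent correlated Gaussian is thresholded to decide which features fire. This is an illustration only, not the method of the cited paper. The function name, the correlation matrix, the firing probability, and the exponential magnitude distribution are all hypothetical choices, and each one is exactly the kind of arbitrary assumption the problem statement asks to avoid.

```python
from statistics import NormalDist

import numpy as np


def sample_correlated_sparse_features(n_samples, corr, firing_prob, rng):
    """Gaussian-copula sketch (hypothetical helper, not from the cited paper).

    corr        : (d, d) positive-definite correlation matrix of the latent
                  Gaussian. Choosing it is the arbitrary, unresolved step.
    firing_prob : marginal probability that any one feature is active.
    """
    corr = np.asarray(corr, dtype=float)
    n_features = corr.shape[0]

    # Latent Gaussian whose rows have covariance `corr`.
    chol = np.linalg.cholesky(corr)
    latent = rng.standard_normal((n_samples, n_features)) @ chol.T

    # A feature fires when its latent value exceeds the (1 - p) quantile,
    # so each feature is active with marginal probability `firing_prob`.
    threshold = NormalDist().inv_cdf(1.0 - firing_prob)
    active = latent > threshold

    # Assumed magnitude model: independent exponential activations.
    magnitudes = rng.exponential(scale=1.0, size=(n_samples, n_features))
    return active * magnitudes


rng = np.random.default_rng(0)
# Assumed toy structure: features 0 and 1 co-occur, feature 2 is independent.
corr = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
acts = sample_correlated_sparse_features(20_000, corr, firing_prob=0.1, rng=rng)

# Empirical correlation of the on/off firing patterns.
print(np.corrcoef((acts > 0).T.astype(float)).round(2))
```

The sketch controls marginal sparsity and pairwise co-occurrence, but it offers no guidance on what `corr` should be for features that resemble those in a real network, which is the open question.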
References
Moreover, it remains unclear how to appropriately model these covariances in a synthetic setup without making arbitrary assumptions.
— Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?
(2602.14111 - Korznikov et al., 15 Feb 2026) in Section 6 (Limitations)