Extent of linear dimensions in generative AI embeddings

Determine the extent to which generative AI systems, including large language models used for social simulation, possess meaningful linear dimensions in their embedding spaces, as posited by the linear representation hypothesis. Resolving this question would allow steering-vector interventions to be rigorously assessed for feasibility and precision, whether the goal is controlling model behavior or injecting humanlike variation.

Background

The paper proposes steering vectors as a method to directly inject variation into an LLM’s embedding space to improve diversity and potentially control behaviors like sycophancy. This approach assumes that model representations admit approximately linear semantic dimensions that can be manipulated with activation additions.
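The activation-addition idea can be sketched in a few lines. The following is a minimal toy illustration, not the paper's method: the hidden-state dimensionality, the synthetic "concept" activations, and the scaling factor are all hypothetical stand-ins for a real LLM's intermediate activations, and the steering vector is formed as a difference of mean activations, one common construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hidden-state dimensionality (toy value)

# Hypothetical hidden states collected from two contrasting concept sets,
# e.g. activations on "agreeable" vs. "disagreeable" completions.
pos_acts = rng.normal(loc=1.0, size=(100, d))
neg_acts = rng.normal(loc=-1.0, size=(100, d))

# One common construction: the steering vector is the difference of mean
# activations, pointing along the (assumed) linear concept dimension.
steer = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def apply_steering(hidden: np.ndarray, alpha: float) -> np.ndarray:
    """Activation addition: shift a hidden state along the steering vector."""
    return hidden + alpha * steer

h = rng.normal(size=d)               # some intermediate activation
h_steered = apply_steering(h, 0.5)   # nudge toward the "positive" concept

# The projection onto the steering direction increases after the intervention.
unit = steer / np.linalg.norm(steer)
assert h_steered @ unit > h @ unit
```

In a real model the addition would be applied inside a forward pass (for instance via a hook on a chosen transformer layer), and whether the intervention behaves precisely depends on exactly the linearity assumption this question targets.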

However, the authors caution that superposition and the current limits of interpretability research create uncertainty about whether such linear dimensions exist robustly across models and contexts. Clarifying the status of the linear representation hypothesis is therefore essential for determining whether steering vectors can reliably match human diversity or control specific behaviors without harmful side effects.

References

Nevertheless, it could be intractable to identify vectors that precisely match real human diversity or specific model behaviors given issues such as superposition and open questions about the extent to which generative AI systems have meaningful linear dimensions in their embeddings (i.e., the “linear representation hypothesis”).

LLM Social Simulations Are a Promising Research Method (2504.02234 - Anthis et al., 3 Apr 2025) in Promising directions, Subsection "Steering vectors"