Principled selection of a unique causal inner product

Develop a principled criterion that uniquely selects the positive diagonal matrix D in the characterization of causal inner products on the unembedding-difference space \bar{Γ} of a language model. The admissible inner products are those given by positive definite matrices M satisfying M^{-1}=GG^{\top} and G^{\top}Cov(γ)^{-1}G=D, where G is a basis of canonical unembedding representations of d mutually causally separable concepts and Cov(γ) is the covariance of the unembedding vectors over the vocabulary. Such a selection would uniquely determine the causal inner product.

Background

The paper defines a causal inner product on the unembedding-difference space \bar{Γ} such that representations of causally separable concepts are orthogonal. Under Assumption 4.1 and with d mutually causally separable concepts whose canonical unembedding representations form a basis G, Theorem 3.3 shows that any causal inner product can be represented by a positive definite matrix M with M^{-1}=GG^{\top} and G^{\top}Cov(γ)^{-1}G=D for some positive diagonal matrix D.

This characterization implies a d-parameter family of causal inner products, parameterized by D, and the authors adopt D=I_d for experiments. However, they explicitly note the absence of a principle to select a unique D, leaving the choice of a unique causal inner product unresolved.
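The two constraints can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a synthetic positive definite matrix standing in for Cov(γ), an arbitrary orthogonal matrix U, and the parameterization G = Cov(γ)^{1/2} U D^{1/2}, which is one way to satisfy G^{\top}Cov(γ)^{-1}G=D. The helper names sqrtm_spd and causal_inner_product are hypothetical.

```python
import numpy as np


def sqrtm_spd(A):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T


def causal_inner_product(cov, U, d_diag):
    """Return (G, M) with M^{-1} = G G^T and G^T cov^{-1} G = diag(d_diag).

    Assumed parameterization (not from the paper's text): G = cov^{1/2} U D^{1/2},
    where U is any orthogonal matrix and D = diag(d_diag) > 0.
    """
    G = sqrtm_spd(cov) @ U @ np.diag(np.sqrt(d_diag))
    M = np.linalg.inv(G @ G.T)
    return G, M


rng = np.random.default_rng(0)
d = 4

# Synthetic stand-in for Cov(γ); a real model would supply the covariance
# of its unembedding vectors over the vocabulary.
A = rng.standard_normal((d, d))
cov = A @ A.T + d * np.eye(d)
cov_inv = np.linalg.inv(cov)

# Arbitrary orthogonal matrix from a QR factorization.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))

# A non-identity choice of D.
d_diag = np.array([0.5, 1.0, 2.0, 4.0])
G, M = causal_inner_product(cov, U, d_diag)

# The defining constraint holds by construction.
ok_constraint = np.allclose(G.T @ cov_inv @ G, np.diag(d_diag))

# Columns of G are orthonormal under <x, y> = x^T M y.
ok_orthonormal = np.allclose(G.T @ M @ G, np.eye(d))

# With D = I_d the matrix M reduces to Cov(γ)^{-1}, whatever U is.
_, M_identity = causal_inner_product(cov, U, np.ones(d))
ok_identity = np.allclose(M_identity, cov_inv)

# A different D gives a different inner product.
differs = not np.allclose(M, M_identity)

print(ok_constraint, ok_orthonormal, ok_identity, differs)
```

In this sketch, changing d_diag changes M while both constraints continue to hold, which is the non-uniqueness the open problem asks to resolve.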

References

"We do not have a principle for picking out a unique choice of D (and thus, a unique inner product)."

— The Linear Representation Hypothesis and the Geometry of Large Language Models (2311.03658 - Park et al., 2023) in Subsection "An Explicit Form for Causal Inner Product" (Section 3)
