
Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning

Published 12 Jun 2025 in cs.LG, cs.AI, cs.CL, and stat.ML | arXiv:2506.10378v1

Abstract: Faithful evaluation of LLM capabilities is crucial for deriving actionable insights that can inform model development. However, rigorous causal evaluations in this domain face significant methodological challenges, including complex confounding effects and prohibitive computational costs associated with extensive retraining. To tackle these challenges, we propose a causal representation learning framework wherein observed benchmark performance is modeled as a linear transformation of a few latent capability factors. Crucially, these latent factors are identified as causally interrelated after appropriately controlling for the base model as a common confounder. Applying this approach to a comprehensive dataset encompassing over 1500 models evaluated across six benchmarks from the Open LLM Leaderboard, we identify a concise three-node linear causal structure that reliably explains the observed performance variations. Further interpretation of this causal structure provides substantial scientific insights beyond simple numerical rankings: specifically, we reveal a clear causal direction starting from general problem-solving capabilities, advancing through instruction-following proficiency, and culminating in mathematical reasoning ability. Our results underscore the essential role of carefully controlling base model variations during evaluation, a step critical to accurately uncovering the underlying causal relationships among latent model capabilities.

Summary

  • The paper introduces a causal framework that models benchmark performance as a linear transformation of latent factors, revealing a three-node hierarchical structure.
  • It employs Hierarchical Component Analysis (HCA) to decompose performance data across diverse base models while controlling for confounders.
  • Experimental results on over 1500 models demonstrate distinct latent capabilities that inform effective fine-tuning strategies and benchmark design.

This paper introduces a causal representation learning framework to discover and understand the hierarchical latent capabilities of large LMs. The core challenge addressed is the difficulty in faithfully evaluating LM capabilities due to complex confounding effects (especially from varying base models) and the high cost of retraining for controlled studies. The authors propose that observed benchmark performance can be modeled as a linear transformation of a few causally interrelated latent capability factors, once the base model is appropriately controlled as a common confounder.

The central methodology relies on two key hypotheses:

  1. Capability-Performance Invariance: A small set of latent capability factors consistently governs benchmark performance across diverse base models.
  2. Hierarchical Capability Structure: These capabilities are organized hierarchically, as a directed acyclic graph (DAG), within any individual base model, where an edge A → B implies that interventions targeting capability A influence capability B.

The paper formalizes this using Pearl's structural causal model (SCM) framework, treating the base model as a shared latent parent influencing all capabilities and fine-tuning as an intervention on these latent factors.
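As a concrete illustration, this generative story can be sketched as a toy linear SCM. All coefficients, noise scales, domain offsets, and the loading matrix below are hypothetical, chosen only to show the shape of the model, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared loading matrix mapping 3 latent capabilities to 6 benchmark scores.
A = rng.normal(size=(6, 3))

def sample_domain(n_models, base_effect):
    """Toy linear SCM: a shared base-model effect acts as a common
    confounder of all three latent capabilities, which are additionally
    chained z1 -> z2 -> z3; benchmarks are a linear mix of the latents.
    All coefficients here are made up for illustration."""
    z1 = base_effect + rng.normal(0.0, 1.0, n_models)          # general
    z2 = base_effect + 0.8 * z1 + rng.normal(0.0, 0.5, n_models)  # instr.
    z3 = base_effect + 0.6 * z2 + rng.normal(0.0, 0.5, n_models)  # math
    Z = np.column_stack([z1, z2, z3])
    return Z @ A.T                     # shape (n_models, 6)

# Two "domains": models fine-tuned from two different base models,
# whose shared offsets confound every observed benchmark.
X_a = sample_domain(300, base_effect=1.0)
X_b = sample_domain(300, base_effect=-0.5)
```

Fine-tuning corresponds to intervening on one of the z variables within a domain, which is why pooling domains without controlling for the base-model offset can distort the recovered causal structure.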

To identify these hierarchical latent capabilities, the authors propose Hierarchical Component Analysis (HCA). This algorithm leverages the heterogeneity across different base models (referred to as "domains") to recover the latent structure. The key steps of HCA are:

  1. ICA-based Unmixing: Apply Independent Component Analysis (ICA) separately to the benchmark performance data for each base model (domain). This yields an unmixing matrix M_k for each domain k, which maps the observed benchmark data x^(k) to independent source variables ε^(k) such that M_k x^(k) = ε^(k). Theoretically, M_k = P_k B_k H, where P_k is a permutation matrix, B_k is a lower-triangular matrix of domain-specific causal weights, and H is a common unmixing matrix tied to the shared latent capability structure.
  2. Row-Residual Extraction: The algorithm aims to recover B_k and H. It iteratively identifies rows of H: for each component i, it computes the residual of projecting the i-th row of M_k* (a permuted version of M_k) onto the span of its first i−1 rows. If this set of residuals across all domains k has rank one, it indicates a component of H.
  3. Permutation Alignment and Factor Refinement: Since ICA identifies components only up to permutation, HCA searches over permutations of the rows of M_k. For each candidate permutation, it estimates H and refines B_k by minimizing the Frobenius objective ‖M_k' − B_k H‖_F^2, where M_k' is the permuted ICA unmixing matrix. The best set of permutations is the one minimizing the Maximum Inexactness Coefficient (MIC), which quantifies how far the recovered source variables deviate from true independence.

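The rank-1 residual condition at the heart of steps 1 and 2 can be checked numerically on a synthetic instance of the identity M_k = B_k H (the ICA step and the permutation P_k are omitted here; this is a sketch of the underlying algebra, not the full algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic instance of M_k = B_k H: H shared across domains,
# B_k lower-triangular and domain-specific.
d0, n_bench, n_domains = 3, 6, 4
H = rng.normal(size=(d0, n_bench))
Ms = []
for _ in range(n_domains):
    B = np.tril(rng.normal(size=(d0, d0)))
    np.fill_diagonal(B, 1.0)           # fix the scale of each component
    Ms.append(B @ H)

def row_residual(M, i):
    """Residual of row i of M after projecting onto the span of rows 0..i-1."""
    if i == 0:
        return M[0].copy()
    Q, _ = np.linalg.qr(M[:i].T)       # orthonormal basis of earlier rows
    return M[i] - Q @ (Q.T @ M[i])

# Because B_k is lower-triangular, the step-i residual in every domain is
# proportional to the component of H's i-th row orthogonal to the earlier
# rows, so the residuals stacked across domains form a rank-1 matrix.
for i in range(d0):
    R = np.stack([row_residual(M, i) for M in Ms])
    s = np.linalg.svd(R, compute_uv=False)
    assert s[1] / s[0] < 1e-8          # rank-1 up to numerical noise
```

Under a wrong row ordering the residual directions differ across domains, so this rank-1 test is exactly what lets HCA detect the correct permutation.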
The HCA algorithm is applied to data from the Open LLM Leaderboard, encompassing over 1500 models evaluated on six benchmarks. The analysis focuses on models fine-tuned from four base models (Qwen2.5-7B, Qwen2.5-14B, Llama-3-8B, Llama-3.1-8B) that exhibit similar principal component subspaces in their performance data, suggesting a shared underlying structure.

The experiments reveal a concise three-node linear causal structure (z_1 → z_2 → z_3) that reliably explains observed performance variations, with a low MIC of 0.04. The latent capabilities are interpreted as:

  • z_1 (Foundational General Capability): Correlates strongly with general-reasoning benchmarks such as MMLU-Pro and BIG-Bench-Hard (BBH). This capability is more influenced by pre-training compute (FLOPs).
  • z_2 (Instruction Following): Correlates strongly with IFEval. Interventions such as instruction tuning primarily affect this capability, with minimal changes to z_1.
  • z_3 (Advanced Mathematical Reasoning): Correlates strongly with MATH Lvl 5. This capability is causally influenced by z_2, as mathematical tasks often require precise instruction adherence.

Implementation and Application Insights:

  • Controlling for Base Models: The study underscores the critical importance of accounting for the base model when evaluating LLMs. The heterogeneity in performance patterns across different base models necessitates this control for accurate causal discovery.
    • Practical Tip: When evaluating fine-tuning strategies, report results specific to base models or use methods to adjust for base model effects.
  • Matrix Completion: The observation of domain-specific low-rank structures can improve the imputation of missing benchmark scores. Applying matrix completion with nuclear norm regularization within a specific base model's domain yields lower reconstruction error than applying it to the entire leaderboard.
    • Application: More accurately fill in missing data on leaderboards by performing matrix completion locally within model families.
  • HCA Algorithm:
    • Input: Benchmark performance data for multiple models, grouped by their base model (domains).
    • Output: A shared unmixing matrix HH (mapping benchmarks to latent capabilities), domain-specific causal weight matrices BkB_k, and the inferred causal graph among latent capabilities.
    • Steps (simplified):
      1. Perform ICA on each domain's data to obtain M_k.
      2. Iterate through permutations of the rows of each M_k.
      3. For each permutation, iteratively extract orthogonalized components to form H.
      4. Estimate B_k for each domain.
      5. Compute the MIC; select the permutation set yielding the lowest value.
    • Computational Cost: The permutation search can be expensive if the number of latent capabilities d_0 is large; for d_0 = 3, as in the paper, it is manageable.
    • Code Availability: The paper mentions a code release, which would be crucial for practical implementation.
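The simplified steps above, minus the initial ICA, can be sketched end-to-end on synthetic unmixing matrices. The rank-1 deviation score below is a simple stand-in for the paper's MIC, and the brute-force search is only feasible because d_0 = 3:

```python
import numpy as np
from itertools import permutations, product

def step_residuals(Ms, i):
    """Row-i residuals across domains after projecting out rows 0..i-1."""
    R = []
    for M in Ms:
        if i == 0:
            r = M[0].copy()
        else:
            Q, _ = np.linalg.qr(M[:i].T)
            r = M[i] - Q @ (Q.T @ M[i])
        R.append(r)
    return np.stack(R)

def hca_permutation_search(Ms, d0):
    """Brute-force search over per-domain row permutations. Each candidate
    is scored by its worst rank-1 deviation (s2/s1) across the d0 residual
    steps -- a simple stand-in for the paper's MIC. Returns the best score
    and the orthogonalized residual directions as an estimate of H's rows."""
    best_score, best_H = np.inf, None
    for perms in product(permutations(range(d0)), repeat=len(Ms)):
        Ms_p = [M[list(p)] for M, p in zip(Ms, perms)]
        rows, worst = [], 0.0
        for i in range(d0):
            R = step_residuals(Ms_p, i)
            _, s, Vt = np.linalg.svd(R, full_matrices=False)
            rows.append(Vt[0])
            worst = max(worst, s[1] / s[0])
        if worst < best_score:
            best_score, best_H = worst, np.stack(rows)
    return best_score, best_H

# Synthetic check: M_k = B_k H with rows scrambled per domain; the search
# should find permutations under which all residual stacks are rank-1.
rng = np.random.default_rng(3)
H = rng.normal(size=(3, 6))
Ms = []
for _ in range(3):
    B = np.tril(rng.normal(size=(3, 3)))
    np.fill_diagonal(B, 1.0)
    Ms.append((B @ H)[rng.permutation(3)])
score, H_hat = hca_permutation_search(Ms, d0=3)
```

With 3 domains and d_0 = 3 this search visits 6^3 = 216 candidates; the exponential growth in d_0 and in the number of domains is the cost noted above.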
  • Interpreting Latent Factors: After identifying the latent factors z_i, their semantic meaning is established by:

    1. Examining the unmixing matrix H: which benchmarks load heavily onto which z_i?
    2. Correlating z_i values with specific benchmark scores.
    3. Observing the effects of targeted fine-tuning (e.g., IFEval SFT for z_2) on benchmark scores and on the other z_j.
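The first two steps of this interpretation recipe can be sketched as follows; the correlation heuristic is a generic one, not necessarily the paper's exact procedure:

```python
import numpy as np

def factor_benchmark_correlations(X, H):
    """Given benchmark scores X (n_models, n_benchmarks) and an unmixing
    matrix H (d0, n_benchmarks), recover per-model latent scores z = H x
    and correlate each factor with each benchmark column."""
    Z = X @ H.T                          # latent scores, (n_models, d0)
    d0, n_bench = H.shape
    corr = np.empty((d0, n_bench))
    for i in range(d0):
        for j in range(n_bench):
            corr[i, j] = np.corrcoef(Z[:, i], X[:, j])[0, 1]
    return corr

# Toy sanity check: if H simply selects single benchmarks, each factor
# correlates perfectly with "its" benchmark.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 6))
H = np.eye(3, 6)
corr = factor_benchmark_correlations(X, H)   # corr[i, i] is ~1 for i < 3
```

In the paper's setting, a factor whose row of corr peaks on MMLU-Pro and BBH would be labeled the general-capability factor, one peaking on IFEval the instruction-following factor, and so on.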
  • Fine-tuning Strategy:

    • The discovered hierarchy z_1 → z_2 → z_3 suggests a development path:
    • Focus on scaling pre-training FLOPs to improve z_1 (general capability), as gains here can cascade downstream.
    • For capabilities like z_2 (instruction following), which are less correlated with FLOPs and have higher noise-factor variances, targeted post-training interventions are effective.
    • Improving z_2 can subsequently boost z_3 (math reasoning).
  • Benchmark Design:
    • Prioritize benchmarks evaluating general, foundational capabilities (those aligned with z_1), as they reflect more substantive improvements.
    • Be aware that gains on specialized benchmarks (e.g., MATH) might partly stem from improvements in upstream capabilities (e.g., instruction-following).

Limitations and Future Work:

  • The identifiability of HCA relies on assumptions (e.g., linear SCM, sufficient heterogeneity across domains).
  • The MIC provides a quantitative measure of inexactness, but the model is still an approximation.
  • Interpreting and intervening on latent factors remains a general challenge in CRL.
  • The analysis was performed on a specific set of benchmarks and base models; generalizability to other setups needs further investigation.
  • The paper suggests using more advanced causal inference tools (matching, stratification, doubly robust estimation) for future insights.

The paper provides a novel framework for understanding LM capabilities not just as a flat list of scores, but as an interconnected, hierarchical system. This causal perspective offers actionable insights for model development, evaluation, and fine-tuning by revealing how different abilities build upon each other.
