- The paper presents a feedforward graph that composes multiple frozen LLMs via learned projections into a shared latent space, enabling end-to-end differentiability.
- It achieves significant QA improvements by training only 17.6M parameters atop 12B frozen weights, outperforming the best single models and parameter-matched classifiers.
- The methodology establishes a novel paradigm for integrating complementary LLM representations, with emergent selective routing observed in the output attention layer.
Feedforward Graphs of Frozen LLMs as Differentiable Computation
Overview
"Dead Weights, Live Signals: Feedforward Graphs of Frozen LLMs" (2604.08335) introduces a differentiable architecture that composes multiple heterogeneous frozen LLMs into an end-to-end trainable feedforward graph. The frozen model nodes communicate by projecting internal representations into a shared, trainable latent space, with aggregation, routing, and output prediction mediated by lightweight, learnable linear projections and a cross-attention node. The authors empirically demonstrate that such graphs, despite training only 17.6M parameters (in projection layers and the output node) atop 12B frozen LLM parameters, substantially outperform both the strongest individual constituent models and parameter-matched learned classifiers, across several well-established QA benchmarks. The work establishes tractable gradient flow through multiple frozen LLM boundaries and highlights emergent selective routing in the final output layer.
Motivation and Context
Current LLM research typically either scales monolithic architectures or ensembles models post hoc at the output layer. Both approaches overlook the fact that distinct LLMs, trained on different objectives or data, encode complementary competencies within their hidden states. Prior research (e.g., Armstrong et al. 2026) established that independent LLMs exhibit geometric compatibility in their latent spaces, making it feasible to transfer internal activations across architectures via linear projections. This paper builds on that observation, extending it from two-model static steering to general feedforward graphs in which multiple frozen LLMs interact via learned projections. The key innovation is using the residual stream as a writable, differentiable communication substrate, which enables direct aggregation and refinement of diverse model knowledge rather than ensembling of output tokens alone.
Architecture and Training Dynamics
The proposed network is structured as a multilayer feedforward graph:
- Layer 1: Heterogeneous Encoders
Three architecturally diverse, frozen small LLMs (Llama-3.2-1B, Qwen2.5-1.5B, Gemma-2-2B) encode the same input, each with a distinct task-oriented framing prefix to elicit factual, reasoning, or linguistic perspectives. Final-token hidden states at deep layers are extracted, L2-normalized, and projected via learned matrices (W1, W2, W3) to a shared latent space of hidden size 1024.
- Shared Latent Aggregation
The projected representations are averaged, enforcing a geometric rather than magnitude-based alignment across sources.
- Layer 2: Injection and Refinement
The shared latent vector is interpolated to the dimensions of two larger frozen LLMs (Phi-3-mini, Mistral-7B) and injected into their residual streams at designated intermediate layers. The injected signal is blended with the native representation using a fixed scalar (α = 0.25) to maintain compatibility. The downstream models produce refined hidden states, which are projected again to the shared latent dimensionality (W4, W5).
- Output Node: Cross-Attention Aggregation
The final representations are combined through a multi-head attention mechanism with a trainable query vector and a classifier head, synthesizing output probabilities over the four answer choices.
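The four stages listed above can be sketched as a single forward pass. The following is a minimal NumPy illustration, not the authors' implementation: the frozen LLMs are replaced by hypothetical `FrozenStub` objects (fixed random maps), all dimensions are shrunk, the blend is assumed to take the form (1 − α)·h + α·injected, "interpolated" is read as linear interpolation of the vector, and the multi-head output node is reduced to one attention head.

```python
import numpy as np

rng = np.random.default_rng(0)
D_SHARED = 16   # the text uses 1024; shrunk here so the sketch runs instantly
ALPHA = 0.25    # fixed blend scalar from the text


def l2_normalize(v, eps=1e-8):
    return v / (np.linalg.norm(v) + eps)


def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()


def resize(z, d_out):
    """Linearly interpolate a vector to length d_out (assumed reading of 'interpolated')."""
    src = np.linspace(0.0, 1.0, z.shape[0])
    dst = np.linspace(0.0, 1.0, d_out)
    return np.interp(dst, src, z)


class FrozenStub:
    """Hypothetical stand-in for a frozen LLM: a fixed random map, never updated."""

    def __init__(self, d_in, d_hidden):
        self.A = rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in)

    def encode(self, x):
        return np.tanh(x @ self.A)

    def refine(self, x, injected):
        # Assumed blend form: convex combination of native and injected states.
        h = (1.0 - ALPHA) * self.encode(x) + ALPHA * injected
        return np.tanh(h)


d_in, enc_dims, ref_dims = 8, [12, 10, 14], [20, 24]   # hypothetical sizes
encoders = [FrozenStub(d_in, d) for d in enc_dims]      # layer 1 (three encoders)
refiners = [FrozenStub(d_in, d) for d in ref_dims]      # layer 2 (two refiners)

# Trainable parts: five projections (W1..W5), a query vector, a classifier.
W_in = [rng.standard_normal((d, D_SHARED)) * 0.1 for d in enc_dims]
W_out = [rng.standard_normal((d, D_SHARED)) * 0.1 for d in ref_dims]
query = rng.standard_normal(D_SHARED) * 0.1
W_cls = rng.standard_normal((D_SHARED, 4)) * 0.1


def forward(x):
    projected = [l2_normalize(e.encode(x)) @ W for e, W in zip(encoders, W_in)]
    shared = np.mean(projected, axis=0)                  # shared latent aggregation
    refined = [r.refine(x, resize(shared, d)) @ W
               for r, d, W in zip(refiners, ref_dims, W_out)]
    keys = np.stack(refined)                             # shape (2, D_SHARED)
    attn = softmax(keys @ query / np.sqrt(D_SHARED))     # weight per layer-2 path
    probs = softmax((attn @ keys) @ W_cls)               # four answer choices
    return probs, attn


probs, attn = forward(rng.standard_normal(d_in))
print("answer probabilities:", probs)
print("attention over layer-2 paths:", attn)
```

The `attn` vector is the quantity in which a routing preference between the two layer-2 paths would show up after training.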
Crucially, all LLM backbone weights are frozen. Only the five projection matrices and the output node are trained, amounting to 17.6M parameters. The system is trained via standard backpropagation on cross-entropy loss; gradient flow through all graph edges, including frozen model boundaries, is empirically analyzed.
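Freezing a node's weights does not block gradients from passing through it to trainable parameters upstream. The toy example below illustrates this with a hand-written backward pass checked against finite differences. It is a sketch under stated assumptions: the "frozen node" is a hypothetical fixed matrix `F` followed by tanh, the loss is squared error rather than cross-entropy, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
W_a = rng.standard_normal((5, 4)) * 0.5   # trainable projection, upstream of the frozen node
F = rng.standard_normal((4, 4)) * 0.5     # frozen node weights (hypothetical), never updated
w_b = rng.standard_normal(4) * 0.5        # trainable output head, downstream
target = 1.0


def loss_fn(W_a_val):
    z = x @ W_a_val
    h = np.tanh(z @ F)        # pass through the frozen node
    y = h @ w_b
    return 0.5 * (y - target) ** 2


# Manual backward pass via the chain rule.
z = x @ W_a
h = np.tanh(z @ F)
y = h @ w_b
dy = y - target
grad_w_b = dy * h                      # gradient for the downstream head
dpre = (dy * w_b) * (1.0 - h ** 2)     # back through tanh
dz = F @ dpre                          # back through the frozen matrix (F itself gets no update)
grad_W_a = np.outer(x, dz)             # gradient for the upstream projection


def numeric_grad(f, W, eps=1e-6):
    """Central finite differences, used only to verify the manual gradient."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        g[idx] = (f(Wp) - f(Wm)) / (2.0 * eps)
    return g


num_grad_W_a = numeric_grad(loss_fn, W_a)
ratio = np.linalg.norm(grad_W_a) / np.linalg.norm(grad_w_b)
print("upstream/downstream gradient norm ratio:", ratio)
```

A norm ratio of this kind, measured between a projection layer and the output node, is one way to quantify how much signal survives a frozen boundary.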
Empirical Results
The architecture is benchmarked on MMLU, ARC-Challenge, and OpenBookQA, demonstrating the following key results:
- Strong Task Performance:
- ARC-Challenge: 87.3% (best single model 75.9%; parameter-matched head 78.2%)
- OpenBookQA: 82.8% (best single model 76.6%; parameter-matched head 77.6%)
- MMLU: 67.2% (best single model 66.0%; parameter-matched head 60.5%)
- Substantial Margins Over Baselines:
The model outperforms the best single constituent by up to 11.4 percentage points and parameter-matched classifiers by up to 9.1 percentage points (both on ARC-Challenge), indicating that the observed improvements stem from the graph communication mechanism rather than from the learned classifier head alone.
Gradient measurements confirm that each projection layer receives non-negligible gradients (~13% of the output node signal strength in a two-node ablation), with no collapse. Skip connections and auxiliary losses are found unnecessary at this scale and depth.
- Emergent Selective Routing:
The output attention layer develops a significant, unsupervised preference for the Phi-3-mini path over Mistral-7B, particularly early in training, consistent with the hypothesis that Phi-3-mini's synthetic corpus yields representations more amenable to external steering.
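The margins quoted above follow directly from the reported accuracies. The short script below recomputes them; the dictionary layout and key names are illustrative choices, and the numbers are copied from the results listed in this section.

```python
# Accuracies (%) as reported in this section.
results = {
    "ARC-Challenge": {"graph": 87.3, "best_single": 75.9, "matched_head": 78.2},
    "OpenBookQA": {"graph": 82.8, "best_single": 76.6, "matched_head": 77.6},
    "MMLU": {"graph": 67.2, "best_single": 66.0, "matched_head": 60.5},
}

# Margin of the graph over each baseline, in percentage points.
margins = {
    name: {
        "vs_best_single": round(r["graph"] - r["best_single"], 1),
        "vs_matched_head": round(r["graph"] - r["matched_head"], 1),
    }
    for name, r in results.items()
}

for name, m in margins.items():
    print(f"{name}: +{m['vs_best_single']} pp vs best single, "
          f"+{m['vs_matched_head']} pp vs parameter-matched head")

max_vs_single = max(m["vs_best_single"] for m in margins.values())
max_vs_head = max(m["vs_matched_head"] for m in margins.values())
```

Both maxima occur on ARC-Challenge. The margin over the best single model is smallest on MMLU (1.2 pp), where the parameter-matched head scores below the best single model.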
Theoretical and Practical Implications
Theoretical Significance
This work operationalizes the Platonic Representation Hypothesis: the claim that neural networks converge toward a shared, underlying semantic geometry, modulated by architecture-specific coordinate transformations. By validating end-to-end differentiable composition across multiple frozen LLM boundaries, even where architectures and training regimes diverge sharply, the study provides concrete evidence that the hypothesis can be put to practical use. The fact that deep computation graphs with several frozen nodes remain trainable also suggests a new paradigm for LLM composition, one that exploits internal representational diversity directly rather than relying exclusively on token-level outputs.
Practical Utility
The architecture demonstrates competitive-to-excellent performance on high-value QA benchmarks without any fine-tuning or retraining of the underlying LLMs, enabling practical transfer of pretrained knowledge without incurring the cost of retraining or the risk of catastrophic forgetting. Its efficiency (only about 0.15% of the total parameter count, 17.6M of 12B, is trained) and architectural modularity make it well-suited for low-resource adaptation and continual composition as the ecosystem of open-weight LLMs expands. The compositional approach also intrinsically supports privacy and verifiability constraints, since constituent models remain immutable during downstream integration.
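To make the parameter budget concrete, the sketch below gives a rough accounting of the trainable components. It is not the paper's own breakdown. It assumes the hidden sizes published in each model's public configuration, one bias vector per projection, and a standard multi-head attention layout with query, key, value, and output projections; only the shared width of 1024, the four answer choices, and the 17.6M and 12B totals come from the text.

```python
D = 1024          # shared latent width (from the text)
N_CHOICES = 4     # answer choices (from the text)

# Assumed hidden sizes, taken from the models' public configurations.
hidden_sizes = {
    "Llama-3.2-1B": 2048,
    "Qwen2.5-1.5B": 1536,
    "Gemma-2-2B": 2304,
    "Phi-3-mini": 3072,
    "Mistral-7B": 4096,
}

# Five projections into the shared space, each assumed to carry a bias.
projection_params = sum(h * D + D for h in hidden_sizes.values())

# Output node, assumed layout: four D x D attention projections with biases,
# one trainable query vector, and a linear classifier over the answer choices.
attention_params = 4 * (D * D + D)
query_params = D
classifier_params = D * N_CHOICES + N_CHOICES

estimated_total = (projection_params + attention_params
                   + query_params + classifier_params)

# Fraction trained, using the totals reported in the text.
reported_fraction = 17.6e6 / 12e9

print(f"estimated trainable parameters: {estimated_total / 1e6:.2f}M")
print(f"reported trainable fraction: {reported_fraction:.4%}")
```

Under these assumptions the estimate lands close to the reported 17.6M, with roughly three quarters of the budget in the five projection matrices.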
Limitations and Future Directions
The analysis identifies that layer-1 projection matrices receive near-identical gradients and fail to specialize, likely owing to the hard-averaging operation suppressing differentiation pressure. Introducing learnable attention pooling rather than fixed averaging is a suggested remedy. Additionally, the architecture has so far been validated only for multiple-choice question answering; generalization to open-ended generation and more complex graph topologies remains unexplored. Improvement of scheduler stability, scaling to deeper graphs, and probing for interpretable structure in the shared latent space constitute immediate research opportunities.
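The contrast between fixed averaging and learnable attention pooling can be illustrated with a small NumPy sketch. This is one possible form of attention pooling under stated assumptions (a single learnable query `q` and scaled dot-product scores), not the remedy as specified by the authors. It shows that averaging hands every source the same upstream gradient, while attention weights scale the direct gradient path per source.

```python
import numpy as np


def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()


def mean_pool(P):
    """Fixed averaging over sources; P has shape (n_sources, d)."""
    return P.mean(axis=0)


def attention_pool(P, q):
    """Pooling with a query vector q (hypothetical parameterization)."""
    weights = softmax(P @ q / np.sqrt(P.shape[1]))
    return weights @ P, weights


rng = np.random.default_rng(2)
P = rng.standard_normal((3, 8))   # three projected layer-1 representations
q = rng.standard_normal(8)        # stands in for a learned query
g = rng.standard_normal(8)        # upstream gradient w.r.t. the pooled vector

# Averaging: d(pooled)/d(P_i) = I/3 for every i, so all sources get g/3.
mean_grads = np.tile(g / 3.0, (3, 1))

# Attention pooling: the direct path gives source i the gradient weights[i] * g.
# (The additional path through the softmax scores is omitted for brevity.)
pooled, weights = attention_pool(P, q)
direct_grads = weights[:, None] * g

# With a zero query the weights are uniform and the result equals the mean.
pooled_uniform, weights_uniform = attention_pool(P, np.zeros(8))

print("attention weights:", weights)
```

Because a zero query reproduces plain averaging, such a layer could be initialized to match the current behavior and allowed to depart from it during training.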
Conclusion
This paper presents a practical and empirically robust technique for constructing trainable feedforward graphs of frozen LLMs. The results establish that end-to-end communication through learned projections in a shared latent space enables parameter-efficient aggregation of heterogeneous model knowledge, with strong task performance and tractable optimization. The framework provides a foundation for future research on differentiable systems-of-LLMs, compositional adaptation, and the deeper geometric structure of model representations.