
Communicating Activations Between Language Model Agents (2501.14082v2)

Published 23 Jan 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Communication between multiple LLM (LM) agents has been shown to scale up the reasoning ability of LMs. While natural language has been the dominant medium for inter-LM communication, it is not obvious this should be the standard: not only does natural language communication incur high inference costs that scale quickly with the number of both agents and messages, but also the decoding process abstracts away too much rich information that could be otherwise accessed from the internal activations. In this work, we propose a simple technique whereby LMs communicate via activations; concretely, we pause an LM $\textit{B}$'s computation at an intermediate layer, combine its current activation with another LM $\textit{A}$'s intermediate activation via some function $\textit{f}$, then pass $\textit{f}$'s output into the next layer of $\textit{B}$ and continue the forward pass till decoding is complete. This approach scales up LMs on new tasks with zero additional parameters and data, and saves a substantial amount of compute over natural language communication. We test our method with various functional forms $\textit{f}$ on two experimental setups--multi-player coordination games and reasoning benchmarks--and find that it achieves up to $27.0\%$ improvement over natural language communication across datasets with $<$$1/4$ the compute, illustrating the superiority and robustness of activations as an alternative "language" for communication between LMs.

Summary

  • The paper proposes that LMs communicate by sharing intermediate activations: model A's activation is combined with model B's at an intermediate layer, after which B's forward pass continues.
  • Using frozen LMs with zero extra parameters or data, this activation communication method reduces inference compute by over 75% while matching or exceeding natural language performance.
  • In experiments, the method outperformed natural language communication by up to 27.0% across multiple tasks, pointing to a more efficient paradigm for LM communication.

The paper "Communicating Activations Between LLM Agents" proposes an approach to communication between LLM agents that uses activations instead of natural language. The method sidesteps the inefficiencies of natural language communication: high inference costs, and the rich internal information that is abstracted away when activations are collapsed into decoded tokens.

Key Methodology:

  • The approach involves two LLMs, A and B. During communication, model B's computation is paused at an intermediate layer. Its current activation is combined with an activation from model A via a function f. The combined result is passed into model B's next layer, and the forward pass continues until decoding is complete.
  • Several functional forms for f are explored, including simple operations such as sum, mean, and replacement, as well as a learned task-agnostic linear layer.
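
The pause-combine-resume procedure can be sketched with toy numpy "layers" standing in for transformer blocks. The hidden size, layer count, and pause depth below are illustrative assumptions, not the paper's configuration; the combination functions mirror the sum/mean/replace variants described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for transformer blocks: each "layer" is a random linear
# map plus a nonlinearity. Shapes and depths are illustrative assumptions.
D, NUM_LAYERS, PAUSE_LAYER = 16, 4, 2
layers_A = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(NUM_LAYERS)]
layers_B = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(NUM_LAYERS)]

def forward(layers, h, start=0, stop=None):
    """Run hidden state h through layers[start:stop]."""
    for W in layers[start:stop]:
        h = np.tanh(h @ W)
    return h

# Candidate combination functions f (the learned linear variant is omitted):
combine = {
    "sum":     lambda h_b, h_a: h_b + h_a,
    "mean":    lambda h_b, h_a: (h_b + h_a) / 2,
    "replace": lambda h_b, h_a: h_a,
}

x_a = rng.standard_normal(D)  # A's input representation (assumed)
x_b = rng.standard_normal(D)  # B's input representation (assumed)

# 1) Run A up to the chosen intermediate layer.
h_a = forward(layers_A, x_a, stop=PAUSE_LAYER)
# 2) Pause B at the same depth.
h_b = forward(layers_B, x_b, stop=PAUSE_LAYER)
# 3) Combine via f, then resume B's forward pass to completion.
for name, f in combine.items():
    out = forward(layers_B, f(h_b, h_a), start=PAUSE_LAYER)
    print(name, out.shape)
```

In a real model, step 3 would feed into the decoding loop rather than a single output vector; the learned linear variant would map A's activation through a trained projection before combining.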

Utility and Benefits:

  • This framework requires zero additional task-specific parameters or data and employs only existing, frozen LMs, making it highly scalable across different tasks.
  • It achieves a substantial reduction in inference compute relative to natural language communication, matching or exceeding its performance with less than one-fourth the computational resources.
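
A back-of-the-envelope model shows where the savings come from. All token and layer counts below are assumed for illustration (the paper's less-than-one-fourth figure comes from its own settings): natural language communication pays for A to decode a full message and for B to re-encode it through every layer, whereas activation communication runs A only up to the pause layer and transfers a single hidden state.

```python
# Cost unit: (tokens processed) x (layers traversed). Illustrative numbers only.
prompt_tokens = 300   # shared task prompt (assumed)
msg_tokens = 200      # A's natural-language message (assumed)
answer_tokens = 50    # B's final answer (assumed)
num_layers = 32       # depth of each model (assumed)
pause_layer = 16      # layer at which activations are exchanged (assumed)

# Natural language: A encodes the prompt and decodes the full message
# through all layers; B then processes prompt + message + answer.
nl_cost = (prompt_tokens + msg_tokens) * num_layers \
        + (prompt_tokens + msg_tokens + answer_tokens) * num_layers

# Activations: A runs only to the pause layer with no message decoding;
# B processes prompt + answer through all layers.
act_cost = prompt_tokens * pause_layer \
         + (prompt_tokens + answer_tokens) * num_layers

print(f"natural language: {nl_cost} token-layer units")
print(f"activations:      {act_cost} token-layer units")
print(f"ratio:            {act_cost / nl_cost:.2f}")
```

Even this single-exchange toy estimate cuts cost roughly in half; multi-agent, multi-round settings compound the savings, since each round of natural language debate repeats the decode-and-re-encode cycle.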

Experimental Validation:

  • Experiments were conducted across multiple domains and tasks including multi-player coordination games and reasoning benchmarks.
  • The proposed activation communication consistently outperformed natural language approaches (natural language debate) across various datasets, by up to 27.0%, while requiring less compute.

Conclusion and Potential Impact:

  • The activation-based communication paradigm offers a more efficient and scalable way to enhance the reasoning capabilities of LLMs. It suggests a shift from token-based language to communicated representations that carry richer information, opening a new avenue for research on model-to-model communication and collaboration.

Limitations and Future Research Directions:

  • Noted limitations include the need for aligned embedding spaces when the learned linear layer is not used, reduced interpretability of activation messages compared with text, and the requirement for white-box access to model internals.
  • Future work could apply the technique to complex reasoning and negotiation tasks, and develop methods for interpreting the information carried by communicated activations.