Cache Merging as a Convergent Replicated State for Multi-Agent Latent Reasoning

Published 1 Jul 2026 in cs.MA | (2607.01308v1)

Abstract: Multi-agent latent reasoning composes agents' KV-caches into one context for a final agent. Prior work (Agent Primitives) does this by concatenating caches along the sequence axis with RoPE re-encoding, which we call BagMerge. BagMerge is non-commutative, and the best input ordering is unpredictable, shifting with the regime, the latent-step budget, and the model scale. We make this exchange a convergent replicated state. First, CanonicalMerge fixes the layout by content: ordering caches by mean K-norm at a middle layer renders the merged cache byte-identical under any input permutation, verified algorithmically (arity N<=5) and bit-for-bit on real Qwen3-1.7B and 4B state. Second, we separate the replicated state from decode-time layout: the state is a set of content-addressed latent fragments whose merge is set union, a state-based CvRDT (commutative, associative, idempotent, absorbing), and CanonicalMerge is its deterministic render. Because the render is byte-equivalent, every N=2 accuracy number carries over unchanged and re-delivered duplicates are absorbed rather than re-concatenated. On a partitioned-reasoning benchmark, CanonicalMerge matches the best BagMerge ordering in every regime-by-budget-by-ordering cell without knowing which order is best, trading a small, statistically insignificant accuracy margin for an unconditional structural guarantee. The behaviour transfers to real multi-document QA (HotpotQA), while the closest training-free output-fusion baseline (PackLLM) loses by 45 points at matched budget, placing cache-level merging in a regime distinct from output-level fusion. Finally, at k>2 the approach transports and colocates latent traces but does not by itself compose them, which we characterize to motivate future work.


Authors (2)

Summary

The paper introduces a deterministic CanonicalMerge operator that uses content sorting to achieve commutativity and idempotence in multi-agent latent reasoning.
It shows that CanonicalMerge matches the best BagMerge ordering across deployment regimes, within a small and statistically insignificant margin, while rendering a bit-identical cache under any input permutation.
The work establishes a CvRDT abstraction for cache merging, enabling decentralized, federated, and training-free integration of latent fragments.


Problem Formulation and Motivation

This work analyzes cache-state merging in multi-agent latent reasoning for transformer-based LLMs. The problem arises when multiple agents operate independently, each generating a local KV-cache from a different fragment of a reasoning task. The caches must then be composed into a single context from which a final judger agent decodes the answer. Recent frameworks such as Agent Primitives concatenate caches along the sequence axis with RoPE re-encoding, an operator the paper calls BagMerge. This operator is non-commutative: the ordering of caches materially affects post-merge behavior, producing unpredictable, regime-dependent accuracy variations. More critically, concatenation does not naturally support federated or decentralized deployment, where fragment order and delivery schedule cannot be coordinated in advance and duplicate transmissions must be tolerated.
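The order sensitivity described above can be illustrated with a toy sketch. It assumes that fragments hold unrotated key rows and that "RoPE re-encoding" means applying the standard rotary rotation at each row's position in the concatenated sequence. The function names `rope` and `bag_merge` are hypothetical, only keys are modelled (no values, heads, or layers), and this is not the Agent Primitives implementation.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles (RoFormer-style pairing)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # angle for the pair starting at element i
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def bag_merge(caches):
    """Concatenate key rows in input order, then re-encode positions 0..T-1 (assumed behaviour)."""
    rows = [row for cache in caches for row in cache]
    return [rope(row, pos) for pos, row in enumerate(rows)]

# Two toy fragments with arbitrary values and an even head dimension of 4.
a = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.2, 0.1]]
b = [[0.3, 0.7, 1.0, 0.0]]

ab = bag_merge([a, b])
ba = bag_merge([b, a])
order_matters = ab != ba  # each row lands at a different position, so the bytes differ
```

Swapping the inputs changes the position assigned to every row, so the two merged caches differ even though they contain the same fragments.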

The paper identifies a clear abstraction gap: current latent cache merging is load-bearing for multi-agent latent reasoning but lacks key properties—especially commutativity, idempotence, and duplicate absorption—that are essential for robust, coordinator-free, distributed systems.

CanonicalMerge and the CvRDT Abstraction

The authors propose CanonicalMerge, a content-determined, byte-stable cache merging operator. The method sorts per-agent caches by a content function (the mean K-norm at a calibrated middle transformer layer) before applying concatenation with RoPE re-encoding. For each instance, the sub-caches are ranked by their computed K-norms, with ties broken by a content hash. The result is a deterministic, content-canonical layout that is bit-identical under any input permutation. This invariance is verified algorithmically up to $N=5$ on synthetic caches and bit-for-bit on the full 28/36-layer KV state of Qwen3-1.7B/4B.
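A minimal sketch of the content-based ordering follows. Several details are assumptions: a cache is reduced to a list of key rows from a single layer, the sort is ascending (the summary does not state the direction), the tie-break hash is SHA-256 over packed float64 values, and the RoPE re-encoding step is omitted because it does not affect the ordering argument. The names `mean_k_norm`, `content_hash`, and `canonical_merge` are illustrative.

```python
import hashlib
import itertools
import math
import struct

def mean_k_norm(cache):
    """Mean L2 norm of the key rows; stands in for the mean K-norm at one chosen layer."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in cache) / len(cache)

def content_hash(cache):
    """SHA-256 over row lengths and packed float64 values, used only to break ties."""
    h = hashlib.sha256()
    for row in cache:
        h.update(struct.pack("<I", len(row)))
        h.update(struct.pack(f"<{len(row)}d", *row))
    return h.hexdigest()

def canonical_merge(caches):
    """Order caches by (mean K-norm, content hash), then concatenate their rows."""
    ordered = sorted(caches, key=lambda c: (mean_k_norm(c), content_hash(c)))
    return [row for cache in ordered for row in cache]

# Three toy caches with mean key norms 3.0, 1.0, and 2.0 respectively.
a = [[3.0, 4.0], [0.0, 1.0]]
b = [[1.0, 0.0]]
c = [[0.0, 2.0], [2.0, 0.0]]

# Every input permutation should render the same layout.
renders = {repr(canonical_merge(list(p))) for p in itertools.permutations([a, b, c])}
permutation_invariant = len(renders) == 1
```

Because the sort key depends only on each cache's content, the input order never reaches the concatenation step, which is what makes the layout permutation-invariant in this sketch.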

The method is then lifted from an operator to a replicated-state abstraction: the durable state is a set of content-addressed, unshifted latent fragments (LatentFragmentSet) whose merge is set union. This yields a state-based CvRDT whose render is CanonicalMerge. The merge is commutative, associative, and idempotent by construction. Because state is kept separate from render, re-injecting a duplicate cache leaves the rendered bytes unchanged: duplicates are absorbed rather than re-concatenated.
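The state/render separation can be sketched as a grow-only set keyed by content address. Only the class name `LatentFragmentSet` comes from the text; the methods `merge` and `render`, the SHA-256 addressing, and the ascending sort are assumptions made for illustration, and fragments are again reduced to lists of key rows.

```python
import hashlib
import math
import struct

def fragment_id(fragment):
    """Content address: SHA-256 over row lengths and packed float64 values."""
    h = hashlib.sha256()
    for row in fragment:
        h.update(struct.pack("<I", len(row)))
        h.update(struct.pack(f"<{len(row)}d", *row))
    return h.hexdigest()

def mean_k_norm(fragment):
    """Mean L2 norm of the rows, used as the render-time sort key."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in fragment) / len(fragment)

class LatentFragmentSet:
    """Grow-only set of fragments keyed by content address (hypothetical API)."""

    def __init__(self, fragments=()):
        self.items = {fragment_id(f): f for f in fragments}

    def merge(self, other):
        """Join is set union; duplicates collapse because ids are content-derived."""
        joined = LatentFragmentSet()
        joined.items = {**self.items, **other.items}
        return joined

    def render(self):
        """Deterministic layout: sort by (mean K-norm, id), then concatenate rows."""
        ordered = sorted(self.items.items(), key=lambda kv: (mean_k_norm(kv[1]), kv[0]))
        return [row for _, frag in ordered for row in frag]

f1 = [[3.0, 4.0]]
f2 = [[1.0, 0.0], [0.0, 1.0]]
f3 = [[0.0, 2.0]]
x, y, z = LatentFragmentSet([f1]), LatentFragmentSet([f2]), LatentFragmentSet([f3])

commutative = x.merge(y).render() == y.merge(x).render()
associative = x.merge(y).merge(z).render() == x.merge(y.merge(z)).render()
idempotent = x.merge(x).render() == x.render()
absorbs_redelivery = x.merge(y).merge(y).render() == x.merge(y).render()
```

Keeping fragments unshifted in the state and deferring layout to `render` means a retry or re-delivery only re-inserts an existing key, so the rendered sequence does not grow.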

Benchmark and Empirical Findings

A partitioned-reasoning benchmark is introduced in which each problem is decomposed into minimal fragments distributed to separate thinkers, so that integration is possible only through joint cache merging. The benchmark is bounded on both sides: the single-fragment floor is $5\%$, the full-context ceiling is $100\%$, and every evaluated merge falls within this interval.

Across a comprehensive 12-cell regime–budget–ordering matrix, the paper shows that:

BagMerge's best ordering for cache concatenation is unpredictable and shifts with latent-step budget, deployment regime (query-known versus query-blind), and model scale. The default (lexical) and swapped (permuted) orderings can diverge by up to $14$ percentage points.
CanonicalMerge systematically lands within $4$ percentage points of the best possible BagMerge ordering for each tested setting and is statistically indistinguishable from the best ordering under multi-seed, paired-bootstrap, Holm-corrected analysis.
The commutativity and byte-identity of CanonicalMerge are verified both synthetically and on real model state, with the rendered cache being bit-identical for any reordering.
On HotpotQA multi-hop QA, CanonicalMerge shows no detectable degradation relative to a single-agent, full-context baseline.
Against output fusion (PackLLM), CanonicalMerge leads by $45$ points at matched budget. PackLLM's failure is attributed to the lossy nature of output-space aggregation, which places cache-level merging in a regime distinct from output-level fusion on such tasks.
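The statistical comparison mentioned in the list above (paired bootstrap with Holm correction) could be sketched as follows. The summary does not specify the paper's exact procedure, so this is a generic textbook version: a percentile confidence interval for the mean per-item difference, and the Holm step-down rule applied to a family of p-values. The function names, the default `n_boot`, and the seed are illustrative choices.

```python
import random

def paired_bootstrap_ci(outcomes_a, outcomes_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the mean per-item difference (a - b), resampling paired items."""
    diffs = [a - b for a, b in zip(outcomes_a, outcomes_b)]
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value with alpha / (m - k), k from 0."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are retained
    return reject
```

Under this scheme, "statistically indistinguishable" would correspond to an interval that contains zero, or to a difference whose hypothesis is not rejected after correction across all tested cells.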

The canonical merge thus provides strong algebraic and operational guarantees: the rendered cache is deterministic, repeatable, order-invariant, and resistant to re-delivery-induced degradation.

Limitations and Theoretical Boundaries

Despite robust two-agent performance, the method does not recover latent composition at $k>2$. Experiments with three jointly necessary fragments indicate that cache merging only colocates information and does not, by itself, induce the required composition; recovery and composition emerge only when one agent reprocesses another's latent state. This limitation mirrors observations in the recent literature (e.g., Liu et al., 12 Jun 2026; Oomerjee et al., 22 May 2025): transport and colocation differ fundamentally from semantic composition in large-model latent spaces.

Another limitation is that byte-identity and duplicate-absorption are strictly syntactic: near-duplicate but non-identical caches are distinct elements, and cross-hardware determinism is not guaranteed due to possible arithmetic non-associativity. Additionally, CanonicalMerge's content function is query-conditional and may not always be maximally discriminative, especially in query-blind/federated regimes with minimal latent budgets.
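Both caveats can be seen in a few lines. The sketch assumes SHA-256 over packed float64 values as the content address (an illustrative choice, not the paper's stated scheme) and uses arbitrary toy values.

```python
import hashlib
import struct

def digest(values):
    """Content address over the exact float64 bit patterns."""
    return hashlib.sha256(struct.pack(f"<{len(values)}d", *values)).hexdigest()

# Floating-point addition is not associative, so a different reduction order
# (for example on different hardware or kernels) can change the low bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
sums_differ = left != right

# A near-duplicate is a distinct element under a bytewise content address.
exact = [0.5, 0.25]
near = [0.5, 0.25 + 1e-12]
fragment_ids = {digest(exact), digest(near), digest(list(exact))}
distinct_elements = len(fragment_ids)  # the exact copy is absorbed, the near-duplicate is not
```

An exact re-delivery maps to the same address and is absorbed, while a cache that differs in a single low-order bit becomes a second element of the set.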

Practical and Theoretical Implications

The set-union CvRDT abstraction enables robust, federated, and asynchronous multi-agent latent reasoning pipelines: decentralized data sources can contribute pre-encoded KV caches in any order, under any delivery or retry pattern, without coordination or cache bloat. This property is critical for real-world systems where delivery guarantees, duplication, and order cannot be centrally orchestrated. Furthermore, cache-level merging, which is distinct from both parameter fusion and output-level fusion, emerges as an independent operational regime, filling a conceptual and algorithmic gap not addressed by prior literature.

On the theoretical front, the paper delineates a precise boundary: merge-by-colocation is not equal to merge-by-composition, and solving the general $k>2$ composition problem likely requires model-side modifications or learned mechanisms beyond the scope of training-free, cache-only merges. This distinction, explicitly characterized by the failure of parallel or tree-structured merges without additional model reprocessing, motivates future research directions toward robust, order-independent, and compositional cache merging.

Conclusion

This paper establishes, through both algorithmic construction and empirical analysis, a method for cache merging in multi-agent reasoning that is commutative, idempotent, byte-stable, and duplicate-absorbing, the defining properties of robust replicated state. The CanonicalMerge operator, together with the LatentFragmentSet abstraction, provides training-free, structurally sound cache merging that operates distinctly from output fusion and addresses a critical gap for scalable, decentralized, and federated latent reasoning systems. The method achieves parity with the best BagMerge ordering while removing input ordering as a configuration hazard. The paper's boundary analysis at $k>2$ delineates the limits of cache-level merging and places the challenge of semantic composition front and center for future work.