Scalability of CALM beyond two models
Determine whether Composition to Augment Language Models (CALM), which composes a base large language model with specialized models via pairwise cross-attention, can scale beyond two constituent models without the number of pairwise cross-attention connections growing quadratically.
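The quadratic blow-up can be made concrete: under a naive extension of CALM's pairwise design, each unordered pair of constituent models would need its own cross-attention bridge, so the bridge count grows as n(n-1)/2. A minimal sketch of the count only (not of CALM itself; the function name is illustrative):

```python
from math import comb

def pairwise_bridges(n_models: int) -> int:
    """Number of distinct model pairs, each of which would need its own
    cross-attention bridge if CALM's two-model design were extended naively."""
    return comb(n_models, 2)  # n * (n - 1) / 2

for n in (2, 3, 4, 8, 16):
    print(n, pairwise_bridges(n))
```

For two models this is a single connection, but at sixteen constituents it is already 120 bridges, which is the scaling concern the problem statement raises.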
References
"It is also not clear how this construction scales beyond two models, as it may require a quadratic number of pairwise cross-attention connections."
— A Theoretical Framework for Modular Learning of Robust Generative Models
(2602.17554 - Cortes et al., 19 Feb 2026) in Section 2 (Related Work) — Mixtures, Merging, and Composition