Root cause of robustness gap between prefix‑tuning and LoRA parameterizations
Investigate and identify the architectural or optimization factors responsible for the observed robustness gap in out‑of‑domain performance between Cartridge parameterizations using simplified prefix‑tuning (trainable KV‑cache tokens) and those using LoRA (low‑rank adapters), and determine whether differences such as activation functions explain this gap.
References
It isn't clear why prefix-tuning is so much more robust than LoRA to out-of-domain performance degradation. The gap is surprising given the structural similarity between attention over a trainable KV-cache and an MLP: both apply two linear transformations separated by a non-linearity. One candidate explanation is the difference in that non-linearity (softmax over attention scores vs. SiLU in the MLP). We leave a more detailed investigation of the root cause to future work.
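The structural analogy between the two parameterizations can be sketched in a few lines of NumPy. The dimensions and variable names below are illustrative, not from the paper; the point is only that both computations follow the same linear → non-linearity → linear skeleton and differ in the choice of non-linearity:

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 8, 4  # hidden dim, number of prefix tokens / MLP width (illustrative)

x = rng.normal(size=d)  # a single query / input activation

# Simplified prefix-tuning: attention over trainable KV-cache tokens.
# out = softmax(x K^T / sqrt(d)) V  ->  linear (K), softmax, linear (V)
K = rng.normal(size=(p, d))  # trainable prefix keys
V = rng.normal(size=(p, d))  # trainable prefix values
scores = x @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()  # softmax non-linearity
prefix_out = weights @ V

# MLP (the module LoRA typically adapts with low-rank updates):
# out = silu(x W1^T) W2  ->  linear (W1), SiLU, linear (W2)
W1 = rng.normal(size=(p, d))
W2 = rng.normal(size=(p, d))
silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU: z * sigmoid(z)
mlp_out = silu(x @ W1.T) @ W2

# Same skeleton, same output shape; only softmax vs. SiLU differs.
print(prefix_out.shape, mlp_out.shape)
```

One structural difference worth noting: softmax normalizes the intermediate weights to a convex combination of the value vectors, which bounds the output by the values' convex hull, whereas SiLU imposes no such constraint. This bounding could plausibly contribute to the robustness gap, though that is speculation consistent with the activation-function hypothesis above.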