Root cause of robustness gap between prefix‑tuning and LoRA parameterizations
Investigate and identify the architectural or optimization factors responsible for the observed robustness gap in out‑of‑domain performance between Cartridge parameterizations using simplified prefix‑tuning (trainable KV‑cache tokens) and those using LoRA (low‑rank adapters), and determine whether differences such as activation functions explain this gap.
References
It isn't clear why prefix-tuning is so much more robust than LoRA to out-of-domain performance degradation. The gap is surprising given the structural similarity between attention over a trainable KV-cache and an MLP: both apply two linear transformations separated by a non-linearity. One candidate explanation is the difference in that non-linearity (softmax over attention scores vs. SiLU in the MLP). We leave a more detailed investigation of the root cause to future work.
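The structural analogy between the two parameterizations can be sketched in a few lines of NumPy. The dimensions and variable names below are illustrative, not from the paper; the point is only that both computations follow the same linear → non-linearity → linear skeleton and differ in the choice of non-linearity:

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 8, 4  # hidden dim, number of prefix tokens / MLP width (illustrative)

x = rng.normal(size=d)  # a single query / input activation

# Simplified prefix-tuning: attention over trainable KV-cache tokens.
# out = softmax(x K^T / sqrt(d)) V  ->  linear (K), softmax, linear (V)
K = rng.normal(size=(p, d))  # trainable prefix keys
V = rng.normal(size=(p, d))  # trainable prefix values
scores = x @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()  # softmax non-linearity
prefix_out = weights @ V

# MLP (the module LoRA typically adapts with low-rank updates):
# out = silu(x W1^T) W2  ->  linear (W1), SiLU, linear (W2)
W1 = rng.normal(size=(p, d))
W2 = rng.normal(size=(p, d))
silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU: z * sigmoid(z)
mlp_out = silu(x @ W1.T) @ W2

# Same skeleton, same output shape; only softmax vs. SiLU differs.
print(prefix_out.shape, mlp_out.shape)
```

One structural difference worth noting: softmax normalizes the intermediate weights to a convex combination of the value vectors, which bounds the output by the values' convex hull, whereas SiLU imposes no such constraint. This bounding could plausibly contribute to the robustness gap, though that is speculation consistent with the activation-function hypothesis above.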