Effect of Coupled Query–Key Dynamics at Billion-Parameter Scale
Determine whether coupled query–key dynamics improve or degrade performance in transformer language models at billion-parameter scale, beyond the 60M–350M parameter range evaluated in this study. Here, coupled query–key dynamics means jointly evolving the query and key representations through shared learned dynamics before scaled dot-product attention scoring.
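To make the mechanism in the problem statement concrete, here is a minimal sketch of one way queries and keys could be evolved jointly before scoring. It is an illustration under stated assumptions, not the authors' formulation: the explicit Euler update, the tanh nonlinearity, the single shared matrix `W`, and the names `coupled_qk_step`, `attention_weights`, `n_steps`, and `step` are all hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def coupled_qk_step(q, k, W, step):
    """One explicit Euler step of an ASSUMED coupled update.

    The query moves along a direction computed from the key, and the key
    along a direction computed from the query. Both use the same matrix W,
    which stands in for the "shared learned dynamics".
    """
    dq = [math.tanh(v) for v in matvec(W, k)]
    dk = [math.tanh(v) for v in matvec(W, q)]
    q_new = [qi + step * d for qi, d in zip(q, dq)]
    k_new = [ki + step * d for ki, d in zip(k, dk)]
    return q_new, k_new


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(Q, K, W, n_steps=2, step=0.1):
    """Evolve each (query, key) pair jointly, then score as usual.

    Q and K are lists of per-token vectors. With n_steps=0 this reduces to
    standard scaled dot-product attention weights.
    """
    for _ in range(n_steps):
        pairs = [coupled_qk_step(q, k, W, step) for q, k in zip(Q, K)]
        Q = [p[0] for p in pairs]
        K = [p[1] for p in pairs]
    scale = 1.0 / math.sqrt(len(Q[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in K])
        for q in Q
    ]


# Toy example: two tokens, head dimension 2, hand-picked shared matrix.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.0, 1.0], [1.0, 0.0]]
A = attention_weights(Q, K, W, n_steps=2, step=0.1)
```

In this sketch the coupled step adds only one small `d x d` matrix per head, so the open question is not about parameter count but about whether the extra pre-scoring computation still changes model quality once the base model is in the billions of parameters.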
References
Our evaluation spans 60M–350M parameters; whether coupled dynamics helps or hurts at billion-parameter scale remains open.
— Coupled Query-Key Dynamics for Attention
(2604.01683 - Gahtan et al., 2 Apr 2026) in Conclusion, Limitations