Role of trainable decay rates in EMA-based memory
Investigate whether learning element-wise decay rates in an Exponential Moving Average (EMA) memory lets the model separate fast and slow memory features, and whether that separation accounts for the observed performance gains on Mem‑RPE.
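To make the fast/slow distinction concrete, here is a minimal NumPy sketch of an element-wise EMA memory. It assumes the common update form m_t = λ ⊙ m_{t-1} + (1 − λ) ⊙ x_t and a sigmoid parameterization that keeps each decay rate in (0, 1). Both are illustrative assumptions rather than the formulation used in the cited paper, and the names `ema_memory` and `decay_logits` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Map unconstrained values to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def ema_memory(inputs, decay_logits):
    """Run an element-wise EMA over a sequence (assumed update form).

    inputs:       array of shape (T, D), one D-dimensional input per step.
    decay_logits: array of shape (D,), unconstrained per-feature parameters;
                  in a trainable setting these would be learned.
    Returns the memory after the last step, shape (D,).
    """
    lam = sigmoid(decay_logits)          # per-feature decay rate in (0, 1)
    memory = np.zeros(inputs.shape[1])   # assumed zero initial memory
    for x_t in inputs:
        # Small lam: memory tracks the input quickly ("fast" feature).
        # Large lam: memory changes slowly ("slow" feature).
        memory = lam * memory + (1.0 - lam) * x_t
    return memory


# Constant input of ones for 10 steps; feature 0 is fast, feature 1 is slow.
inputs = np.ones((10, 2))
decay_logits = np.array([-2.0, 4.0])
final_memory = ema_memory(inputs, decay_logits)
```

With a constant input of ones and zero initial memory, each feature equals 1 − λ^T after T steps, so the low-decay feature has almost reached the input after 10 steps while the high-decay feature has moved only a small fraction of the way. A single shared constant λ would force every feature onto the same timescale.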
References
Making $\lambda$ trainable has a positive impact, which can be explained quite easily: we conjecture that it allows the model to choose whether certain memory features are 'fast' or 'slow'.
— Kinaema: a recurrent sequence model for memory and pose in motion
(2510.20261 - Sariyildiz et al., 23 Oct 2025) in Appendix E.1, EMA w. constant λ vs. trainable λ (Table 8)