Role of trainable decay rates in EMA-based memory
Investigate whether learning element-wise decay rates in an Exponential Moving Average (EMA) memory lets the model separate fast and slow memory features, and whether that separation accounts for the performance gains observed on Mem‑RPE.
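To make the fast/slow distinction concrete, here is a minimal sketch of an element-wise EMA memory. It assumes the common update m_t = λ ⊙ m_{t-1} + (1 − λ) ⊙ x_t, which may differ from the exact form used in the paper. The names `ema_memory` and `decay_logits` are hypothetical, and keeping λ in (0, 1) through a sigmoid is an illustrative choice. In a trained model the logits would be learned parameters; here they are fixed by hand to show the effect.

```python
import math


def sigmoid(z):
    """Map an unconstrained logit to a decay rate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def ema_memory(inputs, decay_logits, init=None):
    """Run an element-wise EMA over a sequence of feature vectors.

    Assumed update (not taken from the paper):
        m_t[i] = lam[i] * m_{t-1}[i] + (1 - lam[i]) * x_t[i]
    with lam[i] = sigmoid(decay_logits[i]).
    """
    lam = [sigmoid(z) for z in decay_logits]
    memory = list(init) if init is not None else [0.0] * len(lam)
    for x in inputs:
        memory = [l * m + (1.0 - l) * xi for l, m, xi in zip(lam, memory, x)]
    return memory


# Hand-picked logits: feature 0 has a small decay rate (fast),
# feature 1 has a decay rate close to 1 (slow).
decay_logits = [-2.0, 4.0]

# Step input: both features observe 0 for 20 steps, then 1 for 3 steps.
inputs = [[0.0, 0.0]] * 20 + [[1.0, 1.0]] * 3

final = ema_memory(inputs, decay_logits)
print(final)  # feature 0 has almost reached 1; feature 1 has barely moved
```

With a per-feature λ, the fast feature tracks the most recent inputs while the slow feature averages over roughly 1/(1 − λ) steps. A single constant λ would force every feature onto the same time scale.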
References
Making $\lambda$ trainable has a positive impact, which can be explained quite easily: we conjecture that it allows the model to choose whether certain memory features are 'fast' or 'slow'.
— Kinaema: a recurrent sequence model for memory and pose in motion
(2510.20261 - Sariyildiz et al., 23 Oct 2025) in Appendix E.1, EMA w. constant λ vs. trainable λ (Table 8)