DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products (2502.10297v6)

Published 14 Feb 2025 in cs.LG, cs.CL, and cs.FL

Abstract: Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling, offering efficient training and linear-time inference. However, existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices. Diagonal matrices, used in models such as Mamba, GLA, or mLSTM, yield fast runtime but have limited expressivity. To address this, recent architectures such as DeltaNet and RWKV-7 adopted a diagonal plus rank-1 structure, which allows simultaneous token and channel mixing, improving associative recall and, as recently shown, state-tracking when allowing negative eigenvalues in the state-transition matrices. Building on the interpretation of DeltaNet's recurrence as performing one step of online gradient descent per token on an associative recall loss, we introduce DeltaProduct, which instead takes multiple ($n_h$) steps per token. This naturally leads to diagonal plus rank-$n_h$ state-transition matrices, formed as products of $n_h$ generalized Householder transformations, providing a tunable mechanism to balance expressivity and efficiency. We provide a detailed theoretical characterization of the state-tracking capability of DeltaProduct in finite precision, showing how it improves by increasing $n_h$. Our extensive experiments demonstrate that DeltaProduct outperforms DeltaNet in both state-tracking and language modeling, while also showing significantly improved length extrapolation capabilities.
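The link the abstract draws between gradient steps and Householder products can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a state matrix `H` of shape `(d_k, d_v)`, unit-norm keys, step sizes `beta` in `[0, 2]`, and the associative recall loss $\frac{1}{2}\|H^\top k - v\|^2$. The function names (`householder`, `delta_product_step`, `transition_matrix`) are hypothetical, and the sketch omits any gating, so the transition is identity plus rank-$n_h$ rather than a general diagonal plus rank-$n_h$.

```python
import numpy as np

def householder(k, beta):
    """Generalized Householder matrix I - beta * k k^T (k assumed unit-norm)."""
    return np.eye(k.shape[0]) - beta * np.outer(k, k)

def delta_product_step(H, keys, values, betas):
    """Apply n_h delta-rule updates to the state for a single token.

    H: (d_k, d_v) state; keys: (n_h, d_k); values: (n_h, d_v); betas: (n_h,).
    """
    for k, v, beta in zip(keys, values, betas):
        # One gradient step on 0.5 * ||H^T k - v||^2 with step size beta.
        # Algebraically this equals (I - beta k k^T) H + beta k v^T.
        H = H - beta * np.outer(k, H.T @ k - v)
    return H

def transition_matrix(keys, betas):
    """Product of the n_h Householder factors, most recent step on the left."""
    A = np.eye(keys.shape[1])
    for k, beta in zip(keys, betas):
        A = householder(k, beta) @ A
    return A

# Illustrative sizes and random inputs (all values here are assumptions).
rng = np.random.default_rng(0)
d_k, d_v, n_h = 6, 4, 3
keys = rng.normal(size=(n_h, d_k))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = rng.normal(size=(n_h, d_v))
betas = rng.uniform(0.0, 2.0, size=n_h)
H0 = rng.normal(size=(d_k, d_v))

A = transition_matrix(keys, betas)
H1 = delta_product_step(H0, keys, values, betas)
# The update is affine in H, so running it from a zero state yields the input term B.
B = delta_product_step(np.zeros_like(H0), keys, values, betas)

# With beta = 2 a single factor is a reflection, which has eigenvalue -1 along k.
R = householder(keys[0], 2.0)
```

In this sketch `H1` matches `A @ H0 + B`, and `A - I` has rank at most `n_h`, which is the sense in which more steps per token give a higher-rank transition. Setting `n_h = 1` recovers the single-step DeltaNet-style update under the same assumptions.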

Tweets

https://twitter.com/Euclaise_/status/1902439248709443700

https://twitter.com/SonglinYang4/status/1896999201768399302

https://twitter.com/riccardograzzi/status/1933552687544410132
