Theoretical Limit of Multi‑Layer DeltaNet Expressivity

Determine the theoretical expressivity limit of the DeltaNet architecture as the number of layers increases while each token applies a single generalized Householder transformation (n_h = 1, i.e., DeltaProduct with n_h = 1), by precisely characterizing the maximal class of computations or formal languages such multi-layer models can implement.


Background

DeltaNet is a linear RNN whose state-transition matrix at each token is a generalized Householder transformation, enabling token-channel mixing and improved expressivity relative to diagonal linear RNNs. DeltaProduct generalizes DeltaNet by applying multiple gradient steps per token, yielding a product of generalized Householder transformations and tunable expressivity via n_h.
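The per-token recurrence described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `delta_step` assumes the standard delta-rule form S_t = S_{t-1}(I - β k kᵀ) + β v kᵀ, whose transition matrix (I - β k kᵀ) is a generalized Householder transformation for unit k and β ∈ [0, 2]; `delta_product_step` chains n_h such steps per token, so the effective transition is a product of generalized Householders.

```python
import numpy as np

def delta_step(S, k, v, beta):
    # One DeltaNet update: a gradient step on ||S k - v||^2.
    # Algebraically S_new = S (I - beta k k^T) + beta v k^T, i.e. the
    # state-transition matrix is the generalized Householder I - beta k k^T.
    return S - beta * np.outer(S @ k - v, k)

def delta_product_step(S, keys, values, betas):
    # DeltaProduct (illustrative): n_h gradient steps per token, giving a
    # state transition that is a product of n_h generalized Householders.
    for k, v, beta in zip(keys, values, betas):
        S = delta_step(S, k, v, beta)
    return S
```

With n_h = 1 the loop reduces to a single `delta_step`, recovering DeltaNet, which is the configuration whose multi-layer expressivity limit the question targets.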

The paper extends prior results by proving that a two-layer DeltaNet can solve dihedral group word problems, showing increased capability with multiple layers. However, despite these advances, the authors explicitly note that the overall theoretical limit of DeltaNet’s expressivity as the number of layers increases remains unknown, motivating a formal characterization of what computations or languages such multi-layer models can implement.
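The connection between Householder transitions and dihedral groups can be made concrete. A β = 2 generalized Householder matrix with unit k is a reflection, and products of reflections generate the rotations of a dihedral group; the sketch below (an illustration of this standard fact, not code from the paper) composes two 2D reflections into a rotation, the kind of group element a dihedral word problem requires the state to track.

```python
import numpy as np

def householder(k):
    # Reflection across the hyperplane orthogonal to unit vector k:
    # the beta = 2 case of DeltaNet's transition matrix I - beta k k^T.
    k = k / np.linalg.norm(k)
    return np.eye(len(k)) - 2.0 * np.outer(k, k)

r1 = householder(np.array([1.0, 0.0]))
r2 = householder(np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)]))
# Composing two distinct reflections yields a proper rotation
# (determinant +1), by twice the angle between the mirror lines.
rot = r2 @ r1
```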

References

In contrast to increasing the number of gradient steps per token, the expressivity of DeltaNet (equivalent to DeltaProduct with $n_h = 1$) can also be enhanced by increasing the number of layers, and its theoretical limit is still unknown.

DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products (2502.10297 - Siems et al., 14 Feb 2025) in Section 4: Two Layer DeltaNet Can Solve Dihedral group Word Problems (first paragraph)