Theoretical Limit of Multi‑Layer DeltaNet Expressivity
Determine the theoretical limit on the expressivity of the DeltaNet architecture as the number of layers increases while each token is restricted to a single generalized Householder transformation ($n_h = 1$, equivalently DeltaProduct with $n_h = 1$). Specifically, characterize precisely the largest class of computations or formal languages that this multi-layer configuration can implement.
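To make the per-token restriction concrete, here is a minimal sketch of a single generalized Householder transformation. It assumes the form I - beta * k k^T with a unit-norm key k and a scalar beta in [0, 2]; the helper names (`generalized_householder`, `matmul`, `det2`) are hypothetical and chosen only for illustration. The determinant of such a matrix is 1 - beta, so one factor can act as a reflection (beta = 2) or a projection (beta = 1), but it cannot be a nontrivial rotation, whereas a product of two reflections can.

```python
import math

def generalized_householder(k, beta):
    """Return I - beta * u u^T as nested lists, where u is k scaled to unit norm.

    Assumed form of one generalized Householder factor; beta is taken in [0, 2].
    """
    norm = math.sqrt(sum(x * x for x in k))
    u = [x / norm for x in k]
    n = len(u)
    return [[(1.0 if i == j else 0.0) - beta * u[i] * u[j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# One factor per token (n_h = 1): a reflection or a projection.
reflection = generalized_householder([1.0, 0.0], beta=2.0)  # flips the first coordinate
projection = generalized_householder([1.0, 0.0], beta=1.0)  # zeroes the first coordinate

# Two factors multiplied together: the product of two reflections is a rotation.
other = generalized_householder([1.0, 1.0], beta=2.0)
rotation = matmul(other, reflection)  # rotation by 90 degrees in this example

print(det2(reflection), det2(projection), det2(rotation))
```

This only illustrates what a single factor cannot do within one step. It says nothing about what stacking layers can recover, which is the open question stated above.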
References
In contrast to increasing the number of gradient steps per token, the expressivity of DeltaNet (equivalent to DeltaProduct with $n_h = 1$) can also be enhanced by increasing the number of layers and its theoretical limit is still unknown.
DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
(2502.10297 - Siems et al., 14 Feb 2025) in Section 4: Two Layer DeltaNet Can Solve Dihedral group Word Problems (first paragraph)