LRURec for Sequential Recommendation

Updated 3 March 2026

LRURec (Yue et al., 2023) is a linear recurrent architecture that achieves competitive accuracy while substantially reducing training time and hardware overhead.
Linear recurrent units replace nonlinear recurrent dependencies with a closed-form, parallelizable linear recurrence; the transition matrix is diagonalized by eigendecomposition, and constraining the eigenvalue magnitudes below 1 keeps the recurrence stable.
Behavior-dependent gating and hardware-aware parallel scan strategies enable real-time, scalable recommendations on heterogeneous, long user interaction sequences.

Linear Recurrent Units for Sequential Recommendation (LRURec) designates a class of sequential recommender architectures that model user behavior as a linear, time-evolving process. By relying on purely linear recurrence relations and recursive parallelization, LRURec and its more recent variants achieve high training efficiency, low-latency inference, and recommendation accuracy competitive with self-attention and traditional gated RNN models. Recent advances, particularly behavior-dependent gating and hardware-aware parallel scan strategies, extend LRURec to heterogeneous interaction histories and improve hardware utilization on real-world datasets (Yue et al., 2023, Liu et al., 2024).

1. Mathematical Foundations of Linear Recurrent Units

The core operation in LRURec replaces typical nonlinear RNN or self-attention modules with a purely linear recurrence:

$h_k = A\,h_{k-1} + B\,x_k, \quad y_k = C\,h_k + D\,x_k,$

where $x_k \in \mathbb{R}^{H_\mathrm{in}}$ is the input, $h_k \in \mathbb{R}^H$ is the hidden state, and $A, B, C, D$ are learnable matrices. For computational efficiency, $A$ is diagonalized as $A = P \Lambda P^{-1}$ with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_H)$, and the dynamics are computed in the transformed basis $\bar h_k = P^{-1} h_k$, with $\bar B = P^{-1} B$ and $\bar C = C P$:

$\bar h_k = \Lambda \bar h_{k-1} + \bar B x_k, \quad y_k = \Re(\bar C\,\bar h_k) + D x_k.$

The eigenvalues are generally complex, so the real part is taken at the output, and they are constrained element-wise to $|\lambda_i| < 1$ to guarantee a stable recurrence. Because $\Lambda$ is diagonal, each hidden dimension evolves independently, which enables closed-form prefix computations and recursion-based parallelization for both training and inference (Yue et al., 2023).
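As a minimal sketch of this change of basis, the following pure-Python example compares the dense recurrence with its diagonal form on a hand-picked 2x2 transition matrix (a scaled rotation, whose eigenvectors are known in closed form). The matrix, the values of `B`, `C`, and `D`, and the function names are illustrative assumptions, not the parameterization used in LRURec.

```python
import cmath
import math

# Illustrative parameters (assumptions): A = R * rotation(THETA), so its
# eigenvalues are R*exp(+-i*THETA) and |lambda| = R < 1 keeps it stable.
R, THETA = 0.9, 0.5
B = (1.0, 0.5)    # input projection (scalar input -> 2-d state)
C = (0.3, -0.2)   # output projection
D = 0.1           # skip connection


def dense_lru(xs):
    """Serial recurrence h_k = A h_{k-1} + B x_k, y_k = C h_k + D x_k."""
    a, b = R * math.cos(THETA), R * math.sin(THETA)  # A = [[a, -b], [b, a]]
    h0, h1 = 0.0, 0.0
    ys = []
    for x in xs:
        h0, h1 = a * h0 - b * h1 + B[0] * x, b * h0 + a * h1 + B[1] * x
        ys.append(C[0] * h0 + C[1] * h1 + D * x)
    return ys


def diagonal_lru(xs):
    """Same model in the eigenbasis: each state dimension evolves on its own."""
    lam = (cmath.rect(R, THETA), cmath.rect(R, -THETA))
    # For this A: P = [[1, 1], [-1j, 1j]], P^{-1} = [[0.5, 0.5j], [0.5, -0.5j]].
    b_bar = (0.5 * B[0] + 0.5j * B[1], 0.5 * B[0] - 0.5j * B[1])  # P^{-1} B
    c_bar = (C[0] - 1j * C[1], C[0] + 1j * C[1])                  # C P
    h_bar = [0j, 0j]
    ys = []
    for x in xs:
        h_bar = [l * h + bb * x for l, h, bb in zip(lam, h_bar, b_bar)]
        ys.append((c_bar[0] * h_bar[0] + c_bar[1] * h_bar[1]).real + D * x)
    return ys


xs = [1.0, 0.0, -2.0, 0.5, 3.0]
dense, diag = dense_lru(xs), diagonal_lru(xs)
print(max(abs(u - v) for u, v in zip(dense, diag)))  # floating-point noise only
```

The two outputs agree up to rounding, while the diagonal version needs only element-wise multiplications per step.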

2. Recursive Parallelization and Hardware Acceleration

A principal advantage of LRURec is the recursive parallelization achievable via the scan (prefix-sum) operator. With $\bar h_0 = 0$, unrolling the diagonal recurrence gives the hidden state after $t$ steps in closed form:

$\bar h_t = \sum_{i=1}^t \Lambda^{t-i} \bar B x_i.$

Because composing linear recurrence steps is associative, a sequence of length $L$ can be partitioned and computed hierarchically in $\mathcal{O}(\log L)$ parallel depth, which suits GPU acceleration. Each round uses batched kernels, and sequences are left-padded to a power-of-two length for maximal hardware utilization. This approach is reported to reduce wall-clock training time by an order of magnitude relative to serial recurrence (Yue et al., 2023, Liu et al., 2024).
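The associativity argument can be illustrated with a small sketch. Each step of a scalar (single-dimension) recurrence is treated as an affine map $h \mapsto a h + b$, and maps are composed with a recursive-doubling scan. This is a simple textbook variant chosen for readability, not the kernel used in either paper; the function names and input values are assumptions.

```python
import cmath


def combine(earlier, later):
    """Compose two affine maps h -> a*h + b, applying `earlier` first."""
    a1, b1 = earlier
    a2, b2 = later
    return (a2 * a1, a2 * b1 + b2)


def serial_states(lam, us):
    """Reference: h_t = lam * h_{t-1} + u_t with h_0 = 0, one step at a time."""
    h, out = 0.0, []
    for u in us:
        h = lam * h + u
        out.append(h)
    return out


def scan_states(lam, us):
    """Inclusive scan by recursive doubling; returns (states, number of rounds)."""
    elems = [(lam, u) for u in us]
    step, rounds = 1, 0
    while step < len(elems):
        # Every position in this comprehension is independent of the others,
        # so on parallel hardware a whole round could run at once.
        elems = [combine(elems[i - step], elems[i]) if i >= step else elems[i]
                 for i in range(len(elems))]
        step *= 2
        rounds += 1
    # With h_0 = 0, the offset b of each composed map is the state itself.
    return [b for _, b in elems], rounds


lam = 0.9 * cmath.exp(0.3j)                    # one stable complex eigenvalue
us = [1.0, 2.0, -1.0, 0.5, 0.0, 3.0, -2.0, 1.5]  # stands in for B_bar * x_i
states, rounds = scan_states(lam, us)
print(rounds)  # 3 rounds for 8 inputs
```

Recursive doubling performs more total work than a serial loop, but the number of dependent rounds grows only logarithmically with sequence length.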

3. Architectural Enhancements and Nonlinear Augmentation

To circumvent the limited expressivity of purely linear dynamics, LRURec integrates nonlinearity through transformer-inspired modules:

Layer Normalization:

$z_k = \mathrm{LayerNorm}(h_k)$

Position-wise Feed-Forward Network (PFFN):

$\mathrm{PFFN}(z) = \mathrm{GELU}(W^{(2)}\,\mathrm{GELU}(W^{(1)}z + b^{(1)}) + b^{(2)})$

Residual Learning: Post-PFFN output is merged via residual connections and re-normalization.

These enhancements are stacked across multiple blocks (typically $N=2$), substantially improving training dynamics and model capacity with minimal impact on computational or memory complexity (Yue et al., 2023).
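The three components can be sketched for a single position as follows. The PFFN follows the formula exactly as written above (GELU after both linear maps); the residual wiring `LayerNorm(z + PFFN(z))`, the omission of learnable LayerNorm gain and bias, and the toy weights are assumptions made for illustration.

```python
import math


def gelu(v):
    """Exact GELU: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def layer_norm(h, eps=1e-5):
    """Normalize one vector to zero mean and (nearly) unit variance."""
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    return [(v - mean) / math.sqrt(var + eps) for v in h]


def linear(W, b, z):
    """Matrix-vector product plus bias, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]


def pffn(z, W1, b1, W2, b2):
    hidden = [gelu(v) for v in linear(W1, b1, z)]
    return [gelu(v) for v in linear(W2, b2, hidden)]


def nonlinear_block(h, W1, b1, W2, b2):
    """LayerNorm -> PFFN -> residual add -> LayerNorm (assumed wiring)."""
    z = layer_norm(h)
    return layer_norm([a + f for a, f in zip(z, pffn(z, W1, b1, W2, b2))])


# Toy weights (hypothetical): hidden size 3, inner size 4.
W1 = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.1, 0.1], [-0.3, 0.2, 0.0]]
b1 = [0.0, 0.1, -0.1, 0.0]
W2 = [[0.1, 0.2, -0.1, 0.3], [0.0, -0.2, 0.4, 0.1], [0.3, 0.1, 0.0, -0.2]]
b2 = [0.0, 0.0, 0.1]

out = nonlinear_block([1.0, -2.0, 0.5], W1, b1, W2, b2)
print(out)
```

The block only mixes information within a position; all mixing across time is left to the linear recurrence.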

4. Behavior-Dependent Linear Recurrent Units: The RecBLR Model

RecBLR introduces the Behavior-Dependent LRU (BD-LRU), advancing the static LRU design by:

Replacing fixed dynamics with input-dependent gating:

$h_t = \alpha_t \odot h_{t-1} + \beta_t \odot x_t$

where $\alpha_t, \beta_t \in [0,1]^d$, with per-dimension gates computed from the current input $x_t$.

Gating Mechanism:

$r_t = \sigma(W_r x_t + b_r), \quad i_t = \sigma(W_i x_t + b_i)$

Gate-to-Scale Mapping: Per-dimension rates $\Lambda$ (here a real-valued parameter vector, not the eigenvalue matrix of Section 1; the softplus keeps the rates positive) yield

$\alpha_t = \exp(-\mathrm{softplus}(\Lambda) \odot r_t ), \quad \beta_t = \sqrt{1-\alpha_t^2} \odot i_t$

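Because $\mathrm{softplus}(\Lambda) > 0$ and $r_t \in (0,1)$, every entry of $\alpha_t$ lies in $(0,1)$, so the hidden state cannot grow without bound, while the factor $\sqrt{1-\alpha_t^2}$ scales the input down whenever more of the previous state is retained. This design provides dynamic memory/input blending and numerical robustness, replacing overparameterized complex-valued dynamics with streamlined, real-valued, input-responsive recurrence (Liu et al., 2024).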
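A single BD-LRU update, following the three equations above, might look like the sketch below. The parameter container, the toy weights, and the function names are hypothetical; the input and state are assumed to share the same dimension so that the element-wise products are well defined.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softplus(v):
    return math.log1p(math.exp(v))


def linear(W, b, x):
    """Matrix-vector product plus bias, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def bd_lru_step(h_prev, x, params):
    """One step of h_t = alpha_t * h_{t-1} + beta_t * x_t with input-dependent gates."""
    r = [sigmoid(v) for v in linear(params["W_r"], params["b_r"], x)]  # recurrence gate
    i = [sigmoid(v) for v in linear(params["W_i"], params["b_i"], x)]  # input gate
    alpha = [math.exp(-softplus(lam) * rv) for lam, rv in zip(params["rate"], r)]
    beta = [math.sqrt(1.0 - a * a) * iv for a, iv in zip(alpha, i)]
    h = [a * hp + b * xv for a, hp, b, xv in zip(alpha, h_prev, beta, x)]
    return h, alpha, beta


# Toy parameters (hypothetical values) for a 2-dimensional state.
params = {
    "W_r": [[0.5, -0.3], [0.1, 0.8]], "b_r": [0.0, 0.1],
    "W_i": [[-0.2, 0.4], [0.6, 0.0]], "b_i": [0.1, -0.1],
    "rate": [0.5, -1.0],  # the per-dimension rates written as Lambda above
}

h = [0.0, 0.0]
for x in ([1.0, -0.5], [0.2, 0.4], [-1.0, 2.0]):
    h, alpha, beta = bd_lru_step(h, x, params)
    print(alpha, beta)
```

Since the gates depend only on the current input and not on the previous state, each step is still an affine map of $h_{t-1}$, which is what keeps the recurrence compatible with a parallel scan.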

RecBLR's architecture includes:

Embedding initialization ($E \in\mathbb{R}^{|V|\times D}$),
Multi-layer behavior modeling with BD-LRU and causal convolutions,
Dropout, residual, and LayerNorm throughout,
Final softmax scoring over the item vocabulary.
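The final scoring step can be sketched as a softmax over dot products between the last hidden state and the item embeddings. Reusing the embedding matrix $E$ as the output projection is an assumption made here for brevity, and the toy vectors and function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_items(h, E):
    """Probability of each item given the final hidden state h."""
    logits = [sum(a * b for a, b in zip(h, row)) for row in E]
    return softmax(logits)


def top_k(probs, k):
    """Indices of the k most probable items, best first."""
    return sorted(range(len(probs)), key=lambda idx: probs[idx], reverse=True)[:k]


E = [[0.1, 0.9], [2.0, 0.0], [-1.0, 0.5], [0.5, 0.5]]  # 4 items, D = 2
h = [1.0, 0.0]                                          # last hidden state
probs = score_items(h, E)
print(top_k(probs, 2))  # [1, 3]
```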

5. Complexity, Scalability, and Inference Properties

LRURec and RecBLR offer theoretical and empirical advantages in time and space complexity compared to conventional RNN and Transformer-based recommenders:

Model	Training Time (per user)	Inference Time	Memory Footprint
LRURec/RecBLR	$\mathcal{O}(\log T \cdot D)$	$\mathcal{O}(T D)$	$\mathcal{O}(T D + T)$
RNN	$\mathcal{O}(T D^2)$	$\mathcal{O}(T D)$	$\mathcal{O}(D)$
Transformer	$\mathcal{O}(T^2 D + T D^2)$	$\mathcal{O}(T^2 D)$	$\mathcal{O}(T^2 + T D)$

RecBLR employs hardware-aware padding and parallel scan acceleration, minimizing compute overhead and using modern GPU kernels (Triton/CUDA) for the up-sweep and down-sweep tree operations on hidden states. To avoid memory blow-up, the power-of-two padding is applied only at the entry point of each BD-LRU layer, and the output is truncated back to the original length afterwards (Liu et al., 2024).
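The pad-then-truncate idea can be sketched on a scalar recurrence. With a zero initial state, zeros padded on the left leave the state at zero, so the states recovered after truncation match the unpadded run. The helper names and the scalar setting are assumptions; the actual implementation operates on batched embedding tensors.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (and 1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def left_pad(seq, pad_value=0.0):
    """Prepend padding so the length becomes a power of two."""
    target = next_power_of_two(len(seq))
    return [pad_value] * (target - len(seq)) + list(seq)


def truncate(states, original_len):
    """Drop the states that correspond to the left padding."""
    return states[len(states) - original_len:]


def serial_states(lam, xs):
    """h_t = lam * h_{t-1} + x_t with h_0 = 0 (stand-in for the scan kernel)."""
    h, out = 0.0, []
    for x in xs:
        h = lam * h + x
        out.append(h)
    return out


xs = [1.0, -2.0, 0.5, 3.0, 1.5]
padded = left_pad(xs)                       # length 8: three zeros, then xs
recovered = truncate(serial_states(0.8, padded), len(xs))
print(len(padded), recovered == serial_states(0.8, xs))
```

For an average sequence length near 800, `next_power_of_two` gives 1024, which is why limiting the padding to the scan inputs matters for memory.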

6. Empirical Results and Ablation Findings

LRURec and RecBLR have been validated on public benchmarks including ML-1M, Amazon, Steam, Gowalla, and XLong (Alibaba) (Yue et al., 2023, Liu et al., 2024). Key results include:

On ML-1M, RecBLR achieves HR@10=0.3285 and NDCG@10=0.1901, surpassing the best baseline LRURec by 7.5% and 7.3% relative, respectively.
Across five datasets, RecBLR yields 1.7–9.1% (HR) and 1.3–9.0% (NDCG) relative improvement over competitive methods (FPMC, Caser, GRU4Rec, SASRec, BERT4Rec, FMLP-Rec, LRURec).
Training time per epoch on the long-sequence XLong dataset (average length ≈800): RecBLR (parallel scan) takes 263 s versus 595 s for LRURec (serial), while SASRec (quadratic attention) runs out of memory at $T \approx 800$.
In batched online inference, LRURec attains over 7x throughput advantage relative to SASRec at typical sequence lengths.
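For reference, HR@K and NDCG@K can be computed as in the sketch below, assuming the common leave-one-out protocol with a single held-out target item per user (the evaluation protocol is not spelled out here, so this is an assumption). The ranks and the numbers passed to `relative_improvement` are made-up illustrations, not results from either paper.

```python
import math


def rank_of_target(scores, target):
    """1-based rank of the target item when items are sorted by descending score."""
    return 1 + sum(1 for s in scores if s > scores[target])


def hr_at_k(ranks, k=10):
    """Fraction of users whose target item appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def ndcg_at_k(ranks, k=10):
    """With one relevant item per user, NDCG reduces to 1 / log2(rank + 1)."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / len(ranks)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, e.g. 0.10 means 10%."""
    return (new - baseline) / baseline


ranks = [1, 3, 12, 7]  # hypothetical target ranks for four users
print(hr_at_k(ranks))                    # 0.75
print(round(ndcg_at_k(ranks), 4))        # 0.4583
print(round(relative_improvement(0.33, 0.30), 3))  # 0.1
```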

Ablation studies demonstrate:

Using a single recurrent layer reduces HR@10 by roughly 5%.
Omitting either BD-LRU gating structures or temporal convolution degrades HR/NDCG by 2–8%.
Larger dropout (0.4–0.5) is beneficial for high-sparsity datasets.

7. Practical Considerations and Deployment

LRURec and RecBLR bridge the "impossible triangle" for sequential recommendation: simultaneous high accuracy, training efficiency, and low-latency inference (Yue et al., 2023, Liu et al., 2024). The closed-form and prefix-scan design enables real-time streaming recommendations, while the low memory profile and decoupling from quadratic sequence dependencies support industrial-scale deployment. Custom initialization (eigenvalue rates) and implementation nuances (left-padding, kernel parallelism, embedding truncation) are essential for stability and efficiency.
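The streaming property can be illustrated with a small sketch: a per-user state is updated in constant time per new interaction and matches what a full pass over the history would produce. The class name, the scalar inputs, and the eigenvalues are hypothetical; a real system would store one such state per user and feed item embeddings rather than scalars.

```python
import cmath


class StreamingLRUState:
    """Holds one user's diagonal hidden state and updates it per interaction."""

    def __init__(self, lam, b_bar):
        self.lam = list(lam)      # diagonal entries, each with |lambda| < 1
        self.b_bar = list(b_bar)  # input projection in the diagonal basis
        self.h = [0j] * len(self.lam)

    def update(self, x):
        """O(H) work per new interaction, independent of history length."""
        self.h = [l * h + b * x for l, h, b in zip(self.lam, self.h, self.b_bar)]
        return self.h


def closed_form_state(lam, b_bar, xs):
    """Recompute h_T = sum_i lambda^(T-i) * b * x_i from the full history."""
    T = len(xs)
    return [sum(l ** (T - i) * b * x for i, x in enumerate(xs, start=1))
            for l, b in zip(lam, b_bar)]


lam = [0.9 * cmath.exp(0.2j), 0.7 * cmath.exp(-1.0j)]
b_bar = [1.0 + 0.5j, 0.3 - 0.2j]
history = [1.0, -0.5, 2.0, 0.0, 1.5]

state = StreamingLRUState(lam, b_bar)
for x in history:
    state.update(x)

full = closed_form_state(lam, b_bar, history)
print(max(abs(p - q) for p, q in zip(state.h, full)))  # floating-point noise only
```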

The reported results indicate that LRURec and RecBLR scale to long interaction sequences and sparser data domains, with parameter and architectural choices generalizing well across real-world e-commerce, social, and entertainment datasets. A plausible implication is broad applicability to session-based and history-intensive recommender systems that operate under stringent latency and hardware constraints.


References

Yue et al. (2023). Linear Recurrent Units for Sequential Recommendation.

Liu et al. (2024). Behavior-Dependent Linear Recurrent Units for Efficient Sequential Recommendation.
