
xLSTM-PINN: Spectral Enhancement for PDE Solvers

Updated 20 November 2025
  • The paper introduces xLSTM-PINN, a spectral remodeling extension to PINNs that leverages memory-gated, multiscale xLSTM blocks to elevate high-frequency learning.
  • It employs a staged frequency curriculum and adaptive residual reweighting to address spectral bias and improve convergence and extrapolation in solving PDEs.
  • Empirical benchmarks on various PDE problems show significant accuracy gains, enhanced generalization, and superior performance compared to conventional PINNs.

xLSTM-PINN is a spectral remodeling extension of physics-informed neural networks (PINNs) designed to mitigate spectral bias, residual-data imbalance, and poor extrapolation in neural PDE solvers. By introducing memory-gated, multiscale feature extraction via xLSTM blocks, coupled with a staged frequency curriculum and adaptive residual reweighting, xLSTM-PINN systematically elevates the neural tangent kernel (NTK) spectrum for high-frequency learning. The method achieves theoretically justified and empirically significant improvements in accuracy, convergence, and extrapolation on benchmark PDEs, without modifications to the standard physics loss or automatic differentiation routines (Tao et al., 16 Nov 2025).

1. Architecture: xLSTM Blocks and Gated Memory

xLSTM-PINN replaces the generic multilayer perceptron core of conventional PINNs with a stack of xLSTM blocks. Each block is composed of an internal multiscale, memory-gated recursion (“micro-time” steps) and a light, nonlinear feed-forward mixer.

During each of the $S$ internal micro-steps within a block $\ell$, the state evolves as follows:

  • Hidden state $h_t \in \mathbb{R}^W$
  • Memory cell $c_t \in \mathbb{R}^W$
  • Duty-cycle normalizer $n_t \in \mathbb{R}^W$
  • Logarithmic-scale gate accumulator $m_t \in \mathbb{R}^W$
  • Evolving block representation $u_t \in \mathbb{R}^W$

The steps comprise (simplified from Eqs. 3–6):

  1. Compute gates and candidate:

$$[g_i, g_f, g_o, g_z] = W^\ell u_t + U^\ell h_t + b^\ell$$

$$i_t = \exp(g_i), \quad f_t = \sigma(g_f) ~\text{or}~ \exp(g_f), \quad o_t = \sigma(g_o), \quad z_t = \tanh(g_z)$$

  2. Log-space stabilization and normalized gating:

$$m_{t+1} = \max(\log f_t + m_t, \log i_t)$$

$$\bar{f}_t = \exp(\text{clip}(\log f_t + m_t - m_{t+1})), \quad \bar{i}_t = \exp(\text{clip}(\log i_t - m_{t+1}))$$

  3. Memory state and output update:

$$c_{t+1} = \bar{f}_t \odot c_t + \bar{i}_t \odot z_t$$

$$n_{t+1} = \bar{f}_t \odot n_t + \bar{i}_t$$

$$h_{t+1} = o_t \odot \left( c_{t+1} / (n_{t+1} + \epsilon) \right)$$

$$u_{t+1} = u_t + \psi(P^\ell h_{t+1}), \quad \psi = \tanh$$

Each block aggregates features at each micro-step (different “scales”) through a learnable, LSTM-style gating function:

$$m_k^{(t)} = g_k(h^{(t-1)}, x) \odot m_k^{(t-1)} + \left(1 - g_k(h^{(t-1)}, x)\right) \odot \mathcal{F}_k(h^{(t-1)}, x)$$

with $g_k(h, x) = \sigma(W_g^k [h; x] + b_g^k)$. After all $S$ steps, a weighted aggregation $M = \sum_{k=1}^S \alpha_k \odot m_k^{(S)}$ is merged into the layer's output.

A shallow gated feed-forward mixer then computes

$$\tilde{y} = \varphi_2(W_2 \varphi_1(W_1 u_S)), \quad u^+ = u_S + \gamma(u_S) \odot \tilde{y}$$

$$u^{(\ell+1)} = \tanh(W^{(\ell)} u^+ + b^{(\ell)})$$

where $\gamma$ is a sigmoid gate and $\varphi_1, \varphi_2$ are $\tanh$ activations.

Parameter sharing across micro-steps yields an effective depth of $O(LS)$ with a parameter count of $O(LW^2)$, matching baseline MLP-based PINNs while providing richer representational capacity (Tao et al., 16 Nov 2025).
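
The recursion above can be condensed into a compact module. The following is a minimal PyTorch sketch under several assumptions: it uses the sigmoid variant of the forget gate, clips the log-space gates at zero, and omits the multiscale memory aggregation $M$ and the inter-block linear map for brevity. Module and variable names are illustrative, not the authors' code.

```python
# Minimal sketch of one xLSTM block (gates, log-space stabilization, normalized
# memory update, gated feed-forward mixer). Illustrative only.
import torch
import torch.nn as nn

class XLSTMBlockSketch(nn.Module):
    def __init__(self, width: int = 64, micro_steps: int = 4):
        super().__init__()
        self.S = micro_steps
        # Shared across micro-steps: one projection producing the four gate pre-activations.
        self.W = nn.Linear(width, 4 * width)               # acts on u_t
        self.U = nn.Linear(width, 4 * width, bias=False)   # acts on h_t
        self.P = nn.Linear(width, width)                   # maps h_{t+1} into the residual stream
        # Light gated feed-forward mixer.
        self.mix1 = nn.Linear(width, width)
        self.mix2 = nn.Linear(width, width)
        self.gate = nn.Linear(width, width)
        self.eps = 1e-6

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        batch, width = u.shape
        h = torch.zeros(batch, width, device=u.device)
        c = torch.zeros(batch, width, device=u.device)
        n = torch.zeros(batch, width, device=u.device)
        m = torch.zeros(batch, width, device=u.device)
        for _ in range(self.S):                            # "micro-time" recursion
            g = self.W(u) + self.U(h)
            g_i, g_f, g_o, g_z = g.chunk(4, dim=-1)
            o = torch.sigmoid(g_o)
            z = torch.tanh(g_z)
            # Log-space stabilization of exponential input gate / sigmoid forget gate.
            log_f = nn.functional.logsigmoid(g_f)          # log f_t for the sigmoid variant
            m_next = torch.maximum(log_f + m, g_i)         # log i_t = g_i since i_t = exp(g_i)
            f_bar = torch.exp(torch.clamp(log_f + m - m_next, max=0.0))
            i_bar = torch.exp(torch.clamp(g_i - m_next, max=0.0))
            # Normalized memory update and output.
            c = f_bar * c + i_bar * z
            n = f_bar * n + i_bar
            h = o * (c / (n + self.eps))
            u = u + torch.tanh(self.P(h))
            m = m_next
        # Gated feed-forward mixer with residual connection (phi_1 = phi_2 = tanh).
        y = torch.tanh(self.mix2(torch.tanh(self.mix1(u))))
        return u + torch.sigmoid(self.gate(u)) * y
```

Because the same $W^\ell$, $U^\ell$, and $P^\ell$ are reused at every micro-step, stacking $L$ such blocks gives the $O(LS)$ effective depth at $O(LW^2)$ parameters noted above.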

2. Spectral-Bias Mitigation via Frequency Curriculum and Residual Reweighting

xLSTM-PINN directly addresses the spectral bias inherent to standard PINN training. This is accomplished with two orthogonal scheduling mechanisms:

2.1 Frequency Curriculum:

During early training, the residual loss is softly low-pass filtered:

$$L_{\text{res}}^{\text{low}} = \left\| \mathcal{F}^{-1}\!\left[ \chi_{|k| \leq K(t)} \cdot \mathcal{F}[\mathcal{R}(u)] \right] \right\|_{L^2}^2$$

Here $K(t)$, the frequency cutoff, grows smoothly to its final value over a curriculum of $T_c = 10{,}000$ steps, ensuring the network resolves large-scale structure before high-frequency detail.
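
As a concrete illustration, the sketch below applies the low-pass mask to a vector of residuals on a uniform 1D periodic grid using the FFT; the grid assumption and the use of a mean rather than a sum in the norm are simplifications, not prescribed by the paper.

```python
# Sketch of the frequency-curriculum loss: low-pass filter the residuals, then take
# the squared L2 norm. Assumes a uniform 1D periodic collocation grid.
import torch

def lowpass_residual_loss(residual: torch.Tensor, k_cut: float) -> torch.Tensor:
    """L_res^low: keep only Fourier modes with |k| <= k_cut before the norm."""
    n = residual.shape[-1]
    k = torch.fft.fftfreq(n, d=1.0 / n, device=residual.device)  # integer wavenumbers
    r_hat = torch.fft.fft(residual)
    mask = (k.abs() <= k_cut).to(residual.dtype)
    r_low = torch.fft.ifft(r_hat * mask).real
    return (r_low ** 2).mean()

def cutoff_schedule(step: int, k_max: float, t_c: int = 10_000) -> float:
    """K(t) = K_max * (t / T_c)^2, held at K_max once the curriculum ends."""
    return k_max * min(step / t_c, 1.0) ** 2
```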

2.2 Adaptive Residual Reweighting:

Each collocation point’s residual is exponentially reweighted according to its current error:

$$L_{\text{res}}(\theta) = \sum_{i=1}^N w_i^{(t)} \left\| R(u_\theta(x_i)) \right\|^2$$

$$w_i^{(t)} = \frac{\exp\left(\alpha |R(u_\theta(x_i))|\right)}{\sum_{j=1}^N \exp\left(\alpha |R(u_\theta(x_j))|\right)}$$

with $\alpha = 8$–$12$. This adaptively prioritizes harder, typically higher-frequency regions during gradient descent.
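
The weights are exactly a softmax over scaled residual magnitudes, so a sketch is short; detaching the weights from the computation graph is an assumption (only the weighted residuals are differentiated).

```python
# Sketch of adaptive residual reweighting: w_i = softmax(alpha * |R_i|) over all
# collocation points, so harder points receive larger weight.
import torch

def reweighted_residual_loss(residuals: torch.Tensor, alpha: float = 10.0) -> torch.Tensor:
    with torch.no_grad():
        w = torch.softmax(alpha * residuals.abs(), dim=0)
    return (w * residuals ** 2).sum()
```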

Combined with the xLSTM block's effect on the empirical NTK, where high-frequency eigenvalues $\lambda(k)$ are amplified by a factor of $(1+\alpha(k))^S$, these procedures jointly lift the NTK tail and suppress spectral bias (Tao et al., 16 Nov 2025).

3. Optimization Protocols and Hyperparameters

The combined objective is

$$J(\theta) = \lambda_r L_{\text{res}} + \lambda_D L_D + \lambda_N L_N + \lambda_{IC} L_{IC} + R(\theta)$$

where the $\lambda$ parameters balance the residual, Dirichlet, Neumann, and initial-condition losses, and $R(\theta)$ incorporates $L_2$ or Jacobian regularization as needed.
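
A minimal sketch of assembling this objective is shown below; the individual loss terms, weight values, and the simple $L_2$ regularizer are placeholders rather than values prescribed by the paper.

```python
# Sketch of the combined objective J(theta): weighted sum of residual, Dirichlet,
# Neumann, and initial-condition losses plus an optional L2 regularizer.
import torch

def total_objective(l_res, l_dirichlet, l_neumann, l_ic, params,
                    lam=(1.0, 1.0, 1.0, 1.0), weight_decay=0.0):
    lam_r, lam_d, lam_n, lam_ic = lam
    reg = weight_decay * sum((p ** 2).sum() for p in params)  # placeholder R(theta)
    return lam_r * l_res + lam_d * l_dirichlet + lam_n * l_neumann + lam_ic * l_ic + reg
```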

Empirically validated choices include:

  • Block width $W = 64$, depth $L = 6$, micro-steps $S = 4$
  • Approximately 30,000 total parameters, for parity with the baseline PINN
  • Adam optimizer, learning rate $10^{-3}$ decaying to $10^{-4}$ on a cosine schedule (see the sketch after this list)
  • Frequency cutoff schedule $K(t) = K_{\text{max}} \cdot (t/T_c)^2$ with $T_c = 10{,}000$
  • Residual reweighting parameter $\alpha = 8$–$12$
  • LayerNorm applied to $u^+$ in each block
  • Training stabilization: freeze the xLSTM gates for the first $1{,}000$ steps ($i_t$, $f_t$, $o_t$ fixed at $0.5$), gradient clipping at norm $1.0$, and early stopping on the validation residual
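
The sketch below wires up the Adam optimizer, cosine learning-rate decay, and gradient clipping from the list above; the 1,000-step gate freeze and early stopping are omitted, and `model` and `total_steps` are assumed to be supplied by the surrounding training script.

```python
# Sketch of the optimization protocol: Adam with cosine decay from 1e-3 to 1e-4 and
# gradient clipping at norm 1.0.
import torch

def configure_training(model: torch.nn.Module, total_steps: int):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_steps, eta_min=1e-4)
    return opt, sched

def optimizer_step(model, opt, sched, loss):
    opt.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip at norm 1.0
    opt.step()
    sched.step()
```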

4. Quantitative Benchmarks and Frequency Analysis

xLSTM-PINN and the baseline PINN were evaluated under identical sample and parameter budgets (3,000 interior/boundary samples, roughly 30k parameters) on four PDE problems:

| PDE | MSE | RMSE | MAE | MaxAE |
|---|---|---|---|---|
| 1D Advection–Reaction | $6.28\times10^{-6}$ (↓89%) | $2.51\times10^{-3}$ (↓66%) | $1.54\times10^{-3}$ (↓79%) | $1.71\times10^{-2}$ (↑77%) |
| 2D Laplace (mixed BCs) | $1.47\times10^{-8}$ (↓99.98%) | $1.21\times10^{-4}$ (↓98.44%) | $9.90\times10^{-5}$ (↓98.72%) | $3.82\times10^{-4}$ (↓95.66%) |
| Steady Heat in Disk (Robin BC) | $9.66\times10^{-9}$ (↓96.62%) | $9.83\times10^{-5}$ (↓81.62%) | $7.87\times10^{-5}$ (↓85.04%) | $4.03\times10^{-4}$ (↓45.94%) |
| Anisotropic Poisson–Beam (4th order) | $1.93\times10^{-6}$ (↓96.75%) | $1.39\times10^{-3}$ (↓81.99%) | $1.09\times10^{-3}$ (↓85.79%) | $5.32\times10^{-3}$ (↓46.48%) |

Frequency-domain diagnostics substantiate the claimed suppression of spectral bias:

  • The endpoint error $E_T(k)$ (from an $L^2$ plane-wave fit) is lower at high $k$; the error plateau is lowered by roughly a factor of 2
  • The spectral gain $G(k) = E_{\text{base}}(k)/E_{\text{xLSTM}}(k)$ reaches 1.5–3.0 for $k \in [10, 40]$ (see the sketch at the end of this section)
  • The time to reach an error threshold, $\tau(k)$, is shortened by 30–50%
  • The resolvable bandwidth $k^*(\epsilon)$ increases by roughly 25%

In field space, xLSTM-PINN produces sharply localized error, cleaner boundary transitions, and significantly less high-$k$ contamination (<10% of energy at $k > 30$ vs. >40% for the baseline) (Tao et al., 16 Nov 2025).
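
A sketch of how the spectral gain $G(k)$ can be measured on a uniform 1D grid is given below; defining the per-mode error energy as the squared magnitude of the FFT of the pointwise error is an assumption about the diagnostic, not a detail taken from the paper.

```python
# Sketch of the spectral-gain diagnostic G(k) = E_base(k) / E_xLSTM(k).
import torch

def error_spectrum(pred: torch.Tensor, exact: torch.Tensor) -> torch.Tensor:
    """Energy of the pointwise error in each non-negative Fourier mode."""
    err_hat = torch.fft.rfft(pred - exact)
    return err_hat.abs() ** 2

def spectral_gain(pred_base, pred_xlstm, exact) -> torch.Tensor:
    """G(k) > 1 means xLSTM-PINN resolves mode k better than the baseline."""
    return error_spectrum(pred_base, exact) / (error_spectrum(pred_xlstm, exact) + 1e-30)
```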

5. Extrapolation and Generalization

Extrapolation assessments demonstrate superior robustness:

  • On the 1D advection problem, training on $t \in [0, 1]$ and predicting on $t \in [1, 1.5]$ yields <1% error for xLSTM-PINN up to $t = 1.4$, whereas the baseline PINN's error grows exponentially past $t = 1.1$.
  • For the 2D Laplace problem with 10% of the boundary data removed (an "O-shaped" deficit), xLSTM-PINN reconstructs the missing region with error $\leq 10^{-3}$, while the baseline PINN exhibits substantial error.

Memory-gated micro-step recursions approximate an ODE in feature space, imparting greater robustness to off-manifold or out-of-distribution inputs. The cross-scale memory at each layer further enables data-deficient scales to be reconstructed from related features, smoothing the NTK spectrum and reducing overfitting to the observed spectral envelope (Tao et al., 16 Nov 2025).

6. Implications and Extensions

The xLSTM block is modular and can be integrated into any PINN extension, including Fourier-feature PINNs, conservative or stochastic variants, and multi-fidelity setups, without requiring changes to physics loss functions or optimizers. For time-dependent PDEs, the internal micro-step refinement can be extended to both spatial and temporal resolutions. In inverse or multi-fidelity modeling contexts, memory gating can serve a cross-scale autoencoding role, mediating between low- and high-fidelity surrogates.

Architectural spectral engineering—lifting the NTK tail at the representation level—is shown to be as effective as direct loss reweighting strategies for bias mitigation. A plausible implication is that further advances in PDE generalization could arise from hybrid approaches that combine representation- and loss-level spectral control (Tao et al., 16 Nov 2025).
