xLSTM-PINN: Spectral Enhancement for PDE Solvers
- The paper introduces xLSTM-PINN, a spectral remodeling extension to PINNs that leverages memory-gated, multiscale xLSTM blocks to elevate high-frequency learning.
- It employs a staged frequency curriculum and adaptive residual reweighting to address spectral bias and improve convergence and extrapolation in solving PDEs.
- Empirical benchmarks on various PDE problems show significant accuracy gains, enhanced generalization, and superior performance compared to conventional PINNs.
xLSTM-PINN is a spectral remodeling extension of physics-informed neural networks (PINNs) designed to mitigate spectral bias, residual-data imbalance, and poor extrapolation in neural PDE solvers. By introducing memory-gated, multiscale feature extraction via xLSTM blocks, coupled with a staged frequency curriculum and adaptive residual reweighting, xLSTM-PINN systematically elevates the neural tangent kernel (NTK) spectrum for high-frequency learning. The method achieves both theoretically justified and empirically significant improvement in accuracy, convergence, and extrapolation on benchmark PDEs, without modifications to the standard physics loss or automatic differentiation routines (Tao et al., 16 Nov 2025).
1. Architecture: xLSTM Blocks and Gated Memory
xLSTM-PINN replaces the generic multilayer perceptron core of conventional PINNs with a stack of xLSTM blocks. Each block is composed of an internal multiscale, memory-gated recursion (“micro-time” steps) and a light, nonlinear feed-forward mixer.
During each internal micro-step within a block, the state evolves through five quantities:
- a hidden state
- a memory cell
- a duty-cycle scalar
- a logarithmic-scale gate accumulator
- the evolving block representation
Each micro-step comprises three stages (simplified from Eqs. 3–6 of the paper; see the sketch after this list):
- compute the gates and the candidate state
- apply log-space stabilization and normalized gating
- update the memory state and the block output
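As a concrete reading of these stages, the following is a minimal NumPy sketch of one memory-gated micro-step in the style of the sLSTM update from the xLSTM literature. The weight names and shapes, the tanh/sigmoid choices, and the exponential-gating form are assumptions; the paper's exact Eqs. 3–6 may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xlstm_microstep(x, h, c, n, m, W, R, b):
    """One memory-gated micro-step (sLSTM-style sketch, not the paper's exact form).
    x       -- block input at this micro-step
    h, c    -- hidden state and memory cell
    n, m    -- duty-cycle scalar and log-scale gate accumulator
    W, R, b -- input weights, recurrent weights, biases (stacked for 4 pre-activations)
    """
    # Gate and candidate pre-activations from the input and previous hidden state.
    z_pre, i_pre, f_pre, o_pre = np.split(W @ x + R @ h + b, 4)
    z = np.tanh(z_pre)            # candidate state
    o = sigmoid(o_pre)            # output gate
    # Log-space stabilization of the exponential input/forget gates.
    m_new = np.maximum(f_pre + m, i_pre)
    i_gate = np.exp(i_pre - m_new)
    f_gate = np.exp(f_pre + m - m_new)
    # Memory cell and duty-cycle updates, then the normalized hidden output.
    c_new = f_gate * c + i_gate * z
    n_new = f_gate * n + i_gate
    h_new = o * (c_new / n_new)
    return h_new, c_new, n_new, m_new
```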
Each block aggregates the features produced at each micro-step (the different "scales") through a learnable, LSTM-style gating function. After all micro-steps, this weighted aggregation is merged into the layer's output.
A shallow gated feed-forward mixer then post-processes the aggregated features, with a sigmoid gate modulating a nonlinear activation branch.
Parameter sharing across micro-steps lets effective depth grow with the number of micro-steps while keeping the parameter count at the level of baseline MLP-based PINNs, but with richer representational capacity (Tao et al., 16 Nov 2025).
2. Spectral-Bias Mitigation via Frequency Curriculum and Residual Reweighting
xLSTM-PINN directly addresses the spectral bias inherent to standard PINN training. This is accomplished with two orthogonal scheduling mechanisms:
2.1 Frequency Curriculum:
During early training, the residual loss is softly low-pass filtered so that low-frequency residual components dominate the gradient. The frequency cutoff then grows smoothly to its final value over a fixed curriculum of steps, ensuring the network resolves large-scale structure before high-frequency detail.
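One plausible realization of this schedule, assuming a linear ramp of the cutoff and a sigmoidal low-pass mask over residual frequency content (both functional forms are assumptions, not the paper's exact filter):

```python
import numpy as np

def cutoff(step, curriculum_steps, k_min=2.0, k_max=64.0):
    """Frequency cutoff that grows from k_min to its final value k_max over the
    curriculum window (linear ramp and endpoint values chosen for illustration)."""
    frac = min(step / curriculum_steps, 1.0)
    return k_min + frac * (k_max - k_min)

def lowpass_weight(k, k_c, sharpness=4.0):
    """Soft low-pass weight for a residual mode with wavenumber k: close to 1
    below the cutoff k_c, decaying smoothly above it (sigmoidal mask)."""
    return 1.0 / (1.0 + np.exp(sharpness * (k - k_c)))
```

Early in training only low-wavenumber residual content contributes appreciably to the loss; as the cutoff grows, high-frequency detail is progressively exposed.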
2.2 Adaptive Residual Reweighting:
Each collocation point's residual is exponentially reweighted according to its current error magnitude, with the reweighting coefficient held in a narrow range (upper end around 12). This adaptively prioritizes harder, typically higher-frequency regions during gradient descent.
Combined with the xLSTM block's effect on the empirical NTK, whose high-frequency eigenvalues it amplifies, these procedures jointly lift the NTK tail and suppress spectral bias (Tao et al., 16 Nov 2025).
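A minimal sketch of the reweighting step, assuming an exponential weight on the squared pointwise residual normalized to unit mean; the exact functional form and the coefficient `beta` are assumptions in the range the paper suggests:

```python
import numpy as np

def adaptive_residual_weights(residuals, beta=10.0):
    """Exponentially up-weight collocation points with large current residuals.
    Weights are normalized to unit mean so the overall loss scale is preserved."""
    r2 = residuals.astype(float) ** 2
    r2 = r2 / (r2.max() + 1e-30)          # rescale for numerical stability
    w = np.exp(beta * r2)
    return w / w.mean()

def reweighted_residual_loss(residuals, beta=10.0):
    """Residual loss with adaptive per-point weights (the weights are treated as
    constants with respect to the network parameters during backpropagation)."""
    w = adaptive_residual_weights(residuals, beta)
    return np.mean(w * residuals ** 2)
```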
3. Optimization Protocols and Hyperparameters
The combined objective is
$$\mathcal{L} \;=\; \lambda_{r}\,\mathcal{L}_{\text{res}} + \lambda_{D}\,\mathcal{L}_{\text{Dir}} + \lambda_{N}\,\mathcal{L}_{\text{Neu}} + \lambda_{0}\,\mathcal{L}_{\text{IC}} + \mathcal{L}_{\text{reg}},$$
where the weights $\lambda$ balance residual, Dirichlet, Neumann, and initial-condition losses, and $\mathcal{L}_{\text{reg}}$ incorporates $L_2$ or Jacobian regularization as needed.
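A schematic PyTorch assembly of this objective for a toy 1D Poisson problem; the residual operator, the λ values, and the omission of the Neumann and initial-condition terms are illustrative simplifications, not the paper's implementation:

```python
import torch

def pde_residual(model, x):
    """Toy stand-in residual: 1D Poisson u''(x) + sin(x) = 0, computed with
    autograd (the paper's PDEs and differential operators differ)."""
    x = x.clone().requires_grad_(True)
    u = model(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    return d2u + torch.sin(x)

def total_loss(model, x_res, x_dir, u_dir, lam=(1.0, 1.0, 1e-6)):
    """Composite objective: residual + Dirichlet + L2 regularization
    (Neumann and initial-condition terms would be added analogously)."""
    lam_r, lam_d, lam_reg = lam
    loss_res = (pde_residual(model, x_res) ** 2).mean()
    loss_dir = ((model(x_dir) - u_dir) ** 2).mean()
    loss_reg = sum(p.pow(2).sum() for p in model.parameters())
    return lam_r * loss_res + lam_d * loss_dir + lam_reg * loss_reg
```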
Empirically validated choices include:
- Block width, depth, and micro-step count chosen for a total budget of roughly 30,000 parameters, for parity with the baseline PINN
- Adam optimizer with a cosine learning-rate decay schedule
- Frequency cutoff grown smoothly over the curriculum window (Section 2.1)
- Residual reweighting coefficient held in a narrow range (up to roughly 12)
- LayerNorm applied within each block
- Training stabilization: xLSTM gates frozen (held at 0.5) during an initial warm-up period, gradient clipping at norm 1.0, and early stopping by validation residual
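The reported protocol (Adam, cosine learning-rate decay, gradient clipping at norm 1.0, an initial gate-freezing warm-up) might be wired up as follows; the learning rate, step counts, warm-up length, and the `build_xlstm_pinn` / `sample_batch` helpers are placeholders rather than values from the paper, and `total_loss` refers to the sketch above:

```python
import torch

model = build_xlstm_pinn()                        # placeholder model constructor
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20_000)

for step in range(20_000):
    # Warm-up: keep gate parameters frozen for an initial phase (length assumed).
    frozen = step < 1_000
    for name, p in model.named_parameters():
        if "gate" in name:
            p.requires_grad_(not frozen)

    optimizer.zero_grad()
    x_res, x_dir, u_dir = sample_batch()          # placeholder collocation/BC sampler
    loss = total_loss(model, x_res, x_dir, u_dir)  # composite objective from above
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```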
4. Quantitative Benchmarks and Frequency Analysis
xLSTM-PINN and the baseline PINN were evaluated under identical sample and parameter budgets (3,000 interior/boundary samples, 30k parameters) on four PDE problems, with MSE, RMSE, MAE, and MaxAE reported for each:
- 1D Advection–Reaction
- 2D Laplace (mixed BCs)
- Steady heat in a disk (Robin BC)
- Anisotropic Poisson–Beam (fourth order)
Frequency-domain diagnostics substantiate the claimed suppression of spectral bias:
- Endpoint error of a plane-wave fit is lower at high wavenumbers, and the error plateau is lowered
- Spectral gain reaches roughly 1.5–3.0× in the high-wavenumber band
- Time to reach a fixed error threshold is shortened by 30–50%
- Resolvable bandwidth is increased by 25%
In field space, xLSTM-PINN produces sharply localized error, cleaner boundary transitions, and markedly less high-wavenumber contamination (roughly 10% of error energy in the high-frequency band versus 40% for the baseline) (Tao et al., 16 Nov 2025).
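These diagnostics can be reproduced in spirit with a simple FFT-based gain computation; the definition below (per-wavenumber ratio of error energies on a 1D grid) is an assumed reading of "spectral gain", not necessarily the paper's exact metric:

```python
import numpy as np

def spectral_gain(err_baseline, err_xlstm, dx=1.0):
    """Per-wavenumber ratio of baseline to xLSTM-PINN error energy on a uniform
    1D grid. Values above 1 at high k indicate reduced high-frequency error."""
    k = np.fft.rfftfreq(err_baseline.size, d=dx) * 2.0 * np.pi
    E_base  = np.abs(np.fft.rfft(err_baseline)) ** 2
    E_xlstm = np.abs(np.fft.rfft(err_xlstm)) ** 2
    return k, E_base / (E_xlstm + 1e-30)
```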
5. Extrapolation and Generalization
Extrapolation assessments demonstrate superior robustness:
- On 1D advection, training on a bounded interval and predicting beyond it keeps xLSTM-PINN's error near the 1% level well past the training horizon, whereas the baseline PINN's error grows exponentially soon after leaving it.
- For the 2D Laplace problem with 10% of the boundary data removed (an "O-shaped" deficit), xLSTM-PINN reconstructs the missing region with small error, while the baseline PINN exhibits substantial error.
Memory-gated micro-step recursions approximate an ODE in feature space, imparting greater robustness to off-manifold or out-of-distribution inputs. The cross-scale memory at each layer further enables data-deficient scales to be reconstructed from related features, smoothing the NTK spectrum and reducing overfitting to the observed spectral envelope (Tao et al., 16 Nov 2025).
6. Implications and Extensions
The xLSTM block is modular and can be integrated into any PINN extension, including Fourier-feature PINNs, conservative or stochastic variants, and multi-fidelity setups, without requiring changes to physics loss functions or optimizers. For time-dependent PDEs, the internal micro-step refinement can be extended to both spatial and temporal resolutions. In inverse or multi-fidelity modeling contexts, memory gating can serve a cross-scale autoencoding role, mediating between low- and high-fidelity surrogates.
Architectural spectral engineering—lifting the NTK tail at the representation level—is shown to be as effective as direct loss reweighting strategies for bias mitigation. A plausible implication is that further advances in PDE generalization could arise from hybrid approaches that combine representation- and loss-level spectral control (Tao et al., 16 Nov 2025).