Curriculum Learning Identification via PINNs (CLIP)
- The paper introduces a physics-guided curriculum that sequentially trains PINNs from reaction-dominated regimes to full spatiotemporal PDEs, improving inference accuracy.
- It employs an anchored widening transfer to preserve learned reaction dynamics while gradually incorporating diffusion effects for robust parameter and state recovery.
- Empirical assessments show significant MRAE reductions over baseline methods in canonical systems such as λ–ω, Gray–Scott, and Lotka–Volterra RD models.
Curriculum Learning Identification via PINNs (CLIP) is a physics-guided framework for parameter identification and state reconstruction in partially observed reaction–diffusion (RD) systems. CLIP leverages the physical separability inherent in RD models by structuring neural network training as a curriculum—progressing from reaction-dominated regimes to the full spatiotemporal PDE, and utilizing an anchored widening transfer strategy to enhance convergence and robustness. The method is implemented using physics-informed neural networks (PINNs) and achieves substantial accuracy improvements over baseline techniques in canonical and high-dimensional biological applications (Zhou et al., 24 Jan 2026).
1. Reaction–Diffusion System Identification: Formulation
For a system of $m$ reaction–diffusion components on a spatial domain $\Omega \subset \mathbb{R}^d$ over $t \in (0, T]$, the governing PDE is
$$\partial_t u = D \nabla^2 u + f(u; \theta),$$
where $u(x,t) \in \mathbb{R}^m$ denotes the state vector, $D = \mathrm{diag}(D_1, \dots, D_m)$ the unknown diffusion coefficients, and $f$ the nonlinear reaction term parameterized by rates $\theta$.
Identification is complicated by partial observation: only a subset $u_{\mathrm{obs}}$ of the state variables is measured at sensor points $(x_k, t_k)$,
$$y_k = u_{\mathrm{obs}}(x_k, t_k) + \varepsilon_k,$$
where $\varepsilon_k$ encodes additive noise. The joint recovery task is to infer the hidden fields $u_{\mathrm{hid}}$, as well as the unknown parameters $(D, \theta)$.
2. Physics-Informed Neural Network Architecture
CLIP employs a PINN to approximate the full state by a neural map
$$u_{\mathrm{NN}}(x, t; w) \approx u(x, t),$$
where $w$ are the trainable network parameters. The architecture is defined by:
- A shared trunk multilayer perceptron (MLP) of depth 3, using smooth activations (e.g., $\tanh$ or mixed sine–ReLU functions).
- Dedicated output branches (1–2 layers each), one per state component, mapping trunk features to each output variable.
- An auxiliary MLP surrogate for smoothing observed data to generate robust Laplacian estimates for curriculum masking; this surrogate is only used in the mask calculation, not in PINN losses.
PDE residuals for each component $i$ are computed via automatic differentiation,
$$r_i(x, t) = \partial_t u_{\mathrm{NN},i} - D_i \nabla^2 u_{\mathrm{NN},i} - f_i(u_{\mathrm{NN}}; \theta).$$
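In a PINN the residual is formed with automatic differentiation through the network. As a framework-free stand-in, the same quantity can be illustrated with finite differences on a 1-D space–time grid (the grid, $D$, and $f$ below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def pde_residual(u, dt, dx, D, f):
    """Finite-difference stand-in for the autodiff PDE residual
    r = u_t - D * u_xx - f(u) on a 1-D grid; u has shape (nt, nx).
    Only interior points are returned."""
    u_t = (u[2:, 1:-1] - u[:-2, 1:-1]) / (2 * dt)                     # central time derivative
    u_xx = (u[1:-1, 2:] - 2 * u[1:-1, 1:-1] + u[1:-1, :-2]) / dx**2   # 1-D Laplacian
    return u_t - D * u_xx - f(u[1:-1, 1:-1])

# Sanity check: u(x, t) = exp(-t) solves u_t = -u with D = 0 and f(u) = -u,
# so the residual should vanish up to discretization error.
t = np.linspace(0.0, 1.0, 101)
x = np.linspace(0.0, 1.0, 51)
T, X = np.meshgrid(t, x, indexing="ij")
u = np.exp(-T)
r = pde_residual(u, dt=t[1] - t[0], dx=x[1] - x[0], D=0.0, f=lambda v: -v)
print(np.abs(r).max())  # small discretization error
```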
3. Curriculum Learning: Multi-stage Training Workflow
CLIP divides training into three sequential stages:
3.1 Reaction-Dominated Initialization (Stage 0)
A reaction-dominated mask is constructed via Laplacian thresholding on the surrogate-smoothed observed fields. Only points with negligible diffusion (as measured by normalized Laplacian magnitude) are sampled so that the local dynamics are effectively ODE-driven. Stage 0 sets diffusion coefficients to zero (or applies a small PDE-weight) and restricts optimization to reaction kinetics and initialization of hidden state fields.
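A minimal numpy sketch of such a mask (the periodic discrete Laplacian, the max-normalization, and the threshold `tau` are illustrative assumptions):

```python
import numpy as np

def reaction_dominated_mask(u_smooth, dx, tau=0.1):
    """Keep grid points whose normalized Laplacian magnitude falls below
    tau, i.e. points where diffusion is negligible and the local
    dynamics are effectively ODE-driven (Stage 0 sampling)."""
    lap = (np.roll(u_smooth, 1) - 2 * u_smooth + np.roll(u_smooth, -1)) / dx**2
    norm = np.abs(lap) / (np.abs(lap).max() + 1e-12)  # normalize to [0, 1]
    return norm < tau

# On u = sin(x), |u_xx| = |sin(x)|, so only points near x = 0, pi, 2*pi
# (where curvature is flat) should survive the thresholding.
x = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)  # periodic grid
u = np.sin(x)
mask = reaction_dominated_mask(u, dx=x[1] - x[0], tau=0.1)
print(mask.sum(), "of", mask.size, "points are reaction-dominated")
```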
The loss in Stage 0 is
where indicates masking by .
3.2 Anchored Widening Transfer (Stage 1)
After reaction-only pre-training, diffusion terms are re-introduced. To preserve previously learned reaction dynamics, the network width is enlarged by adding new neurons (anchored widening), and two optimizers are employed: inherited parameters (from Stage 0) are updated with tiny learning rates, while new parameters and diffusion coefficients train with standard rates. This anchoring ensures that coupling dynamics are absorbed by the increased network capacity without destroying the reaction sub-solutions.
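The two-rate update can be sketched on a toy least-squares problem (the sizes, learning rates, and linear model are illustrative, not the paper's network):

```python
import numpy as np

# Anchored widening, in miniature: the "Stage 0" weights fit only the
# first 4 features; 2 new features are then added. Inherited weights get
# a tiny learning rate, new weights a standard one.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 6))
target = A @ np.array([1.0, -2.0, 0.5, 0.3, 2.0, -1.0])   # realizable target

w_old = np.linalg.lstsq(A[:, :4], target, rcond=None)[0]  # Stage 0 sub-solution
w_new = np.zeros(2)                                       # widened capacity
w_anchor = w_old.copy()

lr_old, lr_new = 1e-4, 5e-2
loss = lambda w: np.mean((A @ w - target) ** 2)
loss_before = loss(np.concatenate([w_old, w_new]))
for _ in range(1000):
    w = np.concatenate([w_old, w_new])
    grad = 2 * A.T @ (A @ w - target) / len(target)
    w_old -= lr_old * grad[:4]   # inherited parameters barely move (anchored)
    w_new -= lr_new * grad[4:]   # new parameters absorb the extra structure
loss_after = loss(np.concatenate([w_old, w_new]))
print(loss_before, "->", loss_after)
print("inherited drift:", np.linalg.norm(w_old - w_anchor))
```

The point of the split is visible in the printout: the loss drops sharply while the inherited weights stay close to their Stage 0 values.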
3.3 Global Fine-Tuning with Adaptive Sampling (Stage 2)
All network and parameter weights are then jointly fine-tuned at a moderate learning rate. To resolve sharp spatiotemporal features, residual-based adaptive distribution (RAD) sampling augments the training set: new collocation points are drawn with probability proportional to the normalized residual magnitude $|r(x,t)|$, improving effective coverage of interface and steep-gradient regions.
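RAD-style selection can be sketched as follows (the power `k` and floor `c` come from the general RAD formulation; the values and the synthetic residual field here are illustrative):

```python
import numpy as np

def rad_sample(residuals, n_new, k=1.0, c=0.0, rng=None):
    """Draw n_new collocation indices with probability proportional to
    |r|^k / mean(|r|^k) + c, so high-residual regions are oversampled."""
    rng = rng if rng is not None else np.random.default_rng()
    w = np.abs(residuals) ** k
    p = w / w.mean() + c
    p = p / p.sum()                       # normalize to a distribution
    return rng.choice(len(residuals), size=n_new, p=p)

# Synthetic residual field: a sharp interface occupies the first 100 of
# 1000 points; RAD should concentrate new samples there.
rng = np.random.default_rng(1)
r = np.where(np.arange(1000) < 100, 10.0, 0.1)
idx = rad_sample(r, n_new=200, rng=rng)
print((idx < 100).mean())  # most samples land in the high-residual region
```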
4. Training Objective and Loss Functions
CLIP utilizes a composite loss function incorporating data matching, PDE residuals, initial-condition enforcement, and (optionally) anchoring:
$$\mathcal{L} = \lambda_d \mathcal{L}_{\mathrm{data}} + \lambda_p \mathcal{L}_{\mathrm{PDE}} + \lambda_{ic} \mathcal{L}_{\mathrm{IC}} + \lambda_a \mathcal{L}_{\mathrm{anchor}}.$$
Definitions:
- Data mismatch: $\mathcal{L}_{\mathrm{data}} = \frac{1}{N_d} \sum_k \| u_{\mathrm{NN}}(x_k, t_k; w) - y_k \|^2$
- Physics residual: $\mathcal{L}_{\mathrm{PDE}} = \frac{1}{N_c} \sum_j \| r(x_j, t_j) \|^2$
- Initial condition: $\mathcal{L}_{\mathrm{IC}} = \frac{1}{N_0} \sum_l \| u_{\mathrm{NN}}(x_l, 0; w) - u_0(x_l) \|^2$
- (Optional) Anchor regularization: $\mathcal{L}_{\mathrm{anchor}} = \| w_{\mathrm{old}} - w_{\mathrm{old}}^{(0)} \|^2$, penalizing drift of inherited parameters from their Stage 0 values
Loss-weight scheduling ramps these weights across stages to adaptively balance physics and data terms.
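As a minimal sketch of such staged weighting (the linear schedule, ramp bounds, and weight values are illustrative assumptions, not values from the paper):

```python
import numpy as np

def ramp(step, start, end, lo=0.0, hi=1.0):
    """Linear weight ramp: returns lo before `start`, hi after `end`."""
    t = np.clip((step - start) / max(end - start, 1), 0.0, 1.0)
    return lo + t * (hi - lo)

def composite_loss(l_data, l_pde, l_ic, l_anchor, step):
    """Weighted sum of the four loss terms; the PDE weight is ramped in
    gradually so early training is dominated by data and reaction fit."""
    lam_pde = ramp(step, start=1000, end=5000)
    return l_data + lam_pde * l_pde + 1.0 * l_ic + 0.1 * l_anchor

print(composite_loss(1.0, 1.0, 0.0, 0.0, step=0))      # -> 1.0 (PDE term off early)
print(composite_loss(1.0, 1.0, 0.0, 0.0, step=10000))  # -> 2.0 (fully weighted later)
```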
5. Optimization Techniques and Hyperparameters
PINN weights are initialized with Xavier initialization. Stage 0 employs Adam; Stage 1 splits inherited and new parameters into a tiny and a standard learning rate, respectively; Stage 2 trains all parameters at a moderate rate. For Min-system identification, reaction rates are optimized in log-space and inputs are rescaled for numerical stability. Activation choices vary: $\tanh$ for most benchmarks, and a custom activation for Gray–Scott to resolve sharper pulses.
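Log-space optimization of a positive rate can be sketched on a toy decay-fitting problem (the objective, learning rate, and true rate are illustrative):

```python
import numpy as np

# Parameterize a positive reaction rate as k = exp(phi) and run gradient
# descent on phi: this keeps k > 0 by construction and equalizes step
# sizes across rates that span orders of magnitude.
t = np.linspace(0.0, 1.0, 50)
y = np.exp(-3.0 * t)            # data generated with true rate k = 3

phi = np.log(0.5)               # initial guess k = 0.5
lr = 0.5
for _ in range(2000):
    k = np.exp(phi)
    pred = np.exp(-k * t)
    # dL/dphi = dL/dk * dk/dphi, with dk/dphi = k (chain rule through exp)
    dL_dk = np.mean(2.0 * (pred - y) * (-t) * pred)
    phi -= lr * dL_dk * k
print(np.exp(phi))  # recovered rate, close to the true value 3
```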
6. Empirical Assessment and Benchmark Results
CLIP was evaluated on three canonical RD systems (λ–ω, Gray–Scott, Lotka–Volterra) and a four-variable Min-protein oscillator in bacterial geometry. Only one (or two) components are observed per system, with the remaining variables fully unmeasured.
Training uses approximately 2% downsampled points from a high-resolution solver, and noise levels range from 0% to 10%. Representative mean relative absolute errors (MRAE) for CLIP and baselines:
| System | CLIP MRAE | Baseline PINN MRAE | PSO / EnKF MRAE |
|---|---|---|---|
| λ–ω | 5.50% (clean) | 7.75% | 153% / 0.48% |
| Gray–Scott | 9.55% (clean); up to 25.3% (10% noise) | >100% | Failed |
| Lotka–Volterra RD | ≈9.23% (clean) | >2700% | Failed |
| Min-protein oscillator | 18.10% (clean); 23.34% (10% noise) | 72.15% (clean) | — |
For the Min system, CLIP reconstructs unobserved cytosolic fields and membrane-bound time-series matching the amplitude and frequency of ground-truth oscillations.
7. Mechanistic Analysis: Ablation and Loss Landscape
Ablation experiments show incremental improvements with curriculum components:
- Baseline PINN (no curriculum): high MRAE, poor convergence.
- +Reaction-only curriculum: moderate improvement.
- +Anchored widening transfer (full CLIP): order-of-magnitude error reduction.
Visualization of the loss landscape, using PCA trajectories of parameter vectors, reveals that baseline PINNs yield a highly nonconvex terrain with spurious basins trapping the optimizer. The reaction-only stage produces a smoother pathway in parameter space, and the full CLIP scheme leads to a well-conditioned landscape and robust gradient descent toward the optimum.
In sum, CLIP offers a physics-tailored three-stage curriculum—reaction-only initialization, anchored widening transfer, and adaptive fine-tuning—to jointly infer hidden states and unknown parameters in RD systems from sparse, noisy partial observations. This approach makes explicit use of physical modularity and produces significant gains in both trainability and accuracy over conventional PINNs, ensemble Kalman filters, and population-based optimizers (Zhou et al., 24 Jan 2026).