Convolution-Weighting for PINNs

Updated 30 August 2025
  • The paper presents a convolution-weighting approach for PINNs that replaces pointwise weights with spatial averaging using SPD kernels in a primal-dual framework.
  • It leverages local residual averaging to stabilize training in multi-scale and rapidly varying PDE problems, achieving demonstrably lower error norms.
  • The method’s formulation enhances error representation and robustness, offering practical improvements for inverse problems and high-dimensional applications.

The convolution-weighting method for Physics-Informed Neural Networks (PINNs) is a recently developed approach that improves the reliability, accuracy, and robustness of PINN solvers by departing from traditional pointwise weighting schemes and introducing spatially (and/or temporally) local averaging, i.e., convolutional operators, into the adaptive weighting mechanism. By exploiting the spatial continuity of the residual field, convolution-weighting makes the loss more representative of the true error landscape when solving partial differential equations (PDEs). The method is particularly relevant for multi-scale regimes, inverse problems, and rapidly varying solutions, and it provides demonstrable improvements over previous weighting strategies such as residual-based attention, uncertainty weighting, and inverse-Dirichlet methods.

1. Foundational Principles of Convolution-Weighting in PINNs

Traditional PINN loss constructions compute physics and data residuals at a discrete set of collocation points and apply weights that are either manually chosen or adaptively tuned based solely on properties at each individual point. This often results in noisy or unstable training, especially when the residual field is locally concentrated or highly variable. Convolution-weighting, as introduced in "Convolution-weighting method for the physics-informed neural network: A Primal-Dual Optimization Perspective" (Si et al., 24 Jun 2025), replaces isolated pointwise weighting with spatially coherent weights obtained by applying convolutional operators (typically symmetric positive definite (SPD) kernels, e.g., Gaussian) over neighborhoods of collocation points.

Mathematically, the weight update is tightly coupled to the residual distribution via convolution:

\bar{r}(x_i) = \frac{1}{M+1}\left(r(x_i) + \sum_{j=1}^M r(x_j)\right)

where x_1, …, x_M are the collocation points in the local neighborhood N(x_i, ε) of x_i with radius ε. The smoother, spatially aware weight distribution is enforced in the optimization either as

L(\lambda, \theta) = \sum_i \lambda_i\, r_i(\theta) - \frac{1}{2} \sum_i \lambda_i^2 + \mathcal{L}_2(\theta)

or, with a convolution operator W,

\min_\theta \max_\lambda \; \lambda^T \sqrt{W}\, r(\theta) - \frac{1}{2}\lambda^T \lambda + \mathcal{L}_2(\theta)

This can be interpreted as a saddle-point problem in the primal-dual (PD) setting, where both the network parameters θ and the dual weights λ are updated iteratively.
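
To make the averaging step concrete, the following minimal NumPy sketch applies the first display equation to a mock 1D residual field; the collocation points, residual values, neighborhood radius ε, and neighbor cap M are illustrative assumptions rather than settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, size=200))                          # 1D collocation points
r = np.abs(np.sin(8 * np.pi * x)) + 0.1 * rng.standard_normal(200)    # mock residual field

M, eps = 8, 0.05
r_bar = np.empty_like(r)
for i, xi in enumerate(x):
    # N(x_i, eps): up to M other points within distance eps of x_i
    nbrs = np.where((np.abs(x - xi) < eps) & (np.arange(x.size) != i))[0][:M]
    # Local average of the residual, as in the display equation above
    r_bar[i] = (r[i] + r[nbrs].sum()) / (nbrs.size + 1)
```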

2. Primal-Dual Formulation and Optimization

The core of the convolution-weighting approach is its formulation as a constrained min–max optimization problem, translating the multi-objective character of PINNs into a Lagrangian framework where spatial coherence in the residuals is imposed through the dual update. According to (Barreau et al., 30 Jan 2025, Si et al., 24 Jun 2025), the update rules are:

  • Primal:

\theta^{(k+1)} = \theta^{(k)} - \eta_\theta \Big(\sum_i \tilde{\lambda}_i^{(k)}\,\nabla_\theta r_i(\theta^{(k)}) + \nabla_\theta \mathcal{L}_2(\theta^{(k)})\Big)

  • Dual:

\lambda^{(k+1)} = (1-\eta_\lambda)\,\lambda^{(k)} + \eta_\lambda \sqrt{W}\,r(\theta^{(k+1)})

with \tilde{\lambda} = \sqrt{W}\,\lambda. This direct coupling of the dual update to a locally averaged residual (rather than the raw residual) ensures physically meaningful and stable weight adaptation, mitigating overfitting to isolated points of high error.

The convolution operation smooths the influence of large residuals so that the training does not become trapped in non-representative local minima; simultaneously, it respects the underlying continuous nature of the solution to the PDE, as required by physical laws.
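
As a concrete illustration, here is a minimal PyTorch sketch of these primal-dual updates on a toy 1D Poisson problem with zero boundary conditions. The network size, Gaussian kernel bandwidth, learning rates, and iteration count are illustrative assumptions, not the paper's configuration; the -(1/2) λᵀλ regularizer is omitted from the primal loss because it does not depend on θ, and its effect on the dual variables appears as the (1 - η_λ) decay in the dual step.

```python
import torch

torch.manual_seed(0)

# Collocation points in (0, 1) and a small network u_theta(x).
x = torch.linspace(0.0, 1.0, 64).reshape(-1, 1).requires_grad_(True)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)

def pde_residual(x):
    """Residual r(theta) of -u'' = f with f(x) = pi^2 sin(pi x), true solution sin(pi x)."""
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    f = torch.pi ** 2 * torch.sin(torch.pi * x)
    return (-d2u - f).squeeze()

# Convolution operator W: symmetrically normalized Gaussian kernel over the
# collocation points (symmetric positive semidefinite), and its matrix square root.
with torch.no_grad():
    dist2 = (x - x.T) ** 2
    K = torch.exp(-dist2 / (2 * 0.05 ** 2))
    d_inv_sqrt = K.sum(dim=1).rsqrt()
    W = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    evals, evecs = torch.linalg.eigh(W)
    sqrtW = evecs @ torch.diag(evals.clamp(min=0.0).sqrt()) @ evecs.T

xb = torch.tensor([[0.0], [1.0]])          # boundary points with u(0) = u(1) = 0
lam = torch.zeros(x.shape[0])              # dual weights lambda
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
eta_lam = 0.1                              # dual step size eta_lambda

for step in range(2000):
    r = pde_residual(x)
    lam_tilde = sqrtW @ lam                                  # lambda_tilde = sqrt(W) lambda
    loss = (lam_tilde * r).sum() + (net(xb) ** 2).mean()     # Lagrangian term + boundary loss L2
    opt.zero_grad()
    loss.backward()                                          # primal step: gradient descent in theta
    opt.step()
    r_new = pde_residual(x).detach()                         # residual at the updated parameters
    lam = (1 - eta_lam) * lam + eta_lam * (sqrtW @ r_new)    # dual step: ascent toward sqrt(W) r
```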

3. Mathematical and Algorithmic Structure

The convolution operator W is taken to be symmetric positive semidefinite, ensuring stability of the local averaging. The convolution can be implemented by:

  • Dense convolution, e.g., applying W globally (costly in practice)
  • Sparse local averaging, with only nearest neighbors in the domain (efficient and amenable to high-dimensional problems)

The regularization term -\frac{1}{2}\lambda^T\lambda penalizes excessive upweighting, maintaining numerical stability. In numerical implementations, the weights are normalized so that \sum_i \tilde{\lambda}_i = 1, and the trainable parameters can be updated with standard optimizers such as Adam.
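
A hedged NumPy sketch of both implementation options, together with the weight normalization, is shown below. The bandwidth, cutoff radius, the use of W in place of its matrix square root for the effective weights, and the use of a dense array for the "sparse" variant (zeroing far-apart entries rather than using a sparse data structure) are simplifications for illustration.

```python
import numpy as np

def gaussian_operator(x, bandwidth=0.05, cutoff=None):
    """Symmetrically normalized Gaussian averaging operator over points x of shape (n, d)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    if cutoff is not None:
        K[d2 > cutoff ** 2] = 0.0          # local averaging only; truncation may slightly perturb PSD-ness
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    return d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]

rng = np.random.default_rng(0)
x = rng.uniform(size=(256, 2))                  # 2D collocation points
W_dense = gaussian_operator(x)                  # dense convolution: every pair interacts (costly)
W_sparse = gaussian_operator(x, cutoff=0.1)     # sparse-style local averaging over near neighbors

lam = rng.uniform(size=256)                     # raw dual weights
lam_tilde = W_sparse @ lam                      # convolved weights (W used in place of sqrt(W) for brevity)
lam_tilde /= lam_tilde.sum()                    # normalization: sum_i lambda_tilde_i = 1
```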

A summary of update steps:

Update Type     | Variable | Update Rule
Primal (NN)     | θ        | Gradient descent with convolved residuals
Dual (weights)  | λ        | Ascent with convolution of residuals

4. Empirical Performance and Benchmarking

Empirical results in (Si et al., 24 Jun 2025) demonstrate pronounced improvements over alternative weighting techniques (residual-based attention, self-adaptive weighting, loss-attentional approaches) across standard PDE benchmarks. Relative L² errors and L∞ norms are systematically lower:

  • For the 1D heat equation (high-frequency regime), convolution-weighting achieves a relative L² error of 1.04×10⁻³ (L∞ norm 5.73×10⁻³).
  • In the viscous Burgers equation (with stiffness and shocks), up to one order of magnitude improvement is reported over previous methods.
  • For multi-dimensional and inverse problems (e.g., Navier–Stokes, spatially varying Poisson coefficients), convolution-weighting generalizes well and yields more stable, spatially distributed error profiles in the predicted solutions.

Adaptive resampling—relocating collocation points toward regions of persistent high residual—can be effectively integrated with convolution-based weighting, further enhancing focus and convergence.
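
The sketch below shows one common residual-proportional resampling scheme as an illustration; drawing from a uniform candidate pool with probabilities proportional to the locally averaged residual is an assumption here, not necessarily the exact procedure used in the cited work.

```python
import numpy as np

def resample_collocation(candidates, r_bar, n_keep, rng):
    """Draw n_keep points from candidates, favoring regions of high averaged residual."""
    p = np.abs(r_bar)
    p = p / p.sum()
    idx = rng.choice(candidates.shape[0], size=n_keep, replace=False, p=p)
    return candidates[idx]

rng = np.random.default_rng(0)
candidates = rng.uniform(size=(4096, 2))               # dense candidate pool in [0, 1]^2
r_bar = np.exp(-50 * (candidates[:, 0] - 0.5) ** 2)    # mock locally averaged residual field
x_new = resample_collocation(candidates, r_bar, n_keep=512, rng=rng)
```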

5. Theoretical and Practical Implications

Convolution-weighting brings an explicit realization of the "kernel-task alignment" concept formulated in (Seroussi et al., 2023). By using localized convolution to match the spectral or spatial structure of the true PDE solution, PINNs using this strategy are less prone to spectral bias and are better equipped to generalize in regimes where solution features vary sharply or are highly localized. The primal-dual optimization perspective highlights the robustness of the approach, ensuring that hard constraints are adaptively enforced and that the balance between competing loss terms is not left to manual tuning or ad-hoc heuristics.

The procedure generalizes naturally to high-dimensional, multi-physics, and inverse problems, and opens avenues for integrating more sophisticated kernel structures (e.g., anisotropic, nonstationary convolutions) when domain or solution geometry demands.
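
As an illustration of such a kernel, the sketch below builds an anisotropic Gaussian averaging operator whose length scale differs per coordinate, smoothing residuals further along a slowly varying direction; the specific length scales and row normalization are illustrative assumptions.

```python
import numpy as np

def anisotropic_operator(x, lengthscales=(0.2, 0.02)):
    """Anisotropic Gaussian averaging operator: wider smoothing along the first coordinate."""
    ell = np.asarray(lengthscales)
    diff = (x[:, None, :] - x[None, :, :]) / ell      # scale each coordinate by its own length scale
    K = np.exp(-0.5 * (diff ** 2).sum(axis=-1))
    K /= K.sum(axis=1, keepdims=True)                 # row-normalize so each row averages its neighbors
    return 0.5 * (K + K.T)                            # symmetrize the operator

x = np.random.default_rng(0).uniform(size=(128, 2))   # 2D collocation points
W_aniso = anisotropic_operator(x)                     # direction-dependent residual smoothing
```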

6. Relation to Previous and Alternative Weighting Methods

Compared to the inverse-Dirichlet method (Maddu et al., 2021)—which equalizes the variance of gradients across objectives—and uncertainty weighting (Huang et al., 2021), convolution-weighting explicitly leverages local spatial information and continuity. The residual-quantile adjustment technique (Han et al., 2022) focuses on balancing heavy-tailed distributions but does not incorporate neighborhood structure. Residual-based attention (Anagnostopoulos et al., 2023) is pointwise and lacks spatial coherence. Feature-enforcing approaches (Jahaninasab et al., 2023), Fourier basis augmentation (Cooley et al., 4 Oct 2024), and barycentric interpolation (Liu et al., 28 Jun 2025) differ in their primary mechanism but are complementary and, in principle, could be combined with convolution-weighting.

7. Applications and Future Directions

Potential directions include:

  • High-fidelity forward solvers in heat, Navier–Stokes, and solid mechanics.
  • Inverse problems and multi-scale PDEs where adaptive focus on regions of sharp gradients/shocks is critical.
  • Robust, modular frameworks for scientific machine learning where physical domain knowledge suggests specific convolutional kernels.

The modular mathematical structure, primal-dual optimization compatibility, and demonstrated empirical gains over traditional methods position convolution-weighting as a foundation for further advances in mesh-free neural PDE solvers, especially in regimes dominated by continuity and spatially structured error.
