p-Laplacian Regularization in Machine Learning
- p-Laplacian regularization is a nonlinear extension of Laplacian smoothing that uses parameter p to balance smoothness, edge preservation, and data adaptivity.
- It employs a variational formulation and Euler–Lagrange equations to connect discrete graph models with continuum $W^{1,p}$ regularization ensuring consistency in semi-supervised learning.
- The framework extends to hypergraphs and modern architectures like Graph Neural Networks, offering robust performance for high-dimensional and low-label applications.
p-Laplacian Regularization
A central theme in modern machine learning and signal processing is the regularization of functions or signals defined over discrete or continuous domains via penalties that promote smoothness or preserve structure. $p$-Laplacian regularization generalizes classical Laplacian-based smoothing to nonlinear and data-adaptive regimes by introducing a parameter $p$ into the regularizer, thereby interpolating between different trade-offs of smoothness, edge-preservation, adaptivity to data geometry, and robustness to label scarcity. This framework encapsulates a spectrum of behaviors in semi-supervised learning, graph signal processing, and high-order geometric data analysis, with foundations in variational calculus, partial differential equations, and spectral theory.
1. Variational Formulation of p-Laplacian Regularization
Given a weighted graph $G = (V, E)$ with vertices $\{x_1, \dots, x_n\}$, edge weights $w_{ij} \ge 0$, and a subset $L \subset \{1, \dots, n\}$ of labeled nodes with labels $y_i$, the $p$-Laplacian regularized learner seeks a function $f : V \to \mathbb{R}$ that is faithful to the known labels and sufficiently smooth according to the geometry encoded in the graph structure. The minimization is formulated as:
$$\min_{f} \; J_p(f) + \lambda \sum_{i \in L} \bigl(f(x_i) - y_i\bigr)^2,$$
where $\lambda > 0$ balances label fidelity versus regularization. Alternatively, one may impose hard label constraints $f(x_i) = y_i$ for $i \in L$ and drop the second term.
The key ingredient, the $p$-Dirichlet energy, is a nonlinear generalization of classical graph Laplacian regularization:
$$J_p(f) = \frac{1}{2} \sum_{i,j} w_{ij}\, \bigl|f(x_i) - f(x_j)\bigr|^{p}.$$
For $p = 2$, this yields standard graph Laplacian smoothing. For $p \neq 2$, the regularizer becomes nonlinear, penalizing large differences more strongly for $p > 2$ and less strongly for $1 < p < 2$. This allows tuning between smoothing and edge-preservation, and controlling how localized or diffuse the interpolant is (Alaoui et al., 2016).
As $n \to \infty$ and graph connectivity is set via a geometric random graph (e.g., edge weights $w_{ij} = \eta\bigl(\lVert x_i - x_j \rVert / h\bigr)$ for some kernel $\eta$ and bandwidth $h \to 0$), the discrete functional converges (with appropriate scaling) to the continuum $p$-Dirichlet energy:
$$\mathcal{J}_p(f) = \sigma_\eta \int_{\Omega} \lVert \nabla f(x) \rVert^{p}\, \rho(x)^2 \, dx,$$
with $\sigma_\eta$ encoding kernel normalization and $\rho$ the data density. The variational problem becomes:
$$\min_{f} \; \mathcal{J}_p(f) \quad \text{subject to } f(x_i) = y_i \text{ for } i \in L.$$
This establishes $p$-Laplacian regularization as a discrete approximation to classical $W^{1,p}$ regularization subject to boundary constraints (Alaoui et al., 2016, Weihs et al., 2023).
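The discrete objective is straightforward to implement directly. The following minimal numpy sketch builds a small weighted graph, evaluates the $p$-Dirichlet energy, and minimizes the soft-constrained objective by plain gradient descent; the weight matrix, labels, value of $p$, step size, and iteration count are illustrative choices, not prescriptions from the cited works.

```python
import numpy as np

def p_dirichlet_energy(f, W, p):
    """(1/2) * sum_{i,j} w_ij |f_i - f_j|^p over all ordered pairs."""
    diff = f[:, None] - f[None, :]
    return 0.5 * np.sum(W * np.abs(diff) ** p)

def p_dirichlet_grad(f, W, p):
    """Gradient of the energy: p * sum_j w_ij |f_i - f_j|^{p-2} (f_i - f_j).

    Note: for 1 < p < 2, add a small epsilon inside abs() to avoid 0 ** (negative).
    """
    diff = f[:, None] - f[None, :]
    return p * np.sum(W * np.abs(diff) ** (p - 2) * diff, axis=1)

def fit_p_laplacian(W, labeled, y, p=3.0, lam=10.0, step=1e-2, iters=20000):
    """Minimize J_p(f) + lam * sum_{i in labeled} (f_i - y_i)^2 by gradient descent."""
    n = W.shape[0]
    f = np.zeros(n)
    for _ in range(iters):
        g = p_dirichlet_grad(f, W, p)
        g[labeled] += 2.0 * lam * (f[labeled] - y)
        f -= step * g
    return f

# Illustrative 5-node path graph with the two endpoints labeled.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
f_hat = fit_p_laplacian(W, labeled=np.array([0, 4]), y=np.array([0.0, 1.0]))
print(np.round(f_hat, 3))  # roughly a linear ramp from 0 to 1 on this graph
```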
2. Euler–Lagrange Equations, Graph and Continuum p-Laplacians
The stationarity conditions for the objective with respect to $f$ (for $p > 1$, at nodes not touched by the fidelity term) yield the discrete graph $p$-Laplacian equation:
$$(\Delta_p f)(x_i) := \sum_{j} w_{ij}\, \bigl|f(x_i) - f(x_j)\bigr|^{p-2}\,\bigl(f(x_i) - f(x_j)\bigr) = 0.$$
This nonlinear system generalizes the harmonicity condition of the standard Laplacian ($p = 2$). For vector-valued functions or in deep architectures, this core operator governs the construction of $p$-Laplacian-based Graph Neural Networks (Fu et al., 2021).
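A quick numerical check makes the connection concrete: with a symmetric weight matrix, the gradient of the discrete $p$-Dirichlet energy at a node equals $p$ times the graph $p$-Laplacian there, so stationarity at unlabeled nodes is exactly the equation above. The sketch below verifies this with a finite-difference comparison on a random weighted graph (graph size, weights, and $p$ are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_p_laplacian(f, W, p):
    """(Delta_p f)_i = sum_j w_ij |f_i - f_j|^{p-2} (f_i - f_j)."""
    diff = f[:, None] - f[None, :]
    return np.sum(W * np.abs(diff) ** (p - 2) * diff, axis=1)

def p_dirichlet_energy(f, W, p):
    diff = f[:, None] - f[None, :]
    return 0.5 * np.sum(W * np.abs(diff) ** p)

# Random symmetric weight matrix and node values.
n, p = 6, 2.5
A = rng.random((n, n))
W = np.triu(A, 1)
W = W + W.T
f = rng.standard_normal(n)

# Finite-difference gradient of the energy vs. p * Delta_p f.
eps = 1e-6
fd_grad = np.array([
    (p_dirichlet_energy(f + eps * np.eye(n)[i], W, p)
     - p_dirichlet_energy(f - eps * np.eye(n)[i], W, p)) / (2 * eps)
    for i in range(n)
])
print(np.allclose(fd_grad, p * graph_p_laplacian(f, W, p), atol=1e-5))  # True
```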
In the geometric random graph limit, the Euler–Lagrange equation becomes a weighted $p$-Laplacian PDE:
$$\operatorname{div}\!\bigl(\rho^2\, \lVert \nabla f \rVert^{p-2}\, \nabla f\bigr) = 0,$$
or, equivalently,
$$\lVert \nabla f \rVert^{p-2}\Bigl(\Delta f + (p-2)\,\Delta_\infty f + 2\,\frac{\nabla \rho \cdot \nabla f}{\rho}\Bigr) = 0,$$
where $\Delta_\infty f = \lVert \nabla f \rVert^{-2}\, \nabla f^{\top} (\nabla^2 f)\, \nabla f$ is the infinity-Laplacian and $\rho$ is the sampling density (Alaoui et al., 2016).
These nonlinear equations form the analytic backbone of $p$-Laplacian regularization in both discrete and continuum regimes, and underpin the behavior of regularized learners as problem parameters vary.
3. Phase Transition, Degeneracy, and Smoothness
A qualitative shift in the behavior of solutions, termed the "phase transition", emerges at $p = d$ for data in $d$ dimensions (Alaoui et al., 2016, Slepčev et al., 2017). For $p < d$ the minimizer of the $p$-Dirichlet energy under pointwise interpolation constraints becomes degenerate, with the solution collapsing to "spiky" interpolants that are essentially constant except for sharp spikes at the constraints. Analytically:
$$\inf\bigl\{\, \mathcal{J}_p(f) \;:\; f(x_i) = y_i \text{ for } i \in L \,\bigr\} = 0 \qquad (p < d)$$
means the minimal energy can be made arbitrarily small by concentrating the transition of $f$ into a vanishingly narrow region about each labeled point. For $p = d$, degeneracy still obtains via a logarithmic bound.
For $p > d$, Sobolev embedding implies that minimizers are Hölder continuous, eliminating spikes and ensuring genuine regularity. The critical value $p = d + 1$ is identified as the optimal choice: it is the smallest integer exponent for which label interpolation yields a nonsingular, continuous solution. This phase transition is observed distinctly even for finite samples; numerical simulations confirm the regime boundaries (Alaoui et al., 2016, Weihs et al., 2023).
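The degeneracy mechanism can be seen by direct computation. For a radial "tent" spike $f(x) = \max(0, 1 - \lVert x \rVert / \varepsilon)$ in $\mathbb{R}^d$ (ignoring the density weight), the gradient has magnitude $1/\varepsilon$ on a ball of volume proportional to $\varepsilon^d$, so $\int \lVert \nabla f \rVert^p \, dx \propto \varepsilon^{d-p}$, which vanishes as $\varepsilon \to 0$ exactly when $p < d$. The short sketch below evaluates this scaling; the dimension, exponents, and widths are illustrative.

```python
import math

def spike_energy(d, p, eps):
    """p-Dirichlet energy of the tent spike f(x) = max(0, 1 - |x|/eps) in R^d.

    |grad f| = 1/eps on the ball of radius eps, so the energy is
    (volume of the d-ball of radius eps) * eps**(-p) = c_d * eps**(d - p).
    """
    unit_ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return unit_ball_volume * eps ** (d - p)

d = 2
for p in (1.5, 2.0, 3.0):          # p < d, p = d, p > d
    energies = [spike_energy(d, p, eps) for eps in (1e-1, 1e-2, 1e-3)]
    print(p, [f"{e:.3e}" for e in energies])
# For p < d the spike's energy vanishes as eps -> 0 (degenerate spikes are "free");
# for p > d it blows up, so spiky interpolants are ruled out.
```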
4. Trade-off: Smoothness Versus Adaptivity to Data Density
The parameter $p$ modulates a core regularization dilemma: small $p$ yields high adaptation to the unlabeled data distribution but risks degeneracy, while large $p$ leads to increasingly smooth solutions that eventually become oblivious to the data density $\rho$.
- For any finite $p$, the drift term $2\,\nabla\rho \cdot \nabla f / \rho$ in the continuum PDE ensures that solutions adapt to high-density regions, reflecting the manifold or cluster assumption common in semi-supervised learning.
- As $p \to \infty$, the suitably renormalized energy converges to the Lipschitz semi-norm of $f$, and the Euler–Lagrange equation reduces to the $\infty$-Laplacian $\Delta_\infty f = 0$, whose solution, the Absolutely Minimal Lipschitz Extension (AMLE), is independent of $\rho$. That is, in the limit $p \to \infty$, the solution disregards the structure of unlabeled data, interpolating labels along shortest paths regardless of data geometry (Alaoui et al., 2016); a discrete version of this limit object is sketched after this list.
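On an unweighted graph, a standard discrete analogue of the AMLE satisfies $f(x_i) = \tfrac{1}{2}\bigl(\max_{j \sim i} f(x_j) + \min_{j \sim i} f(x_j)\bigr)$ at every unlabeled node, and this fixed point can be reached by simple iteration. The sketch below is a minimal illustration of that limit object (the graph, labels, and iteration count are arbitrary choices); note that it never consults the sampling density.

```python
import numpy as np

def discrete_amle(adj, labeled, y, iters=2000):
    """Iterate f_i <- (max over neighbors + min over neighbors) / 2 at unlabeled nodes.

    adj: list of neighbor index lists; labeled: indices whose values are fixed to y.
    """
    n = len(adj)
    f = np.zeros(n)
    f[labeled] = y
    free = [i for i in range(n) if i not in set(labeled)]
    for _ in range(iters):
        for i in free:
            nbr = f[adj[i]]
            f[i] = 0.5 * (nbr.max() + nbr.min())
    return f

# Illustrative 6-node path with labels at the two endpoints.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print(np.round(discrete_amle(adj, labeled=[0, 5], y=[0.0, 1.0]), 3))
# -> an evenly spaced ramp; the result depends only on graph distances to the labels.
```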
The table below summarizes these behaviors:
| Regime | Smoothness | Sensitivity to $\rho$ | Limiting object |
|---|---|---|---|
| $p \le d$ | Degenerate/spiky | Influenced by $\rho$ | Discontinuous, spike solutions |
| $d < p < \infty$ | Continuous/Hölder | Adapts to $\rho$ | Regular, data-adaptive interpolant |
| $p \to \infty$ | Globally Lipschitz | Ignores $\rho$ | AMLE |
Amplifying this, in one-dimensional examples, the risk associated with finite $p$ (density-adaptive) can substantially outperform $p = \infty$ (density-oblivious) for regression under the semi-supervised smoothness model (Alaoui et al., 2016).
5. Extensions to Hypergraphs and Higher-Order Structures
$p$-Laplacian regularization generalizes efficiently to hypergraphs, where relationships involving more than two nodes are encoded. For a hypergraph $H = (V, \mathcal{E})$ with hyperedge weights $w_e$, the hypergraph $p$-Laplacian energy is often formed edge-wise:
$$J_p(f) = \sum_{e \in \mathcal{E}} w_e \max_{i, j \in e} \bigl|f(x_i) - f(x_j)\bigr|^{p}.$$
This structure preserves higher-order data geometry and suppresses spikes more robustly than standard graph-based models, especially at low label rates or coarse connectivity. Variational consistency with continuum $p$-Dirichlet regularization has been established under weaker scaling assumptions on hypergraph construction than for graphs, crucially improving flexibility and robustness for semi-supervised and interpolation tasks (Shi et al., 2024, Shi et al., 2024).
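As a concrete reference point, the edge-wise energy above reduces to a one-line computation per hyperedge. The following sketch evaluates it for a toy hypergraph; the hyperedges, weights, and $p$ are illustrative, and other edge-wise aggregations also appear in the literature.

```python
import numpy as np

def hypergraph_p_energy(f, hyperedges, weights, p):
    """sum_e w_e * max_{i,j in e} |f_i - f_j|^p  (edge-wise maximum deviation)."""
    total = 0.0
    for e, w in zip(hyperedges, weights):
        vals = f[list(e)]
        # max_{i,j in e} |f_i - f_j| is simply (max - min) over the hyperedge
        total += w * (vals.max() - vals.min()) ** p
    return total

f = np.array([0.0, 0.1, 0.9, 1.0, 0.5])
hyperedges = [(0, 1, 2), (2, 3), (1, 3, 4)]   # hyperedges may join more than two nodes
weights = [1.0, 0.5, 2.0]
print(hypergraph_p_energy(f, hyperedges, weights, p=2.0))
```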
Efficient algorithms for large-scale hypergraph -Laplacian problems include stochastic primal–dual hybrid gradient (SPDHG) methods for non-differentiable convex objectives and simplified nonlinear PDE relaxations that achieve single-valued and well-posed solutions with computational guarantees. These are especially effective at suppressing spiky artifacts and reducing computational time by orders of magnitude compared with direct subgradient or primal–dual approaches (Shi et al., 2024).
6. Connections to Machine Learning Architectures
$p$-Laplacian regularization underpins a range of recent machine learning models:
- In Graph Neural Networks, $p$-Laplacian-based message passing enables adaptive spectral filtering that simultaneously captures low- and high-frequency behaviors, allowing tailored denoising or edge-preserving operations suited to both homophilic and heterophilic graph regimes (Fu et al., 2021); a schematic propagation step is sketched after this list.
- In Transformer architectures, $p$-Laplacian perspectives clarify that the self-attention mechanism implements a form of ($p = 2$) Laplacian regularization, and that replacing it with general $p$ enables a continuum between smoothing and contrast-enhancement, improving representation capacity for both local and heterophilic interactions (Nguyen et al., 2023).
- Ensemble $p$-Laplacian regularization frameworks combine multiple $p$-Laplacians in a convex combination, automatically adapting to the intrinsic data structure and achieving consistently superior performance in semi-supervised classification benchmarks (Ma et al., 2018).
- Hypergraph $p$-Laplacian regularization enhances manifold regularization in semi-supervised learning and image processing, effectively exploiting complex geometric relationships in high-dimensional and low-label contexts, with empirical superiority on tasks such as remote sensing image recognition (Ma et al., 2018).
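To make the GNN connection tangible, the sketch below implements one $p$-Laplacian smoothing step as a message-passing update on node features: each edge message is reweighted by the feature difference raised to the power $p - 2$, so large differences are attenuated less (edge preservation) or more (smoothing) depending on $p$. This is a schematic gradient-style step on the $p$-Dirichlet energy, not the exact layer of Fu et al. (2021); the weights, features, step size, and $p$ are illustrative.

```python
import numpy as np

def p_laplacian_message_passing(X, W, p=1.5, step=0.1, eps=1e-8):
    """One propagation step: X <- X - step * Delta_p X, applied feature-wise.

    X: (n, d) node features; W: (n, n) symmetric nonnegative adjacency weights.
    The edge coefficient ||x_i - x_j||^{p-2} acts as an adaptive, attention-like weight.
    """
    diff = X[:, None, :] - X[None, :, :]               # (n, n, d) pairwise differences
    norms = np.linalg.norm(diff, axis=-1) + eps        # avoid 0**(p-2) blow-up for p < 2
    coeff = W * norms ** (p - 2)                       # (n, n) adaptive edge weights
    coeff /= coeff.sum(axis=1, keepdims=True) + eps    # row-normalize for stability
    delta_p = (coeff[:, :, None] * diff).sum(axis=1)   # graph p-Laplacian of the features
    return X - step * delta_p

# Toy graph: two clusters joined by one weak edge; illustrative scalar features.
W = np.array([[0, 1, 1, 0.1, 0],
              [1, 0, 1, 0,   0],
              [1, 1, 0, 0,   0],
              [0.1, 0, 0, 0, 1],
              [0, 0, 0, 1,   0]], dtype=float)
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
for _ in range(10):
    X = p_laplacian_message_passing(X, W, p=1.5)
print(np.round(X.ravel(), 3))
# Within-cluster values smooth out, while the jump across the weak edge persists longer.
```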
7. Discretization, Continuum Limits, and Consistency Theory
A substantial theoretical foundation links discrete $p$-Laplacian regularization on graphs/hypergraphs with continuum and nonlocal variational problems. Sufficient conditions for consistency and quantitative convergence rates have been established as the number of points $n$ grows: the discrete energy converges to the corresponding continuum energy in regimes where $p > d$ and the connection radius $\varepsilon_n \to 0$ at an appropriate rate relative to $n$. Relaxed models with softened label constraints or expanded label neighborhoods remove upper bound restrictions on $\varepsilon_n$ (Slepčev et al., 2017).
For practical computation, nonlocal discretizations, forward Euler time-stepping of associated gradient flows, and error rates depending on mesh, kernel, and time-step parameters have been rigorously analyzed, providing explicit prescriptions for rate-optimal approximation and guidance for implementation in random geometric graphs and more general data geometries (Weihs et al., 2023, Hafiene et al., 2018).
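As a reference for such discretizations, a minimal forward Euler time-stepping of the graph $p$-Dirichlet gradient flow, with labeled nodes held fixed, looks as follows. The graph, time step, and iteration count are illustrative rather than the rate-optimal prescriptions of the cited analyses.

```python
import numpy as np

def p_laplacian_gradient_flow(W, labeled, y, p=3.0, dt=1e-2, steps=20000):
    """Forward Euler on df/dt = -Delta_p f at unlabeled nodes, with f fixed at labels."""
    n = W.shape[0]
    f = np.zeros(n)
    f[labeled] = y
    mask = np.ones(n, dtype=bool)
    mask[labeled] = False                      # only unlabeled nodes evolve
    for _ in range(steps):
        diff = f[:, None] - f[None, :]
        delta_p = np.sum(W * np.abs(diff) ** (p - 2) * diff, axis=1)
        f[mask] -= dt * delta_p[mask]
    return f

# Illustrative random geometric graph on the unit interval, labels at the extremes.
rng = np.random.default_rng(1)
x = np.sort(rng.random(30))
W = (np.abs(x[:, None] - x[None, :]) < 0.15).astype(float)
np.fill_diagonal(W, 0.0)
f_hat = p_laplacian_gradient_flow(W, labeled=np.array([0, 29]), y=np.array([0.0, 1.0]))
print(np.round(f_hat[:5], 3), "...", np.round(f_hat[-5:], 3))
# Labeled endpoints stay pinned; interior values relax toward the p-harmonic interpolant.
```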
Representative works establishing the above include "Asymptotic behavior of $\ell_p$-based Laplacian regularization in semi-supervised learning" (Alaoui et al., 2016), "Analysis of $p$-Laplacian Regularization in Semi-Supervised Learning" (Slepčev et al., 2017), "Discrete-to-Continuum Rates of Convergence for $p$-Laplacian Regularization" (Weihs et al., 2023), and "$p$-Laplacian regularization on point clouds for data interpolation" (Shi et al., 2024).