Progressive Sharpening: Theory and Applications

Updated 4 July 2026

Progressive sharpening is a dynamical phenomenon in which iterative updates concentrate a system quantity step by step, such as the largest Hessian eigenvalue in optimization or edge contrast in image processing.
Its applications span diverse fields, from edge-of-stability analyses of deep learning training to staged refinement in pan-sharpening and in language-model outputs.
The concept unifies methods that progressively concentrate information under stabilizing constraints, and it clarifies both convergence dynamics and performance limits.

Progressive sharpening denotes a family of stagewise concentration phenomena in which a system becomes sharper over time or across refinement steps, rather than through a single static transformation. The quantity that sharpens depends on the field: in deep learning optimization it is typically the largest Hessian eigenvalue; in image processing it is edge or contour contrast; in pan-sharpening it is high-resolution spatial detail fused into multispectral imagery; in language-model self-improvement it is probability mass concentrated on high-quality or high-likelihood sequences; and in monitored quantum dynamics it is the localization of a conserved quantity such as charge or spin (Li et al., 2022, Schaefer et al., 2022, Huang et al., 2024, Feng et al., 2024).

1. Semantic range and recurring structure

Across the literature, the term does not name a single algorithm. It names a dynamical pattern: a coarse, diffuse, or uncertain state is driven toward a sharper one by repeated updates, finite-time evolution, iterative refinement, or progressively stronger concentration.

Domain	What sharpens	Typical mechanism
Optimization	Largest Hessian eigenvalue	GD/SGD trajectory toward EOS
Image restoration	Edges, contours, high frequencies	Backward PDEs, detail injection, diffusion refinement
Pan-sharpening	Spatial detail in HRMS reconstruction	Progressive fusion, compensation, residual restoration
Language modeling	Probability mass on preferred outputs	Best-of-$N$, SFT, RLHF, exploration
Quantum monitoring	Charge or spin localization	Recursive measurement-induced dynamics

In optimization, progressive sharpening is the initial phase in which sharpness rises toward the stability scale $2/\eta$ under gradient descent (Li et al., 2022). In image sharpening, it often means finite-time enhancement before steady-state segmentation-like behavior appears, as in stabilised inverse flowline evolution (SIFE) (Schaefer et al., 2022). In pan-sharpening, the adjective usually modifies a refinement architecture that avoids one-shot fusion by using staged compensation or cross-modal interaction (2207.14451, Zhou et al., 2022). In language modeling, sharpening refers to concentration of model mass around outputs already favored by a verifier or by the base policy itself (Huang et al., 2024). In monitored quantum systems, sharpening distinguishes fuzzy and sharp symmetry sectors and can occur separately from purification (Feng et al., 2024).

This range suggests that “progressive” is the crucial qualifier: the literature repeatedly contrasts staged concentration with direct one-step reconstruction, direct decoding, or fixed preprocessing.

2. Optimization: sharpness growth, edge of stability, and feedback with stepsize

In deep learning optimization, progressive sharpening refers to the increase of the largest Hessian eigenvalue $\lambda_{\max}(\nabla^2 f(w))$ during training, often until the normalized stability quantity approaches $\eta \lambda_{\max} \approx 2$ (Li et al., 2022). The basic empirical picture is that sharpness rises in an initial phase, reaches the learning-rate scale $2/\eta$, and then enters an edge-of-stability (EOS) regime with oscillatory behavior rather than indefinite monotone growth (Li et al., 2022).
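The role of the scale $2/\eta$ can be illustrated on a one-dimensional quadratic, where the curvature plays the part of $\lambda_{\max}$. The following is a minimal sketch, not taken from the cited papers; the function name `gd_on_quadratic` and the chosen values are illustrative assumptions.

```python
def gd_on_quadratic(curvature, eta, steps=50, w0=1.0):
    """Run gradient descent on f(w) = 0.5 * curvature * w**2 and return |w| at the end."""
    w = w0
    for _ in range(steps):
        grad = curvature * w      # f'(w) = curvature * w
        w = w - eta * grad        # each step multiplies w by (1 - eta * curvature)
    return abs(w)


if __name__ == "__main__":
    eta = 0.1                     # stability threshold is 2 / eta = 20
    for curvature in (5.0, 19.0, 21.0):
        print(curvature, eta * curvature, gd_on_quadratic(curvature, eta))
```

The iterate contracts while $\eta \cdot \text{curvature} < 2$ and grows once the product exceeds 2, which is why sharpness near $2/\eta$ marks the boundary between stable and unstable gradient descent on a locally quadratic loss.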

A central refinement of this picture is the four-phase decomposition of the gradient-descent trajectory. The initial phase is the progressive sharpening phase; later phases include overshoot above $2/\eta$, a drop in sharpness, and recovery, so the full trajectory is cyclic rather than globally monotone (Li et al., 2022). A related minimalist model with one relevant and one irrelevant input coordinate proves the same qualitative structure: monotone increase in one stage, monotone decrease in a self-stabilization stage, and repeated cycling around the EOS threshold (Liu et al., 4 Mar 2025).

Several papers argue that this phenomenon is not specific to large neural networks. Second-order regression models, in which the output is quadratic in the parameters, already exhibit progressive sharpening and EOS-like stabilization. In a two-dimensional asymmetric quadratic model, the limiting curvature approaches a neighborhood of $2/\eta$, slightly below the threshold by an $O(\eta)$ offset (Agarwala et al., 2022). A high-dimensional quadratic regression analysis similarly shows that nonzero second-order structure is sufficient for sharpening and that the discrete nonlinear correction stabilizes curvature near the edge of stability (Agarwala et al., 2022).
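A toy case makes the claim about quadratic-in-parameter models concrete. The sketch below is not the model analyzed in the cited papers: it assumes a two-parameter model with output $a \cdot b$, squared loss against a target $y$, and a symmetric initialization, all chosen here for illustration. It tracks the largest Hessian eigenvalue along a gradient-descent run.

```python
import math


def sharpness(a, b, y):
    """Largest Hessian eigenvalue of L(a, b) = 0.5 * (a * b - y)**2."""
    h_aa, h_bb = b * b, a * a          # second derivatives in a and in b
    h_ab = 2.0 * a * b - y             # mixed second derivative
    mean = 0.5 * (h_aa + h_bb)
    half_gap = 0.5 * (h_aa - h_bb)
    return mean + math.hypot(half_gap, h_ab)   # closed form for a symmetric 2x2 matrix


def train(a=1.5, b=1.5, y=4.0, eta=0.05, steps=200):
    """Gradient descent on L; returns the sharpness at the start and after every step."""
    history = [sharpness(a, b, y)]
    for _ in range(steps):
        residual = a * b - y
        a, b = a - eta * residual * b, b - eta * residual * a   # simultaneous update
        history.append(sharpness(a, b, y))
    return history


if __name__ == "__main__":
    h = train()
    print(h[0], h[-1], 2.0 / 0.05)
```

With these defaults the sharpness rises from 2.75 toward 8 as the loss falls, while $2/\eta = 40$, so the run shows only the sharpening phase and never reaches the edge of stability. Reaching EOS in this toy would require a larger stepsize or a sharper minimum.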

The role of the stepsize is not passive. With adaptive tuners, the joint dynamics of stepsize and sharpness matter. Armijo linesearch gives monotonically decreasing training loss but is empirically associated with ever-increasing sharpness, shrinking stepsizes, and operation below EOS. Polyak stepsizes, by contrast, often oscillate around EOS or slightly above it and outperform Armijo in deterministic settings (Roulet et al., 2023). The conclusion there is that EOS alone is insufficient; what matters is the coupled trajectory of $\gamma_t$ and $\lambda_{\max}$ (Roulet et al., 2023).
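The two stepsize rules can be sketched as follows. This is a minimal illustration on a fixed quadratic, where the curvature cannot change, so it shows only the stepsize side of the coupled dynamics; the helper names, the backtracking constants, and the test problem are assumptions made here, not details from the cited paper.

```python
def armijo_stepsize(f, w, g, gamma0=1.0, c=1e-4, shrink=0.5):
    """Backtrack from gamma0 until the Armijo sufficient-decrease condition holds."""
    g_sq = sum(gi * gi for gi in g)
    gamma = gamma0
    while f([wi - gamma * gi for wi, gi in zip(w, g)]) > f(w) - c * gamma * g_sq:
        gamma *= shrink
    return gamma


def polyak_stepsize(f_value, f_star, g):
    """Polyak stepsize (f(w) - f*) / ||g||^2; it requires the optimal value f*."""
    return (f_value - f_star) / sum(gi * gi for gi in g)


def quadratic(curvatures):
    """Return f and its gradient for f(w) = 0.5 * sum_i curvatures[i] * w[i]**2."""
    def f(w):
        return 0.5 * sum(c * wi * wi for c, wi in zip(curvatures, w))

    def grad(w):
        return [c * wi for c, wi in zip(curvatures, w)]

    return f, grad


def run(rule, curvatures=(1.0, 10.0), w0=(1.0, 1.0), steps=20):
    """Return (stepsize * lambda_max, loss) pairs along the trajectory."""
    f, grad = quadratic(curvatures)
    w, lam_max, trace = list(w0), max(curvatures), []
    for _ in range(steps):
        g = grad(w)
        if all(gi == 0.0 for gi in g):   # already at the minimizer
            break
        if rule == "armijo":
            gamma = armijo_stepsize(f, w, g)
        else:
            gamma = polyak_stepsize(f(w), 0.0, g)   # f* = 0 for this quadratic
        w = [wi - gamma * gi for wi, gi in zip(w, g)]
        trace.append((gamma * lam_max, f(w)))
    return trace
```

Comparing `run("armijo")` with `run("polyak")` shows the normalized quantity $\gamma_t \lambda_{\max}$ next to the loss at each step. By construction the Armijo loss decreases at every accepted step, whereas the Polyak rule carries no such guarantee.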

The stochastic regime modifies the picture further. Small-batch SGD exhibits conservative sharpening: large eigenvalues still grow, but their growth is suppressed more strongly as batch size decreases. In that setting, a stochastic edge of stability emerges, governed by a noise-kernel quantity that depends on the whole spectrum rather than only on $\lambda_{\max}$ (Agarwala et al., 2024).

A separate line of work ties progressive sharpening to the input-output Jacobian through the Gauss–Newton part of the Hessian. There, the claim is not that Hessian sharpness is fundamental, but that decreasing loss forces Jacobian growth on the data distribution, and that this Jacobian growth then drives sharpness upward under suitable conditions (MacDonald et al., 2023). This account also undercuts a common simplification: low sharpness is not by itself the explanation for generalization.
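For reference, the Gauss–Newton part mentioned above comes from the standard chain-rule decomposition of the Hessian. The notation here is an assumption for illustration: $L(w) = \frac{1}{n}\sum_i \ell(f(x_i; w), y_i)$, with $J_i = \partial f(x_i; w) / \partial w$ the parameter Jacobian of the model output.

```latex
\nabla_w^2 L(w)
  = \underbrace{\frac{1}{n} \sum_{i=1}^{n} J_i^{\top} \, \nabla_f^2 \ell_i \, J_i}_{\text{Gauss--Newton part}}
  \;+\;
  \frac{1}{n} \sum_{i=1}^{n} \sum_{k} \frac{\partial \ell_i}{\partial f_k} \, \nabla_w^2 f_k(x_i; w)
```

For squared loss, $\nabla_f^2 \ell_i$ is the identity, so the Gauss–Newton part reduces to $\frac{1}{n}\sum_i J_i^{\top} J_i$ and its largest eigenvalue grows with the Jacobian norms. The decomposition involves the parameter Jacobian; the connection to the input-output Jacobian is the cited paper's argument and is not derived here.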

3. Image sharpening as finite-time evolution and selective detail enhancement

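In image processing, progressive sharpening often refers to gradual edge enhancement under a controlled evolution. The most explicit PDE formulation in the cited literature is SIFE, a flow that is backward parabolic in the gradient direction outside extrema and zero at extrema. Written schematically, with $\xi = \nabla u / |\nabla u|$, it reads

$$\partial_t u = \begin{cases} -\partial_{\xi\xi} u, & \text{outside extrema}, \\ 0, & \text{at extrema}, \end{cases}$$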

which sharpens across edges but not along them (Schaefer et al., 2022). The design responds to two constraints stated in the paper: backward diffusion is a natural inverse to Gaussian blur, but it is highly ill-posed; and isotropic backward sharpening can create irregular contours. SIFE therefore keeps the stabilization philosophy of freezing extrema while making the flow anisotropic (Schaefer et al., 2022).

The numerics are notable because the scheme uses one-sided morphological derivatives rather than standard finite differences. These derivatives approximate directional derivatives in gradient direction, provide rotation-invariant approximations with ball or disk structuring elements, and allow subpixel progression. In one dimension, the scheme satisfies a maximum–minimum principle under a stability limit on the time step size, and experiments indicate the same stability limit in two dimensions (Schaefer et al., 2022). The reported qualitative result is that SIFE yields nonflat steady states and sharper, more regular edges than SILD or a shock filter (Schaefer et al., 2022). The paper explicitly presents this as a progressive sharpening model: one stops at finite time for restoration, while steady state produces piecewise constant segmentation-like images.

A second formulation unifies smoothing and sharpening through a single guided filter coefficient. In the self-guided model, a coefficient below 1 gives smoothing, a coefficient equal to 1 gives identity, and a coefficient above 1 gives sharpening; in the guided version, the same generalized-Gamma MAP framework allows the coefficient to exceed 1, so sharpening becomes a continuation of the same edge-aware interpolation model rather than a separate operator (Deng et al., 2021). The paper further makes the coefficient adaptive using texture, depth, and blurriness cues, so sharpening strength varies spatially (Deng et al., 2021).
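The three regimes of a single coefficient can be seen in a one-dimensional sketch. It assumes the self-guided form "local mean plus coefficient times deviation from the mean" with a fixed scalar coefficient `a`; in an actual guided filter the coefficient is computed per window, and the adaptive, cue-driven version described in the paper is not reproduced here. The function names are illustrative.

```python
def box_mean(signal, radius=1):
    """Local mean with a (2 * radius + 1)-tap box window and replicated borders."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def scale_detail(signal, a, radius=1):
    """Return mean + a * (signal - mean): a < 1 smooths, a == 1 is identity, a > 1 sharpens."""
    mean = box_mean(signal, radius)
    return [m + a * (s - m) for s, m in zip(signal, mean)]


if __name__ == "__main__":
    step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    for a in (0.5, 1.0, 2.0):
        out = scale_detail(step, a)
        print(a, out[3] - out[2])   # contrast across the edge
```

On the step signal the contrast across the edge is below 1 for `a = 0.5`, equal to 1 for `a = 1`, and above 1 for `a = 2`, so smoothing and sharpening sit on either side of the identity within one formula.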

A related but distinct use appears in monocular depth refinement. SharpDepth starts from a metric depth estimate and a diffusion-based affine-invariant depth estimate, computes a difference map between the two, injects noise selectively into uncertain regions, and trains with an SDS loss plus a noise-aware reconstruction loss (Pham et al., 2024). The method is progressive in the sense stated by the paper: the teacher is replaced by an EMA of the training model, enabling iterative refinement in multiple steps, and the shrinking difference map focuses later updates on residual boundary errors (Pham et al., 2024).

Not every sharpening paper introduces such a schedule. In preemptive robustification by Laplacian sharpening, the transform adds a scaled Laplacian high-pass response to the image and is applied once, with a sweep over the sharpening strength rather than an iterative curriculum (Liang et al., 26 Mar 2026). The paper is explicit that no progressive or multi-stage sharpening schedule is studied there.
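A one-shot Laplacian sharpening with a strength sweep can be sketched in one dimension as follows. This is the textbook form (signal minus a scaled discrete Laplacian), used here as an assumption; the exact transform and the strength values of the cited paper are not reproduced, and the parameter name `alpha` is illustrative.

```python
def laplacian_sharpen(signal, alpha):
    """Apply x - alpha * Laplacian(x) once, with replicated borders."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        laplacian = left - 2.0 * signal[i] + right   # discrete second difference
        out.append(signal[i] - alpha * laplacian)
    return out


if __name__ == "__main__":
    step = [0.0, 0.0, 1.0, 1.0]
    for alpha in (0.0, 0.5, 1.0):    # a sweep over strength, not an iterative schedule
        print(alpha, laplacian_sharpen(step, alpha))
```

Each value of `alpha` is applied to the original signal independently, which is what separates a strength sweep from a progressive scheme that feeds each output into the next step.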

4. Remote sensing and pan-sharpening: staged fusion instead of one-shot reconstruction

In remote sensing, “pan-sharpening” is itself a task name, but several methods explicitly reinterpret it as progressive sharpening of spatial detail. PanFormer argues that PAN and MS should not be fused by a single-stream super-resolution mapping. It uses two modality-specific Transformer streams with self-attention, followed by cross-attention in both directions, so that spatial and spectral information are merged progressively rather than by direct concatenation (Zhou et al., 2022). The paper describes this as progressive fusion through repeated attention-based interaction.

PC-GANs makes the staged structure even more explicit. It criticizes one-step generators as highly dependent on reconstruction ability and prone to error accumulation, spatial detail loss, and spectral distortion. Its alternative is a two-step model: a deep multiscale guided GAN first produces a coarse pre-sharpened image, and a spatial-spectral residual compensation module then refines the remaining residuals using reverse-architecture GANs in coarse-to-fine and fine-to-coarse directions (2207.14451). The architecture is trained jointly with adversarial, cycle-consistent, and reconstruction losses (2207.14451).

A later development shifts the progressive idea from fusion alone to the training data model itself. Progressive Alignment Degradation Learning argues that the Wald protocol imposes an inaccurate fixed degradation pattern and thereby limits generalization from reduced-resolution to full-resolution pansharpening (Zhao et al., 25 Jun 2025). PADM alternates between PAlignNet and PDegradeNet so that degradation learning and alignment learning refine each other iteratively. On top of that, HFreqDiff predicts the residual within a diffusion model conditioned on PAN, IMS, and an explicit high-frequency prior, with CFB and BAMB modules providing structural and band-aware detail guidance (Zhao et al., 25 Jun 2025). The diffusion reverse process then progressively restores high-frequency detail.

Taken together, these papers replace the one-step view of pan-sharpening with a refinement view: preliminary fusion, residual compensation, degradation adaptation, and frequency-selective reverse diffusion all serve as staged sharpening operators.
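The staged view can be illustrated with classical operators standing in for the learned modules. The sketch below is not any of the cited architectures: it assumes additive PAN detail injection as the coarse stage and a single back-projection step as the residual stage, on one-dimensional signals with a scale factor of 2. All function names and the averaging degradation model are assumptions.

```python
def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def downsample2(x):
    """Average non-overlapping pairs (a simple stand-in for sensor degradation)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]


def box3(x):
    """3-tap box low-pass with replicated borders."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def coarse_fusion(ms, pan, gain=1.0):
    """Stage 1: upsample the MS band and inject high-pass PAN detail."""
    detail = [p - lp for p, lp in zip(pan, box3(pan))]
    return [u + gain * d for u, d in zip(upsample2(ms), detail)]


def residual_compensation(estimate, ms):
    """Stage 2: add back the upsampled low-resolution residual."""
    residual = [m - d for m, d in zip(ms, downsample2(estimate))]
    return [e + r for e, r in zip(estimate, upsample2(residual))]


if __name__ == "__main__":
    ms, pan = [1.0, 3.0], [0.0, 1.0, 4.0, 5.0]
    coarse = coarse_fusion(ms, pan)
    refined = residual_compensation(coarse, ms)
    print(downsample2(coarse), downsample2(refined))
```

In this toy, the coarse stage adds spatial detail but leaves a spectral inconsistency when degraded back to low resolution, and the residual stage removes it while keeping the injected detail. Learned methods replace both stages with networks, but the division of labor is the same.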

5. Distribution sharpening in language-model self-improvement and RL

In language-model self-improvement, sharpening no longer refers to geometric edges or curvature. It refers to concentrating probability mass on responses that the model already scores highly. One formulation defines a self-reward $r_{\mathrm{self}}(y \mid x)$ and treats sharpening as pushing the policy toward $\arg\max_{y} r_{\mathrm{self}}(y \mid x)$; the main specialization sets $r_{\mathrm{self}}(y \mid x) = \log \pi_{\mathrm{base}}(y \mid x)$, so sharpening means moving mass toward sequence-level maximum-likelihood outputs (Huang et al., 2024).

This produces a three-level picture. First, inference-time sharpening uses best-of-$N$ sampling and verification. Second, SFT amortizes those selected outputs into the model. Third, RLHF-style or online methods use reward optimization and exploration to continue the concentration process (Huang et al., 2024). The theoretical limit is coverage: if the base model assigns too little probability to a good output, passive self-training cannot reliably recover it. The paper formalizes this through a coverage coefficient and shows that SFT-based sharpening is minimax-optimal when coverage is sufficient, while exploration-based RL can bypass that limitation under structural assumptions (Huang et al., 2024).
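The inference-time level and the coverage limit can be sketched on a toy categorical "model". The code assumes the self-reward is the log-probability under the base distribution and that the mode is unique; the function names and the example distribution are illustrative, not from the paper.

```python
import math
import random


def best_of_n(probs, n, rng):
    """Draw n samples from `probs` and return the one with the highest log-probability."""
    outcomes = list(range(len(probs)))
    draws = rng.choices(outcomes, weights=probs, k=n)
    return max(draws, key=lambda y: math.log(probs[y]))


def prob_mode_returned(p_mode, n):
    """Chance that at least one of n draws is the unique mode: 1 - (1 - p_mode)**n."""
    return 1.0 - (1.0 - p_mode) ** n


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.4, 0.3, 0.3]
    hits = sum(best_of_n(probs, 4, rng) == 0 for _ in range(20000))
    print(hits / 20000, prob_mode_returned(0.4, 4))
    print(prob_mode_returned(0.01, 100))
```

Best-of-$N$ returns the mode whenever it appears among the draws, so the success probability is governed by how much mass the base model already places on it. When that mass is 0.01, even 100 samples succeed only about 63% of the time, which is the coverage limitation in miniature.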

A later analysis challenges the stronger claim that RL gains are just sharpening. It compares task-reward RL with distribution sharpening under a KL-regularized RL objective and argues that sharpening is limited, often unstable, and biased toward short outputs in variable-length generation because log-probabilities are nonpositive and EOS is absorbing (Mittal et al., 17 Apr 2026). The reported experiments on Llama-3.2-3B-Instruct, Qwen2.5-3B-Instruct, and Qwen3-4B-Instruct-2507 show that sharpening can improve performance in some settings, especially at inference time, but task-reward RL is more stable and more effective overall (Mittal et al., 17 Apr 2026).
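The length bias follows from simple arithmetic: a sequence log-likelihood is a sum of nonpositive terms, so each additional token can only lower it. A minimal sketch, with made-up per-token log-probabilities and hypothetical function names:

```python
def sequence_logprob(token_logprobs):
    """Sequence log-likelihood is the sum of per-token log-probabilities (each <= 0)."""
    return sum(token_logprobs)


def sharpest_candidate(candidates):
    """Return the name of the candidate with the highest sequence log-likelihood."""
    return max(candidates, key=lambda name: sequence_logprob(candidates[name]))


if __name__ == "__main__":
    candidates = {
        "short": [-0.3] * 5,    # 5 tokens, less confident per token
        "long": [-0.1] * 40,    # 40 tokens, more confident per token
    }
    print(sharpest_candidate(candidates))
```

The short candidate wins with a total of about -1.5 against about -4.0, even though the long candidate is more confident at every token. Sharpening toward sequence-level likelihood therefore favors early termination unless length is normalized or a task reward counteracts it.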

This establishes an important distinction. In this literature, sharpening is a concentration mechanism, not a complete theory of capability gain. The papers explicitly frame it as extraction and amortization of existing information rather than creation of new information (Huang et al., 2024, Mittal et al., 17 Apr 2026).

6. Sharpening transitions in monitored quantum systems

In monitored quantum dynamics, sharpening denotes localization in a conserved quantum number. On dynamical quantum trees with $U(1)$ symmetry, the relevant question is whether charge becomes sharp; with $SU(2)$ symmetry, the question becomes whether spin becomes sharp (Feng et al., 2024). The recursive tree structure enables generating-function recursions and Fisher–KPP-like traveling-wave analyses.

For $U(1)$, the paper shows that sharpening and purification can coincide or split, depending on the neutral local dimension. It gives an exact critical point for one value of that dimension and a limiting case in which purification and charge sharpening occur at different points, so the transitions are distinct (Feng et al., 2024). For $SU(2)$, the outcome is more constrained: the fuzzy phase is generic, and a sharp phase is possible only at the maximal measurement rate, where the sharp/fuzzy boundary is solved analytically (Feng et al., 2024).

Here progressive sharpening is neither image enhancement nor optimization instability. It is a measurement-induced transition in the trajectory ensemble, quantified by the decay of a fuzziness parameter. The common structural element is still present: sharpening is achieved by recursive updates that progressively suppress uncertainty in a distinguished variable.

7. Common misconceptions, limits, and cross-domain interpretation

Several recurrent misconceptions are corrected by the literature.

First, progressive sharpening is not synonymous with monotone increase forever. In optimization, it is usually the prelude to EOS, after which sharpness oscillates or self-stabilizes around a threshold rather than continuing to rise (Li et al., 2022, Liu et al., 4 Mar 2025). Second, sharpening is not always iterative. The preemptive robustification paper uses a one-shot Laplacian transform and explicitly states that no progressive schedule is studied (Liang et al., 26 Mar 2026). Third, sharpening is not always sufficient. In language-model post-training, distribution sharpening can recover latent behavior, but task-reward RL is empirically more stable and more effective on harder tasks (Mittal et al., 17 Apr 2026). Fourth, flatness is not the whole story in optimization: one line of work argues that Jacobian growth is the more fundamental object, with Hessian sharpness acting as a mediated effect (MacDonald et al., 2023).

Across domains, the stabilizing devices are also heterogeneous. SIFE freezes extrema and proves a maximum–minimum principle (Schaefer et al., 2022). Optimization analyses identify EOS or stochastic EOS boundaries (Li et al., 2022, Agarwala et al., 2024). Depth refinement uses masked reconstruction losses and EMA teachers (Pham et al., 2024). Pan-sharpening methods use progressive compensation, mutual iteration, or diffusion conditioning (2207.14451, Zhao et al., 25 Jun 2025). Language-model sharpening relies on KL regularization, coverage, or exploration (Huang et al., 2024, Mittal et al., 17 Apr 2026). Quantum sharpening is controlled by measurement rate and symmetry constraints (Feng et al., 2024).

A plausible implication is that “progressive sharpening” is best treated as a higher-level dynamical motif rather than a domain-specific primitive. The shared pattern is incremental concentration under a stabilizing constraint: sharpen only across edges, only until EOS, only where teachers disagree, only when the base policy has coverage, or only within symmetry-allowed sectors.