Deceptron Inverse-Preconditioned Gradient (D-IPG)

Updated 18 May 2026
  • D-IPG is a learned iterative optimization algorithm for nonlinear inverse problems that uses a bidirectional neural surrogate to model both the forward process and its local inverse.
  • It employs a Jacobian Composition Penalty to enforce local inverse consistency, transforming output-space residual descents into well-scaled latent-space updates, akin to Gauss–Newton methods.
  • Empirical evaluations on PDE-governed tasks show that D-IPG achieves significant speedups and reliability compared to classical approaches, while remaining lightweight and highly parallelizable.

The Deceptron Inverse-Preconditioned Gradient (D-IPG) algorithm is a learned iterative optimization method tailored for nonlinear inverse problems, particularly those arising in the physical sciences. D-IPG leverages a bidirectional neural surrogate—termed the Deceptron—that simultaneously learns a forward surrogate and a local inverse map. By training the reverse operator to approximate the local pseudoinverse of the forward map’s Jacobian, D-IPG produces preconditioned updates which empirically match or surpass the speed and stability of classical Gauss–Newton methods, while remaining lightweight and highly parallelizable. The algorithm is centered around a Jacobian Composition Penalty (JCP) that enforces local inverse consistency, enabling the method to transform output-space residual descent into well-scaled latent-space updates. D-IPG demonstrates robust acceleration and reliability across a suite of partial differential equation (PDE)–governed inverse problems (Kachhadiya, 26 Nov 2025, Kachhadiya, 13 May 2026).

1. Architectural Principles and Training of the Deceptron Module

The Deceptron architecture comprises two neural operators: a forward surrogate $f_W:\mathbb{R}^{d_{\text{in}}}\to\mathbb{R}^{d_{\text{out}}}$ representing the physics or forward process, and a learned inverse $g_V:\mathbb{R}^{d_{\text{out}}}\to\mathbb{R}^{d_{\text{in}}}$, which approximates local inversion.

Formally, with $x$ denoting the latent input (parameters or initial conditions) and $y^*$ the observed measurement, the module is:

  • $f_W(x) = \sigma(W x + b)$,
  • $g_V(y) = \tilde{\sigma}(V y + c)$,

where $\sigma, \tilde{\sigma}$ are lightweight nonlinearities (e.g., leaky ReLU), $W, V$ are weight matrices, and $b, c$ are biases.
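The two maps above can be sketched in a few lines of plain Python. This is a minimal illustration only: the toy weights, the leaky-ReLU slope of 0.01, and the helper names `leaky_relu`, `affine`, `forward`, and `reverse` are assumptions made for the example, not values or names taken from the cited papers.

```python
def leaky_relu(z, slope=0.01):
    """Elementwise leaky ReLU (the slope value is an assumed default)."""
    return [v if v > 0 else slope * v for v in z]


def affine(M, x, bias):
    """Compute M x + bias for a matrix given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) + b for row, b in zip(M, bias)]


def forward(W, b, x):
    """f_W(x) = sigma(W x + b)."""
    return leaky_relu(affine(W, x, b))


def reverse(V, c, y):
    """g_V(y) = sigma~(V y + c)."""
    return leaky_relu(affine(V, y, c))


# Hypothetical toy weights with d_in = 2 and d_out = 3.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
c = [0.0, 0.0]

x = [0.5, 2.0]
y = forward(W, b, x)       # [0.5, 2.0, 2.5]
x_back = reverse(V, c, y)  # [0.5, 2.0], so g_V inverts f_W at this point
```

Here $V$ was hand-picked so that $V W = I$; in the method described above, this local inverse behavior is learned rather than constructed.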

The multi-term training loss $\mathcal{L}$ combines:

  • Task fit: the forward surrogate output $f_W(x)$ is fit to the target output $y$,
  • Forward–reverse consistency: $g_V(f_W(x))$ should recover $x$,
  • Cyclic $y$-space consistency: $f_W(g_V(y))$ should recover $y$,
  • Spectral penalty,
  • Soft bias tie,
  • (Optional) weight tie,
  • Jacobian Composition Penalty (JCP): the expected squared norm of $(J_g J_f - I)\,v$ over random probes $v$,

where $J_f$ is the Jacobian of $f_W$ at $x$, $J_g$ is the Jacobian of $g_V$ at $f_W(x)$, and $v$ is a random probe vector. The JCP term uses Hutchinson’s identity to efficiently estimate the Frobenius deviation from the identity, enforcing the local inverse behavior $J_g J_f \approx I$ (Kachhadiya, 26 Nov 2025, Kachhadiya, 13 May 2026).
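The Hutchinson-style identity behind the JCP can be checked on a small example. For probes $v$ with identity covariance, the expected value of $\|(J_g J_f - I)v\|^2$ equals the squared Frobenius norm $\|J_g J_f - I\|_F^2$. The sketch below assumes explicit toy Jacobians and enumerates every Rademacher ($\pm 1$) probe, so the average is exact. A practical implementation would presumably draw a few random probes and use Jacobian-vector products instead of forming the matrices. All function names here are hypothetical.

```python
import itertools


def matmul(A, B):
    """Matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


def composition_residual(Jg, Jf):
    """Return the matrix J_g J_f - I."""
    P = matmul(Jg, Jf)
    n = len(P)
    return [[P[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]


def frobenius_sq(M):
    """Squared Frobenius norm."""
    return sum(v * v for row in M for v in row)


def jcp_probe_average(Jg, Jf):
    """Average ||(J_g J_f - I) v||^2 over all +/-1 probe vectors v."""
    R = composition_residual(Jg, Jf)
    probes = list(itertools.product((-1.0, 1.0), repeat=len(R)))
    total = sum(sum(t * t for t in matvec(R, v)) for v in probes)
    return total / len(probes)


# Toy Jacobians (assumed values): J_f is 3x2, J_g is 2x3.
Jf = [[2.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
Jg = [[0.5, 0.0, 0.0], [0.25, 0.25, 0.0]]

exact = frobenius_sq(composition_residual(Jg, Jf))  # 0.25
estimate = jcp_probe_average(Jg, Jf)                # 0.25
```

The two values agree because the average of $v v^\top$ over all sign vectors is exactly the identity.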

2. D-IPG Update Rule and First-Order Equivalence

D-IPG is designed to efficiently solve regularized least-squares inverse problems, in which the latent input $x$ is chosen so that the surrogate output $f_W(x)$ matches the observation $y^*$, even for ill-conditioned forward operators.

At each iteration $k$:

  • Compute the current surrogate output $y_k = f_W(x_k)$ and the residual between $y_k$ and the observation $y^*$,
  • Propose an output-space update by taking a descent step on the residual from $y_k$,
  • Pull the proposal back via the inverse operator $g_V$,
  • Form the latent-space step from the pulled-back proposal and $x_k$,
  • Apply a convex combination and feasible-set projection to obtain $x_{k+1}$,
  • Use Armijo-style backtracking line search to ensure sufficient decrease,
  • Terminate when the normalized residual falls below the stopping tolerance (Kachhadiya, 26 Nov 2025, Kachhadiya, 13 May 2026).
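The steps above can be sketched as a short loop. This is an illustrative toy under stated assumptions, not the authors' implementation: the step size `alpha`, the box bounds `lo` and `hi` used as the feasible set, the simplified sufficient-decrease test standing in for Armijo backtracking, and the normalization of the residual by the norm of $y^*$ are all choices made for this example. The toy forward map is linear and its inverse is exact, so the loop finishes after a single update.

```python
def dipg_solve(f, g, x0, y_star, alpha=1.0, tol=1e-8, max_iters=50,
               lo=-5.0, hi=5.0, armijo_c=1e-4, shrink=0.5):
    """Toy D-IPG-style loop; returns (x, number_of_updates_used)."""

    def loss(x):
        return 0.5 * sum((fi - ti) ** 2 for fi, ti in zip(f(x), y_star))

    def project(x):
        # Feasible set assumed to be the box [lo, hi]^n.
        return [min(max(v, lo), hi) for v in x]

    y_norm = sum(t * t for t in y_star) ** 0.5 or 1.0
    x = project(list(x0))
    for k in range(max_iters):
        y = f(x)
        r = [yi - ti for yi, ti in zip(y, y_star)]         # residual
        if sum(v * v for v in r) ** 0.5 / y_norm <= tol:   # normalized stop test
            return x, k
        y_prop = [yi - alpha * ri for yi, ri in zip(y, r)] # output-space proposal
        x_pull = g(y_prop)                                 # pull back through g
        dx = [p - xi for p, xi in zip(x_pull, x)]          # latent-space step
        lam, base, x_new = 1.0, loss(x), x
        while lam >= 1e-8:                                 # backtracking
            cand = project([xi + lam * d for xi, d in zip(x, dx)])
            if loss(cand) <= (1.0 - armijo_c * lam) * base:
                x_new = cand
                break
            lam *= shrink
        x = x_new
    return x, max_iters


# Ill-conditioned toy forward map and its exact inverse (assumed example).
def f_toy(x):
    return [10.0 * x[0], 0.1 * x[1]]


def g_toy(y):
    return [y[0] / 10.0, y[1] / 0.1]


x_true = [1.0, -2.0]
y_star = f_toy(x_true)
x_hat, iters = dipg_solve(f_toy, g_toy, [0.0, 0.0], y_star)
```

With an exact inverse, the pulled-back proposal lands on `x_true` regardless of the conditioning of `f_toy`, which is the preconditioning effect the method relies on. With a learned, imperfect `g`, more iterations and smaller accepted steps would be expected.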

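Taylor expanding $g_V$ about the current output $f_W(x_k)$ shows (under suitable regularity and local inverse assumptions) that, to first order, the D-IPG step equals a damped Gauss–Newton step, in which the Moore–Penrose pseudoinverse $J_f^{\dagger}$ of the forward Jacobian $J_f$ is applied to the residual, plus a term driven by the learned-inverse error. Thus, D-IPG generalizes the damped Gauss–Newton method up to a composition error term (Kachhadiya, 13 May 2026).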
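The first-order argument can be written out as follows. The notation is an assumption made for this sketch and may differ from the papers': $y_k = f_W(x_k)$, residual $r_k = y_k - y^*$, step size $\alpha$, output-space proposal $y_k - \alpha r_k$, $J_g$ the Jacobian of $g_V$ at $y_k$, and $J_f^{\dagger}$ the Moore–Penrose pseudoinverse of the Jacobian of $f_W$ at $x_k$. It also assumes $g_V(f_W(x_k)) \approx x_k$.

```latex
\begin{aligned}
g_V(y_k - \alpha r_k) &= g_V(y_k) - \alpha J_g\, r_k + O\!\left(\alpha^2 \|r_k\|^2\right), \\
\Delta x_k = g_V(y_k - \alpha r_k) - x_k
  &\approx -\alpha J_g\, r_k \\
  &= \underbrace{-\alpha J_f^{\dagger} r_k}_{\text{damped Gauss--Newton step}}
     \;-\; \alpha \underbrace{\left(J_g - J_f^{\dagger}\right)}_{\text{learned-inverse error}} r_k .
\end{aligned}
```

When the second term is small, the latent-space step is close to a damped Gauss–Newton step without solving a linear system.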

3. Theoretical Guarantees and Conditioning

If the composition error vanishes, i.e. $J_g J_f = I$ for the Jacobians $J_f$ of $f_W$ and $J_g$ of $g_V$, D-IPG updates follow the Gauss–Newton direction for residuals in the range of $J_f$. Otherwise, the deviation from the Gauss–Newton step is bounded in terms of the composition error $\|J_g J_f - I\|$ and $\sigma_{\min}(J_f)$, the smallest singular value of $J_f$. Hence, small JCP (i.e., a low value of $\|J_g J_f - I\|$) and well-conditioned $J_f$ ensure D-IPG tracks second-order updates, while high JCP or poor conditioning produce larger deviations.
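One way to see why both the composition error and the smallest singular value enter is the following short derivation. It is a sketch under assumptions introduced here, namely that $J_f$ has full column rank and that the residual lies in its range, $r = J_f u$. The constants in the papers' bound may differ.

```latex
\begin{aligned}
J_f^{\dagger} r &= J_f^{\dagger} J_f\, u = u, \\
J_g\, r &= J_g J_f\, u = u + \left(J_g J_f - I\right) u, \\
\left\| J_g\, r - J_f^{\dagger} r \right\|
  &\le \left\| J_g J_f - I \right\| \, \|u\|
  \;\le\; \frac{\left\| J_g J_f - I \right\|}{\sigma_{\min}(J_f)} \, \|r\| .
\end{aligned}
```

The last step uses $\|u\| = \|J_f^{\dagger} r\| \le \|r\| / \sigma_{\min}(J_f)$. The gap between the learned direction and the Gauss–Newton direction therefore shrinks with the composition error and grows as $J_f$ becomes ill-conditioned.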

In the special case when the learned inverse Jacobian coincides exactly with the pseudoinverse, $J_g = J_f^{\dagger}$, D-IPG and Gauss–Newton are locally equivalent (Kachhadiya, 26 Nov 2025, Kachhadiya, 13 May 2026).

4. Empirical Performance and Benchmark Results

D-IPG has been evaluated on a comprehensive suite of synthetic PDE-governed inverse problems, including Heat-1D/2D/3D, Darcy-2D, Advection–Diffusion-2D, Allen–Cahn-2D, and Navier–Stokes-2D. All solvers share identical Armijo parameters, projection, and stopping criteria for fair comparison.

Key representative results:

  • On Heat-3D, D-IPG reaches the target residual in a median of 5 iterations (0.033 s), compared to Gauss–Newton (GN; 7 iters, 1.15 s) and Levenberg–Marquardt (LM; 6 iters, 0.82 s), a speedup of up to 35×.
  • On Advection–Diffusion-2D, D-IPG requires 7 iters (0.10s), versus GN (33 iters, 18.5s, 185× slower) and LM (12 iters, 3.16s) (Kachhadiya, 13 May 2026).
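The quoted speedup factors follow directly from the wall-clock times in the two bullets above. The short check below simply divides each baseline time by the D-IPG time. The dictionary layout and the function name `speedups` are illustrative choices.

```python
# Wall-clock times in seconds, copied from the two benchmark bullets above.
timings = {
    "Heat-3D": {"D-IPG": 0.033, "GN": 1.15, "LM": 0.82},
    "Advection-Diffusion-2D": {"D-IPG": 0.10, "GN": 18.5, "LM": 3.16},
}


def speedups(row):
    """Ratio of each baseline's time to the D-IPG time."""
    base = row["D-IPG"]
    return {method: t / base for method, t in row.items() if method != "D-IPG"}


heat = speedups(timings["Heat-3D"])                   # GN about 34.8x, LM about 24.8x
advdiff = speedups(timings["Advection-Diffusion-2D"]) # GN about 185x, LM about 31.6x
```

Note that the iteration counts differ far less than the times do (5 versus 7 on Heat-3D), so most of the gap comes from per-iteration cost rather than from the number of steps.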

Across all seven benchmarks, D-IPG attains comparable or stronger convergence than second-order baselines but at up to 77× lower inference-time solve cost. The six-problem reliability suite reports mean success of 94.8% for D-IPG, versus 17.3% (GN) and 65.5% (LM).

| Method | Heat-1D (iters) | Heat-3D (iters) | Advection–Diff (iters) | Mean Success (%) |
|---|---|---|---|---|
| D-IPG | 2.8 ± 1.0 | 5 | 7 | 94.8 |
| GN / LM | 2.8 ± 0.9 | 7 / 6 | 33 / 12 | 17.3 / 65.5 |
| x-GD | 58.2 ± 28.9 | | | |

D-IPG per-iteration cost is low (≈2 forward passes and 1 backprop), requiring no Hessian solves, and is thus highly parallelizable.

5. The Role of the Jacobian Composition Penalty (JCP)

The JCP term is central to D-IPG. It ensures that the reverse model’s Jacobian $J_g$ acts as a local left inverse of the forward model’s Jacobian $J_f$, enforcing $J_g J_f \approx I$. The runtime diagnostic, RJCP, tracks the same inverse-consistency error, the deviation of $J_g J_f$ from the identity, along the optimization trajectory.

Empirically, lower RJCP values correlate directly with fewer D-IPG iterations and with trajectories that closely mirror Gauss–Newton updates.
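A diagnostic of this kind can be evaluated at any iterate by comparing the composed Jacobians with the identity. The sketch below is an assumption-laden stand-in, not the papers' definition of RJCP: it uses central finite differences instead of automatic differentiation and reports the Frobenius norm of $J_g J_f - I$. The names `jacobian_fd` and `composition_error` are hypothetical.

```python
def jacobian_fd(fn, x, eps=1e-6):
    """Central finite-difference Jacobian of fn at x, as a list of rows."""
    n_out = len(fn(x))
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(fn(xp), fn(xm))])
    return [[cols[j][i] for j in range(len(x))] for i in range(n_out)]


def composition_error(f, g, x):
    """Frobenius norm of J_g(f(x)) J_f(x) - I (assumed diagnostic form)."""
    Jf = jacobian_fd(f, x)
    Jg = jacobian_fd(g, f(x))
    n = len(x)
    err = 0.0
    for i in range(n):
        for j in range(n):
            pij = sum(Jg[i][k] * Jf[k][j] for k in range(len(Jf)))
            err += (pij - (1.0 if i == j else 0.0)) ** 2
    return err ** 0.5


def f_lin(x):
    return [2.0 * x[0], 4.0 * x[1]]


def g_good(y):   # exact local inverse of f_lin
    return [y[0] / 2.0, y[1] / 4.0]


def g_bad(y):    # ignores the scaling of f_lin
    return [y[0], y[1]]


err_good = composition_error(f_lin, g_good, [0.3, -0.7])  # close to 0
err_bad = composition_error(f_lin, g_bad, [0.3, -0.7])    # close to sqrt(10)
```

Logging such a quantity at each accepted iterate gives a trajectory-level view of how well the learned inverse is behaving where the solver actually goes.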

Ablation studies reveal:

  • Disabling JCP inflates composition error (RJCP) from near zero to ≫100 (e.g., 458 on Heat-1D), and iteration counts increase (from 2.6 to 3.8),
  • Forcing the weights of $g_V$ to be tied to those of $f_W$ (“tied” parameterization) impairs composition (RJCP ≫ 100) and substantially degrades solve efficiency,
  • Removing auxiliary fit terms (rec, cyc), while retaining JCP, does not impact iteration count, highlighting that reliable preconditioning is due to local inverse enforcement.

On Allen–Cahn-2D, including JCP increases basin access: with JCP, 100% of runs reach the low-error basin; without it, only 16.3% succeed. This suggests JCP enhances global reliability primarily by ensuring robust access to favorable convergence regions (Kachhadiya, 26 Nov 2025, Kachhadiya, 13 May 2026).

6. Applications and Limitations

D-IPG is applicable to a broad class of inverse problems where surrogate models are viable. Benchmark tasks include diffusion initial-condition recovery, elliptic parameter recovery, and nonlinear fluid parameter estimation. The method is agnostic to the underlying surrogate architecture, supporting both MLP and shallow CNN parametrizations for $f_W$ and $g_V$.

Limiting factors include failure modes for extremely ill-conditioned regimes (e.g., Darcy-2D, 69% success) and reliance on sufficiently expressive surrogates for $g_V$ to accurately approximate the local pseudoinverse. A plausible implication is that, without adequate network capacity, or with very poor initializations, the benefits of D-IPG may be diminished.

7. Significance and Theoretical Context

D-IPG constitutes a learned, amortized variant of local inverse geometric preconditioning for nonlinear inverse problems. By integrating a learnable, projection-efficient pseudoinverse via JCP-regularized training, D-IPG bypasses the repeated linear system solves central to classical curvature-aware methods (e.g., Gauss–Newton, Levenberg–Marquardt), while retaining first-order equivalence in the small-error limit.

This framework substantiates a connection between local inverse learning and traditional optimization preconditioning, suggesting a principled, scalable path for hybrid learned-analytic solvers in scientific computing (Kachhadiya, 13 May 2026, Kachhadiya, 26 Nov 2025).

References (2)
