Deceptron Inverse-Preconditioned Gradient (D-IPG)
- D-IPG is a learned iterative optimization algorithm for nonlinear inverse problems that uses a bidirectional neural surrogate to model both the forward process and its local inverse.
- It employs a Jacobian Composition Penalty (JCP) to enforce local inverse consistency, turning output-space residual-descent steps into well-scaled latent-space updates, akin to Gauss–Newton methods.
- Empirical evaluations on PDE-governed inverse problems show that D-IPG achieves significant speedups and higher reliability than classical second-order approaches, while remaining lightweight and highly parallelizable.
The Deceptron Inverse-Preconditioned Gradient (D-IPG) algorithm is a learned iterative optimization method tailored for nonlinear inverse problems, particularly those arising in the physical sciences. D-IPG leverages a bidirectional neural surrogate—termed the Deceptron—that simultaneously learns a forward surrogate and a local inverse map. By training the reverse operator to approximate the local pseudoinverse of the forward map’s Jacobian, D-IPG produces preconditioned updates which empirically match or surpass the speed and stability of classical Gauss–Newton methods, while remaining lightweight and highly parallelizable. The algorithm is centered around a Jacobian Composition Penalty (JCP) that enforces local inverse consistency, enabling the method to transform output-space residual descent into well-scaled latent-space updates. D-IPG demonstrates robust acceleration and reliability across a suite of partial differential equation (PDE)–governed inverse problems (Kachhadiya, 26 Nov 2025, Kachhadiya, 13 May 2026).
1. Architectural Principles and Training of the Deceptron Module
The Deceptron architecture comprises two neural operators: a forward surrogate $f_\theta$ representing the physics or forward process, and a learned inverse $g_\phi$, which approximates local inversion.
Formally, with $x$ denoting the latent input (parameters or initial conditions) and $y$ the observed measurement, the module is:
- $\hat{y} = f_\theta(x) = \sigma_f(W x + b)$,
- $\hat{x} = g_\phi(y) = \sigma_g(R y + c)$,
where $\sigma_f, \sigma_g$ are lightweight nonlinearities (e.g., leaky ReLU), $W, R$ are weight matrices, and $b, c$ are biases.
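A minimal NumPy sketch of a single-layer Deceptron module follows. It assumes leaky-ReLU nonlinearities on both maps and uses a hypothetical class name, `DeceptronLayer`. In practice $f_\theta$ and $g_\phi$ may be deeper MLPs or shallow CNNs. The analytic Jacobians exposed here are the quantities that enter the JCP and the D-IPG analysis.

```python
import numpy as np


def leaky_relu(z, slope=0.01):
    # Elementwise leaky ReLU: z where z > 0, slope * z elsewhere.
    return np.where(z > 0, z, slope * z)


def leaky_relu_deriv(z, slope=0.01):
    # Derivative of leaky ReLU (1 where z > 0, slope elsewhere).
    return np.where(z > 0, 1.0, slope)


class DeceptronLayer:
    """Hypothetical single-layer module: f(x) = σ(W x + b), g(y) = σ(R y + c)."""

    def __init__(self, d_x, d_y, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_y, d_x)) / np.sqrt(d_x)  # forward weights
        self.b = np.zeros(d_y)                                    # forward bias
        self.R = rng.standard_normal((d_x, d_y)) / np.sqrt(d_y)  # reverse weights
        self.c = np.zeros(d_x)                                    # reverse bias

    def f(self, x):
        # Forward surrogate: latent x -> predicted measurement y_hat.
        return leaky_relu(self.W @ x + self.b)

    def g(self, y):
        # Learned inverse: measurement y -> latent estimate x_hat.
        return leaky_relu(self.R @ y + self.c)

    def jac_f(self, x):
        # J_f(x) = diag(σ'(W x + b)) W, shape (d_y, d_x).
        return leaky_relu_deriv(self.W @ x + self.b)[:, None] * self.W

    def jac_g(self, y):
        # J_g(y) = diag(σ'(R y + c)) R, shape (d_x, d_y).
        return leaky_relu_deriv(self.R @ y + self.c)[:, None] * self.R
```

The randomly initialized $R$ is not yet a useful inverse. Training with the loss terms below is what pushes $J_g J_f$ toward the identity.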
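The multi-term training loss combines:
- Task fit: $\mathcal{L}_{\text{fit}} = \lVert f_\theta(x) - y \rVert_2^2$,
- Forward–reverse consistency: $\mathcal{L}_{\text{rec}} = \lVert g_\phi(f_\theta(x)) - x \rVert_2^2$,
- Cyclic $y$-space consistency: $\mathcal{L}_{\text{cyc}} = \lVert f_\theta(g_\phi(y)) - y \rVert_2^2$,
- Spectral penalty on the weight matrices,
- Soft bias tie between $b$ and $c$,
- (Optional) weight-tie: $\mathcal{L}_{\text{tie}} = \lVert R - W^\top \rVert_F^2$,
- Jacobian Composition Penalty (JCP): $\mathcal{L}_{\text{JCP}} = \mathbb{E}_{v}\big[\lVert (J_g J_f - I)\, v \rVert_2^2\big] = \lVert J_g J_f - I \rVert_F^2$,
where $J_f = \partial f_\theta / \partial x$ is evaluated at $x$, $J_g = \partial g_\phi / \partial y$ is evaluated at $f_\theta(x)$, and $v$ is a random probe vector with $\mathbb{E}[v v^\top] = I$. The JCP term uses Hutchinson's identity to estimate the Frobenius deviation from the identity efficiently, enforcing the local inverse behavior $J_g J_f \approx I$ (Kachhadiya, 26 Nov 2025, Kachhadiya, 13 May 2026).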
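The Hutchinson estimate behind the JCP can be sketched using only Jacobian-vector products. Two choices here are assumptions for illustration: the function name `hutchinson_jcp`, and Rademacher probes ($v_i \in \{\pm 1\}$, so $\mathbb{E}[vv^\top] = I$). In a real training loop, the products $J_f v$ and $J_g u$ would come from automatic differentiation of $f_\theta$ and $g_\phi$ rather than from explicit matrices.

```python
import numpy as np


def hutchinson_jcp(jvp_f, jvp_g, dim, num_probes=64, seed=0):
    """Estimate ||J_g J_f - I||_F^2 from Jacobian-vector products only.

    jvp_f(v) must return J_f v (latent -> output space) and jvp_g(u) must
    return J_g u (output -> latent space); full Jacobians are never formed.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe: E[v v^T] = I
        w = jvp_g(jvp_f(v)) - v                 # (J_g J_f - I) v
        total += w @ w
    return total / num_probes


# Demo: explicit matrices stand in for autodiff JVPs (hypothetical values).
rng = np.random.default_rng(1)
Jf = rng.standard_normal((6, 4))
Jg = np.linalg.pinv(Jf) + 0.1 * rng.standard_normal((4, 6))
exact = np.sum((Jg @ Jf - np.eye(4)) ** 2)
approx = hutchinson_jcp(lambda v: Jf @ v, lambda u: Jg @ u, dim=4, num_probes=2000)
print(exact, approx)
```

For any probe distribution with $\mathbb{E}[vv^\top] = I$, the identity $\mathbb{E}_v\lVert Av\rVert^2 = \operatorname{tr}(A^\top A) = \lVert A\rVert_F^2$ holds. Averaging over probes therefore gives an unbiased estimate of the penalty without ever forming $J_g J_f$.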
2. D-IPG Update Rule and First-Order Equivalence
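D-IPG is designed to solve regularized least-squares inverse problems of the form $\min_{x \in \mathcal{C}} \tfrac{1}{2}\lVert f_\theta(x) - y \rVert_2^2 + \lambda\, \Omega(x)$ efficiently, even for ill-conditioned forward operators.
At each iteration $k$:
- Compute the current surrogate output $\hat{y}_k = f_\theta(x_k)$ and residual $r_k = \hat{y}_k - y$,
- Propose an output-space update $\tilde{y}_k = \hat{y}_k - \eta\, r_k$ with step size $\eta \in (0, 1]$,
- Pull the proposal back via the inverse operator: $\tilde{x}_k = g_\phi(\tilde{y}_k)$,
- Form the latent-space step $\Delta x_k = \tilde{x}_k - g_\phi(\hat{y}_k)$,
- Apply a convex combination and feasible-set projection: $x_{k+1} = \Pi_{\mathcal{C}}\big((1 - \alpha_k)\, x_k + \alpha_k (x_k + \Delta x_k)\big)$,
- Use Armijo-style backtracking line search on $\alpha_k$ to ensure sufficient decrease,
- Terminate when the normalized residual satisfies $\lVert r_k \rVert_2 / \lVert y \rVert_2 \le \tau$ (Kachhadiya, 26 Nov 2025, Kachhadiya, 13 May 2026).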
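The iteration can be sketched as a short NumPy loop. This is an illustration under stated assumptions, not the reference implementation:
- the feasible set is a box handled by `np.clip`;
- the latent step is taken as the difference of two reverse-map evaluations, $g(\tilde y_k) - g(\hat y_k)$;
- the Armijo test is applied to the projected step with a constant `c1`;
- the toy forward model is a leaky-ReLU layer whose exact analytic inverse stands in for a trained $g_\phi$.

All function names are hypothetical.

```python
import numpy as np

SLOPE = 0.5  # leaky-ReLU slope of the toy forward model (hypothetical choice)


def leaky(z):
    return np.where(z > 0, z, SLOPE * z)


def leaky_inv(u):
    # Exact inverse of leaky(): a leaky ReLU with slope 1 / SLOPE.
    return np.where(u > 0, u, u / SLOPE)


def d_ipg(f, g, grad_phi, y_obs, x0, eta=0.8, lo=-5.0, hi=5.0,
          c1=1e-4, tol=1e-8, max_iter=100):
    """Sketch of the D-IPG loop; returns (x, iterations used)."""
    phi = lambda x: 0.5 * np.sum((f(x) - y_obs) ** 2)
    x = np.clip(x0, lo, hi)
    for k in range(max_iter):
        y_hat = f(x)
        r = y_hat - y_obs                                # residual r_k
        if np.linalg.norm(r) <= tol * np.linalg.norm(y_obs):
            return x, k                                  # normalized-residual stop
        y_prop = y_hat - eta * r                         # output-space proposal
        dx = g(y_prop) - g(y_hat)                        # pulled-back latent step
        grad, phi_x, alpha = grad_phi(x), phi(x), 1.0
        while True:
            # (1 - alpha) * x + alpha * (x + dx), projected onto the box.
            x_new = np.clip(x + alpha * dx, lo, hi)
            if phi(x_new) <= phi_x + c1 * grad @ (x_new - x) or alpha < 1e-8:
                break
            alpha *= 0.5                                 # Armijo backtracking
        x = x_new
    return x, max_iter


# Toy problem: f(x) = leaky(W x + b); its exact inverse is an idealized g.
rng = np.random.default_rng(0)
W = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
b = 0.1 * rng.standard_normal(4)
W_inv = np.linalg.inv(W)
f = lambda x: leaky(W @ x + b)
g = lambda y: W_inv @ (leaky_inv(y) - b)
x_true = rng.uniform(-1.0, 1.0, size=4)
y_obs = f(x_true)


def grad_phi(x):
    # Gradient of 0.5 * ||f(x) - y_obs||^2, i.e. J_f(x)^T r with J_f = diag(σ') W.
    d = np.where(W @ x + b > 0, 1.0, SLOPE)
    return (d[:, None] * W).T @ (f(x) - y_obs)


x_hat, iters = d_ipg(f, g, grad_phi, y_obs, x0=np.zeros(4))
print(iters, np.max(np.abs(x_hat - x_true)))
```

With an exact inverse and $\eta = 0.8$, each accepted full step maps the output to $\tilde y_k$ exactly. The residual therefore shrinks by the factor $1 - \eta = 0.2$ per iteration, and the toy converges linearly. With a learned $g_\phi$, the contraction also depends on the composition error.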
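Taylor expanding $g_\phi(\hat{y}_k - \eta\, r_k)$ about $\hat{y}_k$ yields (under suitable regularity and local inverse assumptions):
$$\Delta x_k = -\eta\, J_g(\hat{y}_k)\, r_k + O(\eta^2 \lVert r_k \rVert^2) = -\eta\, J_f(x_k)^{+} r_k - \eta\, E_k\, r_k + O(\eta^2 \lVert r_k \rVert^2),$$
with $J_f(x_k)^{+}$ the Moore–Penrose pseudoinverse and $E_k = J_g(\hat{y}_k) - J_f(x_k)^{+}$ the learned-inverse error. The first term is the damped Gauss–Newton step, so D-IPG generalizes the damped Gauss–Newton method up to a composition error term (Kachhadiya, 13 May 2026).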
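The first-order expansion can be checked numerically. The check assumes a smooth toy map $f(x) = \tanh(Wx)$ whose exact inverse $g(y) = W^{-1}\operatorname{artanh}(y)$ plays the role of an idealized reverse network, so $E_k = 0$. Halving the residual should then roughly quarter the gap between the D-IPG step and the damped Gauss–Newton step, consistent with an $O(\eta^2\lVert r_k\rVert^2)$ remainder. The weights and test points are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
W = np.eye(d) + 0.1 * rng.standard_normal((d, d))   # hypothetical weights
W_inv = np.linalg.inv(W)


def f(x):
    return np.tanh(W @ x)                            # smooth toy forward map


def g(y):
    return W_inv @ np.arctanh(y)                     # exact inverse: idealized g


def jac_f(x):
    return (1.0 - np.tanh(W @ x) ** 2)[:, None] * W  # diag(1 - tanh^2) W


def step_gap(x, r, eta=0.5):
    """Norm of the difference between the D-IPG step and the damped GN step."""
    y_hat = f(x)
    dx_dipg = g(y_hat - eta * r) - g(y_hat)
    dx_gn = -eta * np.linalg.pinv(jac_f(x)) @ r
    return np.linalg.norm(dx_dipg - dx_gn)


x = np.array([0.4, -0.5, 0.3])
r = np.array([0.05, 0.08, -0.06])
gaps = [step_gap(x, s * r) for s in (1.0, 0.5, 0.25)]
ratios = [gaps[0] / gaps[1], gaps[1] / gaps[2]]
print(ratios)   # each ratio should be close to 4
```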
3. Theoretical Guarantees and Conditioning
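If $J_g J_f = I$ (with $J_f$ of full column rank), D-IPG updates follow the Gauss–Newton direction for residuals in the range of $J_f$. To first order, the deviation bound is:
$$\big\lVert \Delta x_k^{\text{D-IPG}} - \Delta x_k^{\text{GN}} \big\rVert_2 \le \eta\, \frac{\lVert J_g J_f - I \rVert_2}{\sigma_{\min}(J_f)}\, \lVert r_k \rVert_2, \qquad r_k \in \operatorname{range}(J_f),$$
where $\sigma_{\min}(J_f)$ is the smallest singular value of $J_f$. The bound follows by writing $r_k = J_f u$:
- then $(J_g - J_f^{+})\, r_k = (J_g J_f - I)\, u$,
- and $\lVert u \rVert_2 \le \lVert r_k \rVert_2 / \sigma_{\min}(J_f)$.

Hence, a small composition error and a well-conditioned $J_f$ ensure that D-IPG tracks Gauss–Newton updates. Here "small composition error" means a low value of $\lVert J_g J_f - I \rVert$, the quantity the JCP penalizes. Conversely, a large composition error or poor conditioning produces larger deviations.
In the special case $J_g = J_f^{+}$, D-IPG and Gauss–Newton are locally equivalent for all admissible residuals $r_k$ (Kachhadiya, 26 Nov 2025, Kachhadiya, 13 May 2026).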
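The deviation bound can be checked numerically on random matrices. This sketch makes the following assumptions:
- $J_f$ is tall and of full column rank, so $J_f^+ J_f = I$;
- the inverse is perturbed as $J_g = J_f^+ + \varepsilon E$;
- the helper name `deviation_and_bound` is hypothetical.

```python
import numpy as np


def deviation_and_bound(Jf, Jg, r, eta=1.0):
    """Return (gap between D-IPG and GN directions, first-order upper bound)."""
    gap = eta * np.linalg.norm((Jg - np.linalg.pinv(Jf)) @ r)
    comp_err = np.linalg.norm(Jg @ Jf - np.eye(Jf.shape[1]), 2)   # spectral norm
    sigma_min = np.linalg.svd(Jf, compute_uv=False).min()
    return gap, eta * comp_err * np.linalg.norm(r) / sigma_min


rng = np.random.default_rng(0)
Jf = rng.standard_normal((6, 4))                    # tall, full column rank (a.s.)
for eps in (0.0, 0.01, 0.1):
    Jg = np.linalg.pinv(Jf) + eps * rng.standard_normal((4, 6))
    r = Jf @ rng.standard_normal(4)                 # residual in range(J_f)
    gap, bound = deviation_and_bound(Jf, Jg, r, eta=0.5)
    print(f"eps={eps}: gap={gap:.3e} <= bound={bound:.3e}")
```

Both quantities scale with $\varepsilon$. At $\varepsilon = 0$ they reduce to floating-point noise, which mirrors the special case $J_g = J_f^{+}$.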
4. Empirical Performance and Benchmark Results
D-IPG has been evaluated on a comprehensive suite of synthetic PDE-governed inverse problems, including Heat-1D/2D/3D, Darcy-2D, Advection–Diffusion-2D, Allen–Cahn-2D, and Navier–Stokes-2D. All solvers share identical Armijo parameters, projection, and stopping criteria for fair comparison.
Key representative results:
- On Heat-3D, D-IPG reaches target residual in median 5 iterations (0.033s), compared to GN (7 iters, 1.15s) and LM (6 iters, 0.82s), achieving speedups up to 35×.
- On Advection–Diffusion-2D, D-IPG requires 7 iters (0.10s), versus GN (33 iters, 18.5s, 185× slower) and LM (12 iters, 3.16s) (Kachhadiya, 13 May 2026).
Across all seven benchmarks, D-IPG attains comparable or stronger convergence than second-order baselines but at up to 77× lower inference-time solve cost. The six-problem reliability suite reports mean success of 94.8% for D-IPG, versus 17.3% (GN) and 65.5% (LM).
| Method | Heat-1D (iters) | Heat-3D (iters) | Advection–Diff (iters) | Mean Success (%) |
|---|---|---|---|---|
| D-IPG | 2.8 ± 1.0 | 5 | 7 | 94.8 |
| GN/LM | 2.8 ± 0.9 | 7/6 | 33/12 | 17.3/65.5 |
| x-GD | 58.2 ± 28.9 | — | — | — |
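The quoted speedups are ratios of reported wall-clock solve times. The short Python check below recomputes them from the Heat-3D and Advection–Diffusion-2D times reported above; it introduces no new measurements, and the dictionary layout is only illustrative.

```python
# Reported solve times in seconds, copied from the results above.
times = {
    "Heat-3D": {"D-IPG": 0.033, "GN": 1.15, "LM": 0.82},
    "Advection-Diffusion-2D": {"D-IPG": 0.10, "GN": 18.5, "LM": 3.16},
}


def speedups(row):
    """Wall-clock time of each baseline divided by the D-IPG time."""
    return {m: t / row["D-IPG"] for m, t in row.items() if m != "D-IPG"}


for task, row in times.items():
    print(task, {m: round(s, 1) for m, s in speedups(row).items()})
```

On Heat-3D this gives roughly 35× over GN and 25× over LM. On Advection–Diffusion-2D it gives 185× over GN and about 32× over LM.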
The per-iteration cost of D-IPG is low (≈2 forward passes and 1 backpropagation). It requires no Hessian or normal-equation solves and is thus highly parallelizable.
5. The Role of the Jacobian Composition Penalty (JCP)
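The JCP term is central to D-IPG. It ensures that the reverse model's Jacobian acts as a local left inverse of the forward model's Jacobian, enforcing $J_g(f_\theta(x))\, J_f(x) \approx I$. The runtime diagnostic, RJCP, tracks the same inverse-consistency error along the optimization trajectory:
$$\mathrm{RJCP}(x_k) = \big\lVert J_g(f_\theta(x_k))\, J_f(x_k) - I \big\rVert_F^2.$$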
Empirically, lower RJCP values correlate directly with fewer D-IPG iterations and trajectories closely mirroring Gauss–Newton updates.
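A toy illustration shows why a lower RJCP should mean fewer iterations. It assumes a linear surrogate $f(x) = Ax$ and a linear reverse map $g(y) = By$ with $B = A^{-1} + \varepsilon E$; all names are hypothetical. With $\eta = 1$ and no line search, the D-IPG error obeys $e_{k+1} = (I - BA)\,e_k$. The composition error $BA - I$ therefore sets the contraction rate directly.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))   # toy linear forward map
E = 0.5 * rng.standard_normal((4, 4))               # fixed inverse-error direction
x_true = rng.standard_normal(4)
y_obs = A @ x_true


def rjcp(B):
    # ||J_g J_f - I||_F^2 with J_g = B and J_f = A at every iterate.
    return np.sum((B @ A - np.eye(4)) ** 2)


def iterations_to_tol(B, tol=1e-10, max_iter=500):
    """Undamped D-IPG (eta = 1, alpha = 1) on the linear toy; returns iterations."""
    x = np.zeros(4)
    for k in range(max_iter):
        y_hat = A @ x
        r = y_hat - y_obs
        if np.linalg.norm(r) <= tol * np.linalg.norm(y_obs):
            return k
        x = x + (B @ (y_hat - r) - B @ y_hat)       # Δx = g(ŷ - r) - g(ŷ) = -B r
    return max_iter


for eps in (0.0, 0.01, 0.1, 0.3):
    B = np.linalg.inv(A) + eps * E
    print(f"eps={eps}: RJCP={rjcp(B):.2e}, iterations={iterations_to_tol(B)}")
```

In this linear toy, RJCP is the same at every iterate. For a nonlinear surrogate it varies with $x_k$, which is why it is monitored along the trajectory.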
Ablation studies reveal:
- Disabling JCP inflates the composition error (RJCP) from near zero to ≫100 (e.g., 458 on Heat-1D) and increases the iteration count (from 2.6 to 3.8),
- Forcing $R = W^\top$ (the “tied” parameterization) impairs composition (RJCP ≫ 100) and substantially degrades solve efficiency,
- Removing the auxiliary fit terms (rec, cyc) while retaining JCP leaves the iteration count unchanged, indicating that the preconditioning benefit stems from local inverse enforcement.
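A small linear-algebra illustration helps explain the tied-parameterization ablation. It assumes a purely linear layer (activations ignored), so that $J_f = W$ and $J_g = R$. The hypothetical helper `composition_error` computes $\lVert RW - I\rVert_F^2$:
- an unconstrained $R$ can drive this error to zero by taking $R = W^+$;
- tying $R = W^\top$ instead gives $\lVert W^\top W - I\rVert_F^2$, which vanishes only when $W$ has orthonormal columns.

```python
import numpy as np


def composition_error(R, W):
    # ||R W - I||_F^2, the linear-layer analogue of the JCP/RJCP quantity.
    return np.sum((R @ W - np.eye(W.shape[1])) ** 2)


rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((8, 4))           # hypothetical forward weights
Q, _ = np.linalg.qr(W)                          # orthonormal-column counterpart

print("tied, generic W:    ", composition_error(W.T, W))
print("free, R = pinv(W):  ", composition_error(np.linalg.pinv(W), W))
print("tied, orthonormal Q:", composition_error(Q.T, Q))
```

With nonlinear activations the tied case picks up extra diagonal factors, $J_g J_f = D_g W^\top D_f W$, so exact composition is even harder to reach under tying.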
On Allen–Cahn-2D, including JCP increases basin access: with JCP, 100% of runs reach the low-error basin; without it, only 16.3% succeed. This suggests JCP enhances global reliability primarily by ensuring robust access to favorable convergence regions (Kachhadiya, 26 Nov 2025, Kachhadiya, 13 May 2026).
6. Applications and Limitations
D-IPG is applicable to a broad class of inverse problems where surrogate models are viable. Benchmark tasks include diffusion initial-condition recovery, elliptic parameter recovery, and nonlinear fluid parameter estimation. The method is agnostic to the underlying surrogate architecture, supporting both MLP and shallow CNN parameterizations for $f_\theta$ and $g_\phi$.
Limiting factors include reduced reliability in strongly ill-conditioned regimes (e.g., Darcy-2D, 69% success) and reliance on a sufficiently expressive $g_\phi$ to accurately approximate the local pseudoinverse. A plausible implication is that, without adequate network capacity or with very poor initializations, the benefits of D-IPG may be diminished.
7. Significance and Theoretical Context
D-IPG constitutes a learned, amortized variant of local inverse geometric preconditioning for nonlinear inverse problems. By learning an approximate local pseudoinverse through JCP-regularized training, D-IPG bypasses the repeated linear-system solves central to classical curvature-aware methods (e.g., Gauss–Newton, Levenberg–Marquardt), while retaining first-order equivalence in the small-error limit.
This framework substantiates a connection between local inverse learning and traditional optimization preconditioning, suggesting a principled, scalable path for hybrid learned-analytic solvers in scientific computing (Kachhadiya, 13 May 2026, Kachhadiya, 26 Nov 2025).