
Newton-Kaczmarz Algorithm

Updated 11 December 2025
  • Newton-Kaczmarz Algorithm is an iterative projection method that hybridizes Newton's method and the Kaczmarz technique to solve nonlinear systems one equation at a time.
  • It updates parameters by sequentially linearizing scalar equations using row-vector pseudoinverses, thereby reducing computational burden without full Jacobian inversion.
  • Applied to Kolmogorov-Arnold models, the method enables robust, efficient parameter estimation in large-scale regression problems, with improved convergence when the initial guess is far from the true solution.

The Newton-Kaczmarz (NK) algorithm is an iterative projection-based method for solving nonlinear systems of equations, developed as a hybridization of Newton's method and the classical Kaczmarz row-action technique. Its principal application, as presented by Poluektov & Polar, is the efficient estimation of parameters in so-called Kolmogorov-Arnold models—structured representations of multivariate functions via compositions of univariate functions, as guaranteed by the Kolmogorov-Arnold theorem. The NK method linearizes and optimizes one scalar equation at a time, thus avoiding the explicit computation and inversion of full Jacobian matrices, and is particularly well-suited for large-scale regression problems where the number of equations or data records is considerable (Poluektov et al., 2023).

1. Mathematical Formulation

Given a system of $N$ nonlinear equations in $r$ unknowns, $\mathbf{L}(\mathbf{Z}) = 0$, where $\mathbf{L}(\mathbf{Z}) = [L^1(\mathbf{Z}), \ldots, L^N(\mathbf{Z})]^T$ and $\mathbf{Z} \in \mathbb{R}^r$, the objective is to find $\mathbf{Z}$ such that all residuals vanish. Instead of employing a classical Newton update, which requires the computation and inversion of the $r \times r$ Jacobian $J(\mathbf{Z})$, the NK algorithm updates $\mathbf{Z}$ sequentially with respect to one equation, indexed by $i$, at each iteration:

$$\mathbf{Z}^{q+1} = \mathbf{Z}^q - \mu \frac{L^i(\mathbf{Z}^q)}{\|\mathbf{J}_i(\mathbf{Z}^q)\|^2} \mathbf{J}_i(\mathbf{Z}^q)^T$$

where $\mathbf{J}_i(\mathbf{Z}) = \nabla L^i(\mathbf{Z})^T$ is the row-Jacobian of the $i$-th equation and $\mu \in (0,2)$ is a relaxation parameter. This excludes second-order terms and projects onto the local linearization hyperplane defined by $L^i$. The update exploits the row-vector pseudoinverse

$$\mathbf{J}_i(\mathbf{Z})^\dagger = \frac{\mathbf{J}_i(\mathbf{Z})^T}{\|\mathbf{J}_i(\mathbf{Z})\|^2},$$

yielding an efficient one-dimensional adaptation at each step [(Poluektov et al., 2023), eqs. (8)-(9)].
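
For illustration (a hypothetical single step, not an example taken from the paper), take the scalar equation $L^i(\mathbf{Z}) = Z_1^2 + Z_2 - 3$ at the iterate $\mathbf{Z}^q = (1, 1)^T$ with $\mu = 1$:

$$L^i(\mathbf{Z}^q) = -1, \qquad \mathbf{J}_i(\mathbf{Z}^q) = (2Z_1, 1) = (2, 1), \qquad \|\mathbf{J}_i(\mathbf{Z}^q)\|^2 = 5,$$

$$\mathbf{Z}^{q+1} = (1, 1)^T - 1 \cdot \frac{-1}{5}\,(2, 1)^T = (1.4, 1.2)^T.$$

The residual of the selected equation drops from $-1$ to $L^i(\mathbf{Z}^{q+1}) = 1.96 + 1.2 - 3 = 0.16$ without forming or inverting any Jacobian matrix.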

2. Algorithmic Structure

The generic NK iteration applies the above update in a cyclic or randomized fashion across the $N$ equations (or data records); a minimal code sketch follows the enumerated steps:

  1. Initialize $\mathbf{Z}^0$.
  2. For each iteration $q$:
    • Select index $i$ (cyclic: $i = (q \bmod N) + 1$, or random).
    • Compute the residual $L^i(\mathbf{Z}^q)$ and the gradient $\nabla L^i(\mathbf{Z}^q)$.
    • If $\|\nabla L^i(\mathbf{Z}^q)\|^2$ is too small, break (singular linearization).
    • Update $\mathbf{Z}$ via the projected step with relaxation $\mu$.
    • Stop upon convergence of the parameter update or the residual.
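
A minimal Python sketch of the cyclic variant, applied to a hypothetical two-equation system (the system and all names are illustrative, not taken from the paper), is:

```python
import numpy as np

# Toy system (illustrative only): L^1(z) = z_1^2 + z_2 - 3, L^2(z) = z_1 + z_2^2 - 5,
# with exact solution z = (1, 2).
residuals = [lambda z: z[0]**2 + z[1] - 3.0,
             lambda z: z[0] + z[1]**2 - 5.0]
gradients = [lambda z: np.array([2.0 * z[0], 1.0]),
             lambda z: np.array([1.0, 2.0 * z[1]])]

def newton_kaczmarz(residuals, gradients, z0, mu=1.0, max_sweeps=500, tol=1e-10):
    """Cyclic Newton-Kaczmarz iteration: one projected step per scalar equation."""
    z = np.asarray(z0, dtype=float)
    for _ in range(max_sweeps):
        for L_i, grad_i in zip(residuals, gradients):   # cyclic sweep over the equations
            g = grad_i(z)
            norm_sq = g @ g
            if norm_sq < 1e-14:                         # near-singular linearization guard
                continue
            z = z - mu * L_i(z) / norm_sq * g           # projected one-equation update
        if max(abs(L(z)) for L in residuals) < tol:     # residual-based stopping
            break
    return z

print(newton_kaczmarz(residuals, gradients, z0=[0.5, 0.5]))   # approx. [1., 2.]
```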

When specialized to the Kolmogorov-Arnold (KA) model, the method adapts to the parameterization of the representation's inner and outer univariate functions. The parameters $H_{kjp}$ and $G_{kl}$ govern the decomposition into basis functions $\phi^p(x)$ and $\psi^l(t)$, respectively. The update rules for these parameters are:

  • For all $k, j, p$: $H_{kjp}^{q+1} = H_{kjp}^q + \mu B_{kjp}\Delta$
  • For all $k, l$: $G_{kl}^{q+1} = G_{kl}^q + \mu A_{kl}\Delta$

where the coefficients $A_{kl}$, $B_{kjp}$ and the scaling factor $\zeta$ are computed from the current $H$ and $G$, and $\Delta = (y_i - E)/\zeta$ is the normalized residual between the target $y_i$ and the model output $E$ [(Poluektov et al., 2023), eqs. (17)-(18)].

3. Application to Kolmogorov-Arnold Models

Kolmogorov-Arnold models, or networks, express continuous multivariate functions by composition of parameterized univariate transforms. These are constructed from basis expansions:

$$f(\mathbf{X}) = \sum_{k, l} G_{kl} \, \psi^l \left( \sum_{j, p} H_{kjp} \, \phi^p(X_j) \right)$$

Determining suitable $H_{kjp}$ and $G_{kl}$ from data constitutes a nonlinear inverse problem. The NK approach decomposes the solution into iterative 1D projections, significantly reducing the computational burden per update to $O(nm + s)$, where $n$ and $s$ are the grid sizes of the basis expansions [(Poluektov et al., 2023), section 4.2]. This structure confers distinct practical advantages in memory usage and batchwise computation.
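
A minimal sketch of how a single record $(\mathbf{x}_i, y_i)$ could drive one update of $(H, G)$ is given below. It combines the forward evaluation above with the update rules of Section 2, under the assumption (an interpretation, not the paper's verbatim definitions) that $A_{kl}$ and $B_{kjp}$ are the partial derivatives of the model output $E$ with respect to $G_{kl}$ and $H_{kjp}$, and that $\zeta$ is the squared norm of that gradient; all names are illustrative.

```python
import numpy as np

def ka_forward(x, H, G, phi, psi):
    """Evaluate E = sum_{k,l} G[k,l] * psi_l(t_k), with t_k = sum_{j,p} H[k,j,p] * phi_p(x_j)."""
    K, m, n = H.shape                              # inner sums, input variables, inner basis size
    Phi = np.array([[phi[p](x[j]) for p in range(n)] for j in range(m)])   # phi_p(x_j), shape (m, n)
    t = np.einsum('kjp,jp->k', H, Phi)             # inner arguments t_k
    Psi = np.array([[psi[l](t[k]) for l in range(G.shape[1])] for k in range(K)])  # psi_l(t_k)
    return float(np.sum(G * Psi)), t, Phi, Psi

def ka_nk_update(x, y, H, G, phi, psi, dpsi, mu=1.0):
    """One Newton-Kaczmarz-style update of (H, G) driven by a single record (x, y)."""
    E, t, Phi, Psi = ka_forward(x, H, G, phi, psi)
    A = Psi                                        # assumed: A_kl = dE/dG_kl = psi_l(t_k)
    dpsi_t = np.array([[dpsi[l](t[k]) for l in range(G.shape[1])] for k in range(H.shape[0])])
    w = np.sum(G * dpsi_t, axis=1)                 # sum_l G_kl * psi_l'(t_k), one value per k
    B = w[:, None, None] * Phi[None, :, :]         # assumed: B_kjp = dE/dH_kjp
    zeta = np.sum(A**2) + np.sum(B**2)             # assumed scaling: squared norm of the gradient
    delta = (y - E) / zeta                         # normalized residual Delta = (y_i - E)/zeta
    return H + mu * B * delta, G + mu * A * delta
```

Applied cyclically or at random over the $N$ records, one such update per record reproduces the algorithmic structure of Section 2, with the per-record cost dominated by the basis evaluations.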

4. Convergence and Robustness

Under the conditions that each $L^i(\mathbf{Z})$ is continuously differentiable in a neighborhood of a solution $\mathbf{Z}^*$ and that $\nabla L^i(\mathbf{Z}^*) \ne 0$, the NK algorithm exhibits local convergence for sufficiently good initial guesses [(Poluektov et al., 2023), appendix A]. Empirical results indicate improved robustness relative to the Gauss-Newton (GN) method in fitting KA model parameters, particularly as the initial guess is perturbed away from the true solution. In ridge-function identification tasks (for example, $m=5$, $s=3$, $N=400$), NK maintains a high frequency of low-RMSE solutions even for poor initializations; for perturbation magnitude $\alpha = 1.2$, GN achieves RMSE $< 10\%$ in $\approx 33\%$ of runs, compared to NK's $\approx 78\%$ [(Poluektov et al., 2023), Table 1].

The practical convergence rate with the KA model and piecewise-linear basis functions $\phi$, $\psi$ can be estimated empirically by

$$\log \mathrm{RMSE} \simeq -\alpha \log(\mathrm{epochs}),$$

with RMSE approaching $0.5\%$ after $500$ passes through a dataset of $N = 10^4$ records, when $\mu = 1$ [(Poluektov et al., 2023), section 4.2].
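
The exponent $\alpha$ can be estimated from a logged RMSE-versus-epochs curve by a least-squares fit in log-log coordinates; the numbers in the sketch below are hypothetical and purely illustrative.

```python
import numpy as np

# Hypothetical logged values: RMSE measured after selected epochs (illustrative only).
epochs = np.array([10, 50, 100, 200, 500])
rmse = np.array([0.060, 0.022, 0.014, 0.009, 0.005])

# Fit log(RMSE) = c - alpha * log(epochs); alpha is the empirical convergence exponent.
slope, intercept = np.polyfit(np.log(epochs), np.log(rmse), deg=1)
alpha = -slope
print(f"estimated alpha ~ {alpha:.2f}")
```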

5. Practical Considerations for Implementation

Efficient implementation of the NK method for KA models is contingent on several choices:

  • Basis selection: Piecewise-linear functions $\phi^p$, $\psi^l$ defined on uniform grids are recommended for their compact support, sparsity, and straightforward derivative calculation [(Poluektov et al., 2023), eqs. (25)-(27)]; see the sketch after this list.
  • Relaxation parameter: $\mu$ should be chosen in $(0,2)$; empirically, $\mu \approx 1$ achieves a favorable tradeoff between step size and noise filtering.
  • Initialization: The initial parameters $H^0$, $G^0$ are sampled uniformly from ranges that scale with the data output range $y_\mathrm{min}$, $y_\mathrm{max}$ and the model size, ensuring internal states remain within the region of basis support [(Poluektov et al., 2023), eq. (30)].
  • Regularization and model tuning: Validation-based selection of the grid sizes $n$, $s$ and the number of terms (typically $2m+1$ for the full KA representation) mitigates overfitting.
  • Stopping criteria: Convergence can be monitored by update norms, $|\Delta|$, or residuals.
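
A minimal sketch of such a piecewise-linear ("hat") basis on a uniform grid, together with its derivative, is shown below; the construction is generic, and the exact grid and boundary conventions of the paper (eqs. (25)-(27)) may differ.

```python
import numpy as np

def hat_basis(grid):
    """Piecewise-linear basis functions and their derivatives on a uniform 1-D grid."""
    h = grid[1] - grid[0]                      # uniform spacing assumed

    def phi(p):
        def f(x):
            # Hat function centered at grid[p], supported on [grid[p] - h, grid[p] + h].
            return max(0.0, 1.0 - abs(x - grid[p]) / h)
        return f

    def dphi(p):
        def df(x):
            d = x - grid[p]
            if abs(d) >= h:                    # outside the compact support
                return 0.0
            return -1.0 / h if d > 0 else 1.0 / h   # subgradient convention at the kink
        return df

    n = len(grid)
    return [phi(p) for p in range(n)], [dphi(p) for p in range(n)]

# Example: 6 hat functions on [0, 1]; at any interior x, at most two of them are non-zero.
phis, dphis = hat_basis(np.linspace(0.0, 1.0, 6))
```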

6. Comparative Analysis

In direct comparisons on synthetic regression tasks, the NK method demonstrates superior robustness and efficiency vis-à-vis the Gauss-Newton method, especially under poor initial guesses. Each NK update involves only a subset of the parameters and does not require storing or manipulating large Jacobian matrices, significantly lowering computational and memory requirements.

While the referenced work does not include partial differential equation (PDE)-based benchmarks or direct comparisons with modern multilayer perceptrons (MLPs) on massive datasets, it documents the theoretical scalability and empirical efficiency of the approach for high-dimensional, large-sample nonlinear regression (Poluektov et al., 2023). The explicit focus on basis expansions and single-equation update steps distinguishes the NK algorithm from other nonlinear solvers deployed in machine learning and scientific computing.

7. Future Perspectives and Limitations

The paper by Poluektov & Polar does not address parallel or block implementations of the NK algorithm. Extension to parallel or distributed environments, such as asynchronous or block-Kaczmarz schemes, remains an open avenue, with anticipated complexities in synchronization and communication.

A plausible implication is that advances in this direction could further reduce wall-clock times for massive datasets, though these must be validated in practice. The algorithm's empirical performance in PDEs, extreme dimension settings, or with real-world structured noise awaits further demonstration, as such applications are explicitly marked as outside the scope of the current study (Poluektov et al., 2023).

