
Local Linearization Projection

Updated 29 November 2025
  • Local linearization-based projection approximates nonlinear objects by using first-order Taylor expansions to simplify complex projections.
  • It is applied in nonconvex optimization, constrained feasibility, and manifold methods, ensuring efficient computation and reliable convergence.
  • The method underpins statistical learning and visualization techniques, achieving practical accuracy and scalability in high-dimensional settings.

Local linearization-based projection refers to a class of methods that approximate geometric objects (sets, functions, manifolds) or algorithmic steps (projection, gradient, inference) by their first-order (linear or affine) Taylor expansions at or near a current iterate, for the purpose of efficient computation or improved tractability. This paradigm is foundational across projection algorithms for nonconvex feasibility, optimization with nonlinear constraints, modern manifold methods in numerical analysis, local smoothing in nonparametric statistics, and local structure-preserving dimensionality reduction. The approach replaces computationally intractable or nonlinear projection operations with projections onto locally linearized or affine approximations, yielding sharply improved efficiency while preserving attractive convergence or estimation properties in the local regime.

1. General Mathematical Framework

Let $X$ be a finite-dimensional Euclidean space, and let $M \subseteq X$ be a set defined by nonlinear constraints, a nonlinear manifold, or the image of a nonlinear mapping $F: X \to Y$. The standard nearest-point projection onto $M$,

$$P_M(z) = \operatorname*{argmin}_{x\in M} \|x-z\|,$$

is in general a nonconvex and computationally hard problem. Local linearization-based projection approximates $M$ near a nominal point $z$ by its first-order Taylor expansion:

  • If $M = \{x: G(x) \le 0,\ H(x) = 0\}$ with $C^2$ data, linearize the active constraints at $z$:

$$G(z) + \nabla G(z)(x-z) \le 0, \qquad H(z) + \nabla H(z)(x-z) = 0.$$

  • For $M = \{F(x): x\in U\}$, linearize the chart: $F(x) \approx F(z) + \nabla F(z)(x-z)$.

The local projection is then defined by solving a convex quadratic program (QP) or least-squares problem, imposing only the linearized constraints or manifold tangency:

$$\Phi(z) = \operatorname*{argmin}_{x\in X} \tfrac{1}{2}\|x-z\|^2 \quad \text{s.t. the linearized constraints at } z.$$

This inexact projection operator $\Phi$ admits second-order proximity to $P_M(z)$ as $z \to \bar x$ under standard regularity (e.g., the linear-independence constraint qualification, LICQ) (Drusvyatskiy et al., 2018).
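
For equality constraints alone, the linearized projection $\Phi(z)$ has a closed form, so no QP solver is needed. The following NumPy sketch illustrates it under that assumption; the constraint function `H`, its Jacobian `H_jac`, and the sphere example are illustrative placeholders, and inequality constraints would instead require a small QP solve.

```python
import numpy as np

def linearized_projection(z, H, H_jac):
    """Project z onto the affine set {x : H(z) + H'(z)(x - z) = 0}.

    The minimizer of ||x - z||^2 subject to A(x - z) = -H(z), with A = H'(z),
    is x = z - A^T (A A^T)^{-1} H(z), assuming A has full row rank (LICQ at z).
    This is the operator Phi(z) for purely equality-constrained M.
    """
    A = H_jac(z)                                  # m x n Jacobian of H at z
    correction = A.T @ np.linalg.solve(A @ A.T, H(z))
    return z - correction

# Illustrative example: project onto the linearization of the unit sphere H(x) = ||x||^2 - 1.
H = lambda x: np.array([x @ x - 1.0])
H_jac = lambda x: (2.0 * x)[None, :]
z = np.array([1.5, 0.5])
print(linearized_projection(z, H, H_jac))         # [1.05, 0.35]
```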

2. Alternating Projections with Local Linearization

Alternating projections seek a point in $Q \cap M$ for closed sets $Q, M \subset X$ by iteratively projecting onto each set. If $M$ is nonconvex, exact $P_M$ is generally impractical. Linearization-based projection circumvents this by using $\Phi$ from the previous section as a surrogate. The canonical scheme is:

  1. Start at $z^0 \in Q$ near $\bar x \in Q \cap M$.
  2. For $k = 0, 1, \dots$ until convergence:
    • $x^{k+1} \gets \Phi(z^k)$ (local linearized projection toward $M$);
    • $z^{k+1} \gets P_Q(x^{k+1})$.

Under prox-regularity of $Q$, smoothness and LICQ for $M$, and the transversality condition $N_Q(\bar x) \cap (-N_M(\bar x)) = \{0\}$, this algorithm converges locally at a linear rate to a point in $Q \cap M$, with rate controlled by the cosine of the minimal angle between the normal cones (Drusvyatskiy et al., 2018). This is robust to the inexactness of using linearized projections, since the error $\|\Phi(z) - P_M(z)\| = O(d_M(z)^2)$ vanishes faster than the linear contraction.

In the special case where $M$ is a smooth manifold parameterized by a chart $F$, the local tangent-space projection is realized by a least-squares step, optionally followed by a retraction back onto $M$.
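
As a concrete illustration of the scheme above, the sketch below alternates an exact projection onto a box $Q$ with the linearized surrogate $\Phi$ toward a nonconvex set $M = \{x : h(x) = 0\}$. The particular sets, starting point, and stopping rule are assumptions chosen for the example, not prescriptions from the cited analysis.

```python
import numpy as np

def phi(z, h, h_jac):
    """Inexact projection toward M = {x : h(x) = 0} via linearization at z."""
    A = h_jac(z)
    return z - A.T @ np.linalg.solve(A @ A.T, h(z))

def proj_box(x, lo, hi):
    """Exact Euclidean projection onto the box Q = [lo, hi]."""
    return np.clip(x, lo, hi)

# M: unit circle in R^2; Q: the box [0.5, 2] x [-0.25, 0.25] (they intersect near (1, 0)).
h = lambda x: np.array([x @ x - 1.0])
h_jac = lambda x: (2.0 * x)[None, :]
lo, hi = np.array([0.5, -0.25]), np.array([2.0, 0.25])

z = np.array([2.0, 0.25])                  # start in Q, near the intersection
for k in range(50):
    x = phi(z, h, h_jac)                   # linearized projection toward M
    z_new = proj_box(x, lo, hi)            # exact projection back onto Q
    if np.linalg.norm(z_new - z) < 1e-12:
        break
    z = z_new

print(z, h(z))   # converges to a point of the intersection (h(z) is approximately 0)
```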

3. Projected Gradient Methods with Constraint Linearization

When addressing nonlinear constrained optimization, e.g., minimizing $J(z)$ subject to $g(z) \le 0,\ h(z) = 0$, projections onto the true feasible set are nonconvex and expensive. The constraint-linearization method projects the updated iterate only onto the affine-linearized constraints at the current point. Specifically, at each step:

  • Form the linearized constraint set

$$C^{(i)} = \{z\in\mathbb{R}^n: g(z^{(i)}) + \nabla g(z^{(i)})^T (z-z^{(i)}) \le 0,\ \ h(z^{(i)}) + \nabla h(z^{(i)})^T (z-z^{(i)}) = 0\}.$$

  • Take a projected gradient step, projecting onto $C^{(i)}$, not the nonlinear feasible set:

$$z_{G}^{(i)} = \mathrm{Proj}_{C^{(i)}}\bigl(z^{(i)} - \alpha_i \nabla J(z^{(i)})\bigr).$$

  • Update via line search if necessary.

This is neither classical projected gradient descent (the projection is onto an iteration-dependent affine set rather than the true feasible set) nor full SQP (second-order information is omitted). Under regularity assumptions, the method converges globally to a KKT point, and locally at a linear rate near a solution (Torrisi et al., 2016). For nonlinear model predictive control (NMPC), exploiting problem sparsity and introducing slacks for box constraints yield highly efficient implementations.
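
The following sketch runs iterations of this scheme on a toy problem, using `scipy.optimize.minimize` with the SLSQP method as a stand-in solver for the projection QP onto $C^{(i)}$. The objective, constraint, and step size are illustrative assumptions, not the sparsity-exploiting NMPC implementation of Torrisi et al. (2016).

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize J(z) = ||z - c||^2 subject to g(z) = z1^2 + z2^2 - 1 <= 0.
c = np.array([2.0, 1.0])
J = lambda z: np.sum((z - c) ** 2)
grad_J = lambda z: 2.0 * (z - c)
g = lambda z: np.array([z @ z - 1.0])
grad_g = lambda z: (2.0 * z)[None, :]

def linearized_projection_step(z_i, alpha=0.5):
    """One projected-gradient step, projected onto the constraints linearized at z_i."""
    y = z_i - alpha * grad_J(z_i)                   # plain gradient step
    A, b = grad_g(z_i), -g(z_i)                     # C^(i): A (z - z_i) <= b
    res = minimize(
        lambda z: 0.5 * np.sum((z - y) ** 2),       # projection QP objective
        x0=z_i,
        jac=lambda z: z - y,
        constraints=[{"type": "ineq",
                      "fun": lambda z: b - A @ (z - z_i)}],  # SLSQP wants fun(z) >= 0
        method="SLSQP",
    )
    return res.x

z = np.array([0.5, 0.0])
for _ in range(30):
    z = linearized_projection_step(z)
print(z, g(z))   # approaches the KKT point of the original problem on the unit circle
```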

4. Local Linearization in Statistical and Machine Learning Methods

Local linearization-based projection underpins several approaches in statistics and machine learning:

  • In additive nonparametric regression, local linear smooth backfitting is recast as orthogonal projection of the response vector $Y$ onto the additive subspace in a Hilbert space with an empirical semi-norm (Hiabu et al., 2022). Each iteration alternates local projections onto component function spaces, with convergence rates matching the oracle rate $O_P(n^{-2/5})$ under standard assumptions.
  • For function learning, the "linearization ML" paradigm projects the data onto a globally linear (affine) space via $y'_i = W^\top X_i$, then performs prediction by local consensus among the $k$ nearest neighbors in the 1D output space of this linear projection. This two-phase process can outperform both MLP and logistic regression on some LIBSVM datasets (Tueno, 2019). It differs from classical local linear regression by using a single global projection and only local adaptation in predictor space.
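
A minimal NumPy sketch of this two-phase idea: a single global affine projection fit by least squares, followed by $k$-nearest-neighbor consensus in the projected 1D output space. The dataset, the least-squares fit, and the choice of $k$ are illustrative assumptions, not the exact pipeline of Tueno (2019).

```python
import numpy as np

def fit_global_projection(X, y):
    """Phase 1: fit a global affine map W so that y'_i = W^T x_i approximates y_i."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    W, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return W

def predict_local_consensus(x, X, y, W, k=5):
    """Phase 2: predict by consensus (here, the mean) of the k nearest
    neighbors in the 1D space of projected outputs y' = W^T x."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    proj_train = X1 @ W                              # projected training outputs
    proj_query = np.append(x, 1.0) @ W
    nearest = np.argsort(np.abs(proj_train - proj_query))[:k]
    return y[nearest].mean()

# Illustrative data: a noisy nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(200)

W = fit_global_projection(X, y)
print(predict_local_consensus(np.array([0.2, -0.3, 0.5]), X, y, W))
```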

In Bayesian neural networks, the generalized Gauss-Newton (GGN) approximation is formalized as a local linearization in parameter space:

$$f(x,\theta) \approx f(x,\theta^*) + J(x;\theta^*)(\theta-\theta^*),$$

with $\theta^*$ the MAP point and $J$ the Jacobian. Posterior inference proceeds in the resulting Bayesian GLM, and predictive uncertainty is propagated through this linearization, which stabilizes predictions and enhances out-of-distribution detection compared to naive nonlinear parameter sampling (Immer et al., 2020).
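
Under this linearization, the distribution over linearized network outputs is Gaussian with mean $f(x,\theta^*)$ and covariance $J(x;\theta^*)\,\Sigma\,J(x;\theta^*)^\top$, where $\Sigma$ is the Gaussian posterior covariance over parameters. A minimal NumPy sketch, assuming the MAP prediction, Jacobian, and posterior covariance are already available (all names and numbers below are placeholders):

```python
import numpy as np

def linearized_predictive(f_map, J, Sigma):
    """Predictive distribution of the locally linearized model at one input.

    f_map : f(x, theta*) at the MAP parameters              (shape: [out])
    J     : Jacobian of f(x, theta) w.r.t. theta at theta*  (shape: [out, p])
    Sigma : Gaussian posterior covariance over parameters    (shape: [p, p])

    With f(x, theta) ~ f(x, theta*) + J (theta - theta*) and
    theta ~ N(theta*, Sigma), outputs are Gaussian with the moments below.
    """
    mean = f_map
    cov = J @ Sigma @ J.T
    return mean, cov

# Toy numbers: 2 outputs, 3 parameters.
f_map = np.array([0.3, -1.2])
J = np.array([[0.5, -0.1, 0.2],
              [0.0,  0.4, 0.3]])
Sigma = np.diag([0.1, 0.05, 0.2])
print(linearized_predictive(f_map, J, Sigma))
```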

5. Applications in Multidimensional Projection and Visualization

For dimensionality reduction (DR) and data visualization, local linearization-based projection is instrumental in understanding and mapping the deformation of high-dimensional local subspaces under possibly nonlinear projections:

  • Define local subspaces at each sample as ellipsoids from PCA of the $k$ nearest neighbors.
  • The projection $\pi: \mathbb{R}^D \to \mathbb{R}^d$ is often defined implicitly as the solution to a local nonlinear optimization.
  • The Jacobian $J(x) = \partial \pi/\partial x$ is computed analytically via the implicit function theorem, exploiting

$$J(x) = -\left[\frac{\partial^2 f}{\partial y^2}\right]^{-1}\left[\frac{\partial^2 f}{\partial x\,\partial y}\right].$$

  • Local subspace basis directions are then mapped via $v_i = J(x)V_i$, producing a visualization glyph that encodes subspace stretching or rotation (Bian et al., 2020).
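
The sketch below illustrates the last two steps numerically: a finite-difference Jacobian stands in for the implicit-function-theorem formula, and local PCA directions are pushed forward as $v_i = J(x)V_i$. The projection map `pi`, the neighborhood size, and the data are illustrative assumptions, not the setup of Bian et al. (2020).

```python
import numpy as np

def numerical_jacobian(pi, x, eps=1e-6):
    """Finite-difference stand-in for the analytic Jacobian J(x) = d pi / d x."""
    base = pi(x)
    J = np.zeros((base.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (pi(xp) - base) / eps
    return J

def local_subspace_directions(X, idx, k=10, n_dirs=2):
    """PCA basis of the k nearest neighbors of sample idx (columns = directions)."""
    d = np.linalg.norm(X - X[idx], axis=1)
    nbrs = X[np.argsort(d)[:k]]
    _, _, Vt = np.linalg.svd(nbrs - nbrs.mean(axis=0), full_matrices=False)
    return Vt[:n_dirs].T                       # D x n_dirs

# Illustrative nonlinear projection R^3 -> R^2 (placeholder for the DR map).
pi = lambda x: np.array([x[0] + 0.1 * x[2] ** 2, x[1] - 0.2 * x[0] * x[2]])

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
idx = 0
V = local_subspace_directions(X, idx)          # high-dimensional basis directions
J = numerical_jacobian(pi, X[idx])
v = J @ V                                      # mapped directions v_i = J(x) V_i for the glyph
print(v)
```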

Empirical results demonstrate that this approach achieves high numerical accuracy (mean angular error of $0.005^\circ$ on synthetic planar data), and its glyph-based visualization reveals subtle global and local data structures unobservable in standard scatterplots.

6. Connections to Nonlinear Boundary Value Problems and Numerical Analysis

Local linearization-based projection generalizes to iterative methods for nonlinear boundary value problems (BVPs). For two-point BVPs, the shooting-projection iteration (SPI) method reformulates standard shooting by:

  • Given a shooting trajectory $y_k$, construct a "projection" $y_{k+1}$ as the solution to the linearized BVP

$$y_{k+1}'(x) = L_k(x)\,[y_{k+1}(x) - y_k(x)] + f(x, y_k(x)), \qquad y_{k+1}(a) = \alpha,\ \ y_{k+1}(b) = \beta,$$

with $L_k(x)$ determined via Newton, Picard, or constant-slope linearizations.

  • The procedure is a projection in function space onto the affine subspace satisfying the two boundary conditions, and yields the familiar shooting method updates (including Newton and fixed-point shooting) (Faragó et al., 2020).

Convergence rates are quadratic (Newton), or linear (Picard/constant-slope), reflecting their underlying linearization properties. The projection perspective offers a unifying explanation for the convergence and error-correction mechanisms of shooting and relaxation methods.
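
A minimal sketch of the projection perspective on shooting, simplified to a linear second-order toy problem: each shooting trajectory is projected (in the $H^1$ semi-norm) onto functions satisfying both boundary conditions, which amounts to adding a linear correction, and the slope of that correction updates the next shot. The toy equation, the explicit slope-update rule, and the use of `scipy.integrate.solve_ivp` are illustrative assumptions and a simplification of the constructions in Faragó et al. (2020).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative second-order BVP: y'' = -y, y(0) = 0, y(1) = 1 (exact: sin(x)/sin(1)).
a, b, alpha, beta = 0.0, 1.0, 0.0, 1.0
rhs = lambda x, u: [u[1], -u[0]]            # first-order system in (y, y')

def shoot(slope):
    """Solve the IVP with initial data (alpha, slope) over [a, b]."""
    return solve_ivp(rhs, (a, b), [alpha, slope], rtol=1e-9)

slope = 0.0                                  # initial shooting slope
for k in range(20):
    traj = shoot(slope)
    y_b = traj.y[0, -1]                      # value of the shooting trajectory at x = b
    # The H^1-seminorm projection onto {y : y(a) = alpha, y(b) = beta} adds the
    # linear function ((beta - y_b) / (b - a)) * (x - a) to the trajectory
    # (the left boundary condition already holds); its derivative at x = a
    # supplies the slope correction for the next shot.
    slope += (beta - y_b) / (b - a)

print(slope, shoot(slope).y[0, -1])          # y(b) converges to beta
```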

7. Theoretical Properties and Computational Aspects

The following table summarizes theoretical properties and complexity considerations across representative contexts:

| Domain | Linearized Projection Object | Convergence / Accuracy |
|---|---|---|
| Feasibility (alternating projections) | Polyhedral/affine set (QP/LS step) | Local linear, rate $\approx \cos\alpha$ (Drusvyatskiy et al., 2018) |
| Constrained optimization (NLP) | Affine-linearized constraint set | Local linear; global with augmented Lagrangian safeguards (Torrisi et al., 2016) |
| Nonparametric statistics | Additive subspace (empirical semi-norm) | Oracle-optimal $O_P(n^{-2/5})$ (Hiabu et al., 2022) |
| Dimensionality reduction (visualization) | Local subspace, Jacobian map | Glyphs about two orders of magnitude more accurate (Bian et al., 2020) |
| Shooting for BVPs | Linearized BVP operator | Quadratic (Newton), linear (Picard) (Faragó et al., 2020) |

Computational complexity is often dominated by small-scale QP or least-squares solves per iteration, leveraging only first derivatives. Line search or augmented Lagrangian terms ensure robustness in nonconvex contexts. In statistical applications, the block structure of projection operators facilitates scalable implementation.


Local linearization-based projection unifies a broad spectrum of algorithms in optimization, numerical analysis, statistical learning, and high-dimensional data analysis. By systematically replacing nonlinear or nonconvex projection operations with tractable linear or affine surrogates, these methods offer both practical efficiency and strong theoretical guarantees under local regularity and transversality conditions. Empirical and mathematical results across multiple domains confirm the versatility and foundational role of this approach (Drusvyatskiy et al., 2018, Torrisi et al., 2016, Tueno, 2019, Hiabu et al., 2022, Bian et al., 2020, Immer et al., 2020, Faragó et al., 2020).
