
Selective Partial Optimization (SPO)

Updated 20 January 2026
  • Selective Partial Optimization (SPO) is a block-decomposition method that updates only significant variable blocks, reducing computational cost while maintaining strong convergence guarantees.
  • It employs adaptive partial linearization and selective Gauss–Newton techniques to tackle convex composite optimization and incremental estimation problems such as SLAM.
  • SPO dynamically adjusts active sets based on optimality gaps and measurement changes, achieving substantial reductions in cumulative computational cost.

Selective Partial Optimization (SPO) is a family of block-decomposition algorithms for large-scale separable optimization, aimed at reducing per-iteration computational cost while maintaining strong convergence guarantees. The essential principle is to update and relinearize only “significant” or “active” variable blocks at each iteration—those that exhibit substantial violation of optimality or that are most affected by new information. Applications span convex composite optimization, sparse learning, and incremental nonlinear estimation such as SLAM, where SPO enables scalable and accurate solutions by adaptively focusing computational effort.

1. Problem Settings and Mathematical Formulation

SPO addresses structured composite minimization and nonlinear least-squares problems over product spaces. Consider the canonical composite optimization problem as formulated in (Konnov, 2016): let $X = X_1 \times \cdots \times X_n$, with each $X_i \subset \mathbb{R}^{n_i}$ nonempty, closed, and convex (often compact for simplicity). The objective is

$$\min_{x \in X} \; F(x) = f(x) + \sum_{i=1}^n h_i(x_i)$$

where

  • $f : \mathbb{R}^N \rightarrow \mathbb{R}$ is continuously differentiable (not necessarily convex),
  • $h_i : \mathbb{R}^{n_i} \rightarrow \mathbb{R} \cup \{+\infty\}$ are proper, convex, lower-semicontinuous, possibly nonsmooth, separable terms.

Block gradients $g(x) = \nabla f(x) = (g_1(x), \dots, g_n(x))$ are defined, with block-wise optimality gaps

$$p_i(x) = \max_{y_i \in X_i} \left\{ \langle g_i(x), x_i - y_i \rangle + h_i(x_i) - h_i(y_i) \right\}$$

A point $x^*$ is block-stationary if $p_i(x^*) = 0$ for all $i$, equivalently satisfying the associated mixed variational inequality condition.
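For intuition, the gap $p_i(x)$ has a closed form in the simple special case of a box set $X_i$ with $h_i = 0$, since the inner maximum is attained coordinate-wise at a corner of the box. A minimal sketch (the function name and the quadratic example are illustrative, not from the paper):

```python
import numpy as np

def block_gap(g_i, x_i, lo, hi):
    """Optimality gap p_i(x) for one block in the special case
    X_i = [lo, hi] (a box) and h_i = 0, so that
    p_i(x) = <g_i, x_i> - min_{y in X_i} <g_i, y>,
    with the minimum attained coordinate-wise at a box corner."""
    inner_min = np.minimum(g_i * lo, g_i * hi).sum()
    return float(g_i @ x_i - inner_min)

# Example: f(x) = 0.5 * ||x||^2 on [-1, 1]^2, so g(x) = x.
lo, hi = np.full(2, -1.0), np.full(2, 1.0)
x_i = np.array([0.5, -0.8])
gap = block_gap(x_i, x_i, lo, hi)                        # positive: not stationary
zero_gap = block_gap(np.zeros(2), np.zeros(2), lo, hi)   # 0 at the minimizer
```

A block is selected for an update exactly when this quantity exceeds the current tolerance $\delta$.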

For incremental nonlinear optimization such as SLAM, the optimization is written as

$$\min_{\mathbf{x}\in \mathbb{R}^N} \; c(\mathbf{x}) = \frac{1}{2}\sum_{j=1}^M \|\mathbf{m}_j - \mathbf{f}_j(\mathbf{x}_{\mathcal{V}_j})\|_{\boldsymbol\Sigma_j}^2$$

with state vector $\mathbf{x}$, measurement models $\mathbf{f}_j$, and measurement covariance matrices $\boldsymbol\Sigma_j$ (Arablouei, 13 Jan 2026).

2. Core Algorithms and Selective Update Mechanisms

The defining characteristic of SPO is the adaptive restriction of update and linearization steps to a subset of relevant variables.

Adaptive Partial Linearization (Konnov, 2016):

  • At each Basic Cycle iteration, compute the block gaps $p_i(x)$ and select any block $s$ with $p_s(x) \geq \delta$ (for a prescribed tolerance $\delta$).
  • Solve the linearized partial-minimization subproblem for block $s$; all other blocks remain fixed.
  • Perform an inexact Armijo-type line search along this subspace direction.
  • The outer loop decreases the tolerance $\delta_\ell \to 0$ geometrically, advancing to the next stage only when all $p_i(x) < \delta_\ell$.
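The steps above can be sketched as follows for the same simplified setting ($h_i = 0$, box sets $X_i$); the parameter names, default values, and the quadratic toy problem are illustrative assumptions, not values from (Konnov, 2016):

```python
import numpy as np

def adaptive_partial_linearization(f, grad, blocks, lo, hi, x0,
                                   delta0=1.0, shrink=0.5, tol=1e-2,
                                   beta=0.5, theta=1e-4, max_iter=50_000):
    """Basic Cycle with geometrically shrinking outer tolerance stages,
    for h_i = 0 and box sets X_i = [lo, hi] (illustrative sketch)."""
    x = x0.astype(float).copy()
    delta = delta0
    for _ in range(max_iter):
        g = grad(x)
        # Linearized minimizers over a box are corner points.
        y = np.where(g >= 0, lo, hi)
        # Block optimality gaps p_i(x) = <g_i, x_i - y_i>.
        gaps = [g[b] @ (x[b] - y[b]) for b in blocks]
        s = int(np.argmax(gaps))
        if gaps[s] < delta:        # all gaps below current tolerance
            if delta <= tol:       # final stage reached: stop
                break
            delta *= shrink        # tighten tolerance, start next stage
            continue
        # Subspace direction: move only the selected block s.
        d = np.zeros_like(x)
        d[blocks[s]] = y[blocks[s]] - x[blocks[s]]
        # Inexact Armijo-type backtracking line search along d.
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx - theta * t * gaps[s]:
            t *= beta
        x = x + t * d
    return x

# Toy problem: f(x) = 0.5 * ||x - c||^2 over [-1, 1]^4, two blocks.
c = np.array([0.3, -0.4, 0.5, 0.2])
f = lambda x: 0.5 * np.sum((x - c) ** 2)
grad = lambda x: x - c
blocks = [np.arange(0, 2), np.arange(2, 4)]
lo, hi = np.full(4, -1.0), np.full(4, 1.0)
x = adaptive_partial_linearization(f, grad, blocks, lo, hi, np.zeros(4))
```

Only one block gradient and one low-dimensional subproblem are touched per iteration, which is the source of the per-iteration savings discussed below.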

Selective Partial Gauss–Newton for SLAM (Arablouei, 13 Jan 2026):

  • Partition the variables into an active set $\mathcal{S}$ (to be updated) and a static set $\mathcal{U}$ (held fixed).
  • At each GN iteration, solve the block-reduced normal equations only on $\mathcal{S}$.
  • After each solve, prune $\mathcal{S}$ by removing variables with small updates ($|d_i| \leq \tau_d$); expand $\mathcal{S}$ by including neighbors directly impacted by measurement changes.
  • Relinearize only those measurements (edges) incident to the current $\mathcal{S}$.
  • Terminate the GN loop when $\mathcal{S} = \emptyset$.
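A compact sketch of the prune-and-solve loop on a toy least-squares problem (the neighbor-expansion and relinearization bookkeeping of the full method are omitted for brevity; names, thresholds, and the toy residuals are assumptions):

```python
import numpy as np

def selective_partial_gn(r, J, x0, active0, tau_d=1e-8, max_iter=50):
    """Selective partial Gauss-Newton: solve the reduced normal
    equations on the active set S only, pruning converged variables
    after each pass (illustrative sketch)."""
    x = x0.astype(float).copy()
    S = sorted(active0)                    # active variable indices
    for _ in range(max_iter):
        if not S:                          # terminate when S is empty
            break
        Jx, rx = J(x), r(x)
        Js = Jx[:, S]                      # block-reduced Jacobian
        # Reduced normal equations (J_S^T J_S) d_S = -J_S^T r.
        d_S = np.linalg.solve(Js.T @ Js, -Js.T @ rx)
        x[S] += d_S
        # Prune: drop variables whose update magnitude is <= tau_d.
        S = [i for i, d in zip(S, d_S) if abs(d) > tau_d]
    return x

# Toy problem: residuals r(x) = (x0 + x1 - 1, x0 - x1), solution (0.5, 0.5).
r = lambda x: np.array([x[0] + x[1] - 1.0, x[0] - x[1]])
J = lambda x: np.array([[1.0, 1.0], [1.0, -1.0]])
x = selective_partial_gn(r, J, np.zeros(2), active0=[0, 1])
```

Because every variable's update shrinks below $\tau_d$ once it has converged, the active set empties itself and the loop terminates without a separate global convergence test.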

3. Theoretical Properties and Convergence Guarantees

The mathematical foundation of SPO relies on block-wise variational inequalities, classical block-coordinate descent principles, and convergence results for partial linearization.

Global Convergence (Konnov, 2016):

For product-domain problems under standard regularity assumptions (convexity and compactness of the sets $X_i$, uniform continuity of the gradient, convexity and lower semicontinuity of the $h_i$), the sequence of stage points converges to a block-stationary point. If $f$ is convex, this limit is globally optimal.

Finite Basic-Cycle Termination:

In every outer stage (fixed $\delta$), only a finite number of block updates is needed before all block gaps fall below the tolerance.

Rate and Complexity:

For smooth convex $f$ with block-Lipschitz gradients, the number of block updates $V(\epsilon)$ needed to reach $F(x) - F^* \leq \epsilon$ satisfies

$$V(\epsilon) \leq C \cdot \frac{\Delta/\epsilon - 1}{1 - v}$$

with $v \in (0,1)$ the geometric decay rate of the tolerances (Konnov, 2016).
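As a sanity check on how this bound scales, one can evaluate it for made-up constants ($C$, $\Delta$, and $v$ are problem-dependent; the values below are purely illustrative):

```python
# Bound on the number of block updates, V(eps) <= C * (Delta/eps - 1) / (1 - v),
# evaluated for made-up constants (C, Delta, v are problem-dependent).
C, Delta, v = 1.0, 10.0, 0.5

def block_update_bound(eps):
    return C * (Delta / eps - 1.0) / (1.0 - v)

# Tightening eps by 10x grows the bound roughly 10x: O(1/eps) scaling.
bounds = [block_update_bound(10.0 ** -k) for k in range(4)]
```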

Nonlinear Least-Squares Context (Arablouei, 13 Jan 2026):

Under a Lipschitz GN Hessian and positive-definite blocks, selective partial GN converges to the same stationary point as full GN, with local linear convergence; the rate becomes superlinear if the convergence threshold is driven to zero and the residuals vanish at the solution.

4. Computational Cost and Efficiency Analysis

SPO algorithms significantly reduce per-iteration cost by confining expensive updates to a dynamically chosen subset of variables.

| Step | Full CG / Frank–Wolfe | Block-Coord. CG | SPO |
|---|---|---|---|
| Blocks updated per iteration | All | Single (cyclic/random) | Significant blocks only |
| Block gradients required | $O(n)$ | 1 | 1 (for the selected block) |
| Gap reduction | Fast | Slower, uncontrolled | Adaptive, focused |

Per iteration, SPO computes only one block gradient and solves a low-dimensional subproblem. This enables reusing precomputed data and leveraging parallel structure, with the total number of block updates scaling sublinearly with the required solution accuracy.

Let $\kappa_{t,i}$ be the number of nonzeros in column $i$ of the Cholesky factor:

  • Static-block Cholesky up/down-date: $\min\left(2 \sum_{i\in\mathcal{S}_t}\kappa_{t,i}^2,\; \sum_{i=1}^N\kappa_{t,i}^2\right)$
  • Partial solve: $2\sum_{i\in\mathcal{S}_t}\kappa_{t,i}$

If $|\mathcal{S}_t| \ll N$, the computational load is much lower than that of full GN. Experiments demonstrate a 2–8× reduction in cumulative FLOPs at equivalent estimation accuracy (Arablouei, 13 Jan 2026).
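These counts translate directly into flop estimates. A small sketch comparing them against a full refactorization, using made-up nonzero counts (interpreting $\sum_i \kappa_{t,i}^2$ as the full-refactorization cost is an assumption here):

```python
import numpy as np

def spo_step_costs(kappa, S):
    """Flop estimates from the column nonzero counts kappa_i of the
    Cholesky factor, per the formulas in the text (illustrative;
    treating sum_i kappa_i^2 as the full refactorization cost is an
    assumption)."""
    kappa = np.asarray(kappa, dtype=float)
    full_refactor = np.sum(kappa ** 2)
    up_down_date = min(2.0 * np.sum(kappa[S] ** 2), full_refactor)
    partial_solve = 2.0 * np.sum(kappa[S])
    return up_down_date, partial_solve, full_refactor

# Hypothetical sparse factor: N = 1000 columns with 5 nonzeros each,
# and an active set of only 20 variables.
kappa = np.full(1000, 5)
S = np.arange(20)
ud, ps, full = spo_step_costs(kappa, S)   # 1000 + 200 flops vs 25000
```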

5. Integration with Information-Guided Gating and Hybrid Schemes

In data-rich, incremental settings such as SLAM, SPO is often combined with information-theoretic gating mechanisms (IGG):

  • The log-determinant of the information matrix, $\ln\det(\mathbf{J}_t^\mathsf{T}\mathbf{J}_t)$, quantifies the information gain from new measurements.
  • A detrended threshold $\Delta\eta_t$ determines whether to trigger a full global GN update or a selective (local) GN update on a small active set $\mathcal{S}_t$.
  • SPO then adaptively prunes and expands $\mathcal{S}_t$ across GN iterations, focusing effort where the residual change is largest.
  • This approach preserves global consistency in the estimate graph and maintains convergence and accuracy guarantees while minimizing redundant computation (Arablouei, 13 Jan 2026).
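A minimal sketch of the gating decision (the detrending of $\Delta\eta_t$ described above is simplified here to a plain difference against the previous log-determinant; the function name and thresholds are assumptions):

```python
import numpy as np

def igg_decision(J_t, eta_prev, tau_eta):
    """Information-guided gate: compare the change in ln det(J^T J)
    against a threshold to choose between a full (global) GN update
    and a selective (local) one. Illustrative simplification."""
    _, logdet = np.linalg.slogdet(J_t.T @ J_t)
    delta_eta = logdet - eta_prev
    # Large information gain -> relinearize globally; otherwise run a
    # selective local update on a small active set.
    mode = "global" if delta_eta > tau_eta else "local"
    return mode, logdet

# Toy check with made-up Jacobians and thresholds.
mode_hi, eta = igg_decision(np.eye(3), eta_prev=-5.0, tau_eta=1.0)
mode_lo, _ = igg_decision(0.1 * np.eye(3), eta_prev=0.0, tau_eta=1.0)
```

Using `slogdet` rather than `det` avoids overflow for the large, ill-scaled information matrices typical of SLAM.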

6. Applications and Numerical Results

Convex Optimization (Konnov, 2016)

  • Quadratic objectives with up to $N = 1500$ variables: adaptive SPO required 30–50% as many block-gradient calls as full CGM for comparable accuracy.
  • Composite convex objectives, e.g., $F(x) = f_1(x) + \sum_i c_i/(c_i^T x + T)$: SPO matched or exceeded the efficiency of block-coordinate methods.
  • Practical use-cases: Group-LASSO, block-structured regression, network equilibrium with elastic demands (path-flow decomposition), and resource allocation problems.

Incremental SLAM (Arablouei, 13 Jan 2026)

  • On standard 2D pose-graph datasets, SPO with IGG reduces cumulative solve FLOPs by factors of 2–8 versus full incremental GN, with virtually identical trajectory error and normalized $\chi^2$.
  • Prune/expand rules ensure that only variables with significant GN increments and their direct neighbors are updated and relinearized, exploiting locality in graph structure.
  • Adjustable thresholds ($\tau_d$, $\tau_\eta$) allow trading off computation against estimation accuracy.

7. Relation to Other Methods and Variants

SPO generalizes and strictly improves upon several existing block-decomposition and coordinate-update techniques:

  • Full Conditional Gradient (Frank–Wolfe): Each step involves all blocks and full gradients—a computational bottleneck for large-scale problems.
  • Coordinate or Block-Coordinate CG: Updates a single block randomly or in a fixed sequence without adaptivity to optimality gap sizes, lacking tolerance control and often requiring more iterations for convergence.
  • SPO: Uniquely focuses on blocks with meaningful optimality (or residual) violations, dynamically adjusts the active set, and enables line-search-free operation when block-Lipschitz constants are available. It subsumes block-coordinate approaches as special cases, with significantly better empirical performance on large-scale, sparse, or graph-structured problems.

In summary, Selective Partial Optimization is a theoretically grounded, computationally efficient strategy for solving large separable optimization and estimation problems by focusing resources on the most critical blocks or variables at each iteration, enabling scalability and accuracy across multiple domains (Konnov, 2016, Arablouei, 13 Jan 2026).
