Selective Partial Optimization (SPO)
- Selective Partial Optimization (SPO) is a block-decomposition method that updates only significant variable blocks, reducing computational cost while maintaining strong convergence guarantees.
- It employs adaptive partial linearization and selective Gauss–Newton techniques to tackle convex composite optimization and incremental estimation problems such as SLAM.
- SPO dynamically adjusts active sets based on optimality gaps and measurement changes, achieving substantial reductions in cumulative computational operations.
Selective Partial Optimization (SPO) is a family of block-decomposition algorithms for large-scale separable optimization, aimed at reducing per-iteration computational cost while maintaining strong convergence guarantees. The essential principle is to update and relinearize only “significant” or “active” variable blocks at each iteration—those that exhibit substantial violation of optimality or that are most affected by new information. Applications span convex composite optimization, sparse learning, and incremental nonlinear estimation such as SLAM, where SPO enables scalable and accurate solutions by adaptively focusing computational effort.
1. Problem Settings and Mathematical Formulation
SPO addresses structured composite minimization and nonlinear least-squares problems over product spaces. Consider the canonical composite optimization problem as formulated in (Konnov, 2016): let $X = X_1 \times \cdots \times X_n$, with each $X_i$ nonempty, closed, and convex (often compact for simplicity). The objective is
$$\min_{x \in X} \; f(x) + \sum_{i=1}^{n} h_i(x_i),$$
where
- $f$ is continuously differentiable (not necessarily convex),
- the $h_i$ are proper, convex, lower-semicontinuous, possibly nonsmooth, separable terms.
Block gradients $\nabla_i f(x)$ are defined with respect to each factor $X_i$, with block-wise optimality gaps
$$\Delta_i(x) = \max_{y_i \in X_i} \left\{ \langle \nabla_i f(x),\, x_i - y_i \rangle + h_i(x_i) - h_i(y_i) \right\}.$$
A point $x^{*} \in X$ is block-stationary if $\Delta_i(x^{*}) = 0$ for all $i = 1,\dots,n$, equivalently satisfying the mixed variational inequality
$$\langle \nabla f(x^{*}),\, y - x^{*} \rangle + \sum_{i=1}^{n}\big(h_i(y_i) - h_i(x^{*}_i)\big) \ge 0 \quad \forall\, y \in X.$$
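For the box-constrained smooth case (all $h_i \equiv 0$), the block gap reduces to a linear-minimization form that is cheap to evaluate. A minimal Python sketch (function and argument names are illustrative, not taken from the cited papers):

```python
def block_gap(grad_i, lo, hi, x_i):
    """Gap Delta_i(x) = max_{y in [lo, hi]} <grad_i, x_i - y> for one block."""
    gap = 0.0
    for g, l, h, xi in zip(grad_i, lo, hi, x_i):
        y = l if g > 0 else h      # minimizer of <g, y> over [l, h]
        gap += g * (xi - y)
    return gap

# Toy check: x_i = 0.5, gradient 1.0 over [0, 1] gives gap 0.5
print(block_gap([1.0], [0.0], [1.0], [0.5]))  # 0.5
```

A block with a large gap is far from satisfying its first-order conditions, which is exactly the selection signal SPO uses.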
For incremental nonlinear optimization such as SLAM, the problem is the nonlinear least-squares program
$$\hat{x} = \arg\min_{x} \; \tfrac{1}{2} \sum_{k} \big\| h_k(x) - z_k \big\|^{2}_{\Sigma_k^{-1}},$$
with state vector $x$, measurement models $h_k$, measurements $z_k$, and measurement covariance matrices $\Sigma_k$ (Arablouei, 13 Jan 2026).
2. Core Algorithms and Selective Update Mechanisms
The defining characteristic of SPO is the adaptive restriction of update and linearization steps to a subset of relevant variables.
Adaptive Partial Linearization (Konnov, 2016):
- At each Basic Cycle iteration, compute the block gaps $\Delta_i(x)$ and select any block $i$ with $\Delta_i(x) > \delta$ (for a prescribed tolerance $\delta > 0$).
- Solve the partially linearized minimization subproblem for the selected block $i$; all other blocks remain fixed.
- Perform an inexact Armijo-type line search on this subspace direction.
- The outer loop decreases the tolerance geometrically ($\delta \leftarrow \nu\delta$ with $\nu \in (0,1)$), advancing to the next stage only when all block gaps satisfy $\Delta_i(x) \le \delta$.
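These steps can be sketched for the simplified case $h_i \equiv 0$ with box-shaped blocks, where the linearized block subproblem reduces to picking a box vertex, followed by Armijo backtracking. This is a hypothetical, simplified rendering, not the paper's exact procedure:

```python
def spo_stage(f, grad_block, blocks, x, delta, beta=0.5, sigma=1e-4):
    """One SPO stage: keep updating blocks whose gap exceeds delta."""
    improved = True
    while improved:
        improved = False
        for i, (lo, hi) in enumerate(blocks):
            g = grad_block(x, i)
            # Linearized block subproblem over the box: pick a vertex
            y = [l if gi > 0 else h for gi, l, h in zip(g, lo, hi)]
            gap = sum(gi * (xi - yi) for gi, xi, yi in zip(g, x[i], y))
            if gap <= delta:
                continue                      # block is near-optimal; skip
            d = [yi - xi for xi, yi in zip(x[i], y)]
            t, fx = 1.0, f(x)
            while True:                       # Armijo backtracking line search
                trial = list(x)
                trial[i] = [xi + t * di for xi, di in zip(x[i], d)]
                if f(trial) <= fx - sigma * t * gap:
                    break
                t *= beta
            x = trial
            improved = True
    return x

# Toy: minimize (x0 - 0.3)^2 over the single block [0, 1]
obj = lambda x: (x[0][0] - 0.3) ** 2
gb = lambda x, i: [2 * (x[0][0] - 0.3)]
sol = spo_stage(obj, gb, blocks=[([0.0], [1.0])], x=[[0.9]], delta=1e-3)
print(round(sol[0][0], 2))  # converges near the minimizer 0.3
```

The stage terminates once every block gap is at most `delta`; the outer loop would then shrink `delta` and rerun the stage.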
Selective Partial Gauss–Newton (SPO for SLAM, (Arablouei, 13 Jan 2026)):
- Partition the variables into an active set $\mathcal{A}$ (to be updated) and a static set (held fixed).
- At each GN iteration, solve the block-reduced normal equations only on $\mathcal{A}$.
- After each solve, prune $\mathcal{A}$ by removing variables whose increments are small ($\|\Delta x_i\| < \epsilon$); expand it by including neighbors directly impacted by measurement changes.
- Relinearize only those measurements (edges) incident to the current $\mathcal{A}$.
- Terminate the GN loop when $\mathcal{A}$ becomes empty or the maximum increment falls below the convergence threshold.
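The prune/expand loop can be illustrated on a toy 1-D chain with relative measurements, where each block-reduced GN solve is closed-form. All names and the chain model are assumptions for illustration, not the paper's implementation:

```python
def selective_gn(x, z, active, prune_tol=1e-8, max_iter=50):
    """Active-set GN sweeps on a 1-D chain; x[0] is the fixed anchor."""
    x = list(x)
    for _ in range(max_iter):
        if not active:
            break                              # all increments below tolerance
        new_active = set()
        for i in sorted(active):
            # Residuals touching x_i: (x_i - x_{i-1} - z_{i-1}) and
            # (x_{i+1} - x_i - z_i); the block-reduced normal equation
            # sets x_i to the mean of the implied targets.
            targets = []
            if i > 0:
                targets.append(x[i - 1] + z[i - 1])
            if i < len(x) - 1:
                targets.append(x[i + 1] - z[i])
            step = sum(targets) / len(targets) - x[i]
            x[i] += step
            if abs(step) >= prune_tol:         # keep; expand to neighbors
                new_active.update(j for j in (i - 1, i, i + 1)
                                  if 0 < j < len(x))
        active = new_active
    return x

# Chain of 3 poses, relative measurements z_k = 1.0; ideal x = [0, 1, 2]
x_est = selective_gn([0.0, 0.0, 0.0], z=[1.0, 1.0], active={1, 2})
print([round(v, 3) for v in x_est])  # [0.0, 1.0, 2.0]
```

Variables whose increments fall below `prune_tol` drop out of the active set, so the loop naturally focuses work on the part of the graph still changing.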
3. Theoretical Properties and Convergence Guarantees
The mathematical foundation of SPO relies on block-wise variational inequalities, classical block-coordinate descent principles, and convergence results for partial linearization.
Global Convergence (Konnov, 2016):
For product-domain problems under standard regularity assumptions (convexity and compactness of the $X_i$, uniform continuity of $\nabla f$, convexity and lower semicontinuity of the $h_i$), the sequence of stage points converges to a block-stationary point. If $f$ is convex, this limit is globally optimal.
Finite Basic-Cycle Termination:
In every outer stage (fixed $\delta$), only a finite number of block updates is needed before all block gaps fall below the tolerance.
Rate and Complexity:
For smooth convex $f$ with block-Lipschitz gradients, the total number of block updates needed to reach accuracy $\varepsilon$ scales as $O(\varepsilon^{-1})$,
with the geometric decay rule $\delta_{s+1} = \nu\,\delta_s$, $\nu \in (0,1)$, for the stage tolerances (Konnov, 2016).
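Under a geometric tolerance schedule, the number of outer stages needed before the tolerance drops below a target $\varepsilon$ follows from a one-line calculation (a standard derivation, not quoted from the paper):

```latex
\delta_s = \nu^{s}\,\delta_0
\quad\Longrightarrow\quad
\delta_s \le \varepsilon
\;\text{ once }\;
s \ge \frac{\ln(\delta_0/\varepsilon)}{\ln(1/\nu)} .
```

So the stage count grows only logarithmically in the target accuracy; the per-stage block updates dominate the total cost.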
Nonlinear Least-Squares Context (Arablouei, 13 Jan 2026):
Under a Lipschitz GN Hessian and positive-definite blocks, selective partial GN converges to the same stationary point as full GN, with local linear convergence and superlinear rates if the convergence threshold is driven to zero and residuals vanish at the solution.
4. Computational Cost and Efficiency Analysis
SPO algorithms significantly reduce per-iteration cost by confining expensive updates to a dynamically chosen subset of variables.
Block-Structured Partial Linearization (Konnov, 2016)
| Step | Full CG / Frank–Wolfe | Block-Coord. CG | SPO |
|---|---|---|---|
| Blocks updated/iter. | All | Cyclic/random single | Significant only |
| Block-gradients required/iter. | $n$ (all) | 1 | 1 (for selected block) |
| Gap reduction | Fast | Slower, uncontrolled | Adaptive, focused |
Per-iteration, SPO computes only one block-gradient and solves a low-dimensional subproblem. This enables reusing precomputed data and leveraging parallel structures, with the total block-update number scaling sublinearly with solution accuracy.
SLAM and Sparse GN Systems (Arablouei, 13 Jan 2026)
Let $c_j$ be the number of nonzeros in column $j$ of the Cholesky factor. Then:
- Static-block Cholesky up/down-date: $O\big(\sum_{j \in \mathcal{A}} c_j^2\big)$
- Partial solve: $O\big(\sum_{j \in \mathcal{A}} c_j\big)$
If $|\mathcal{A}| \ll n$, the computational load is far below that of full GN. Experiments demonstrate a 2–8× reduction in cumulative FLOPs at equivalent estimation accuracy (Arablouei, 13 Jan 2026).
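A back-of-envelope sketch of this cost model, comparing partial against full work from the per-column nonzero counts (the counts, set names, and cost model below are illustrative assumptions):

```python
def partial_cost(col_nnz, active):
    """Estimated up/down-date work: ~c_j^2 for each active column."""
    return sum(col_nnz[j] ** 2 for j in active)

def full_cost(col_nnz):
    """Estimated full-refactorization work: ~c_j^2 over every column."""
    return sum(c ** 2 for c in col_nnz)

col_nnz = [3, 5, 8, 8, 12, 4]        # illustrative sparsity profile
active = {4, 5}                      # small active set, |A| << n
print(partial_cost(col_nnz, active), full_cost(col_nnz))  # 160 322
```

Even in this tiny example the partial update touches roughly half the work; on large, sparse SLAM factors the gap widens with problem size.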
5. Integration with Information-Guided Gating and Hybrid Schemes
In data-rich, incremental settings such as SLAM, SPO is often combined with information-theoretic gating mechanisms (IGG):
- The log-determinant of the information matrix ($\log\det \Lambda$) quantifies the information gain from new measurements.
- A detrended threshold on this gain determines whether to trigger a full global GN update or a selective (local) GN update on a small active set $\mathcal{A}$.
- SPO then adaptively prunes and expands $\mathcal{A}$ across GN iterations, focusing effort where residual change is highest.
- This approach preserves global consistency in the estimate graph and maintains convergence and accuracy guarantees while minimizing redundant computation (Arablouei, 13 Jan 2026).
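The gating decision can be sketched as a comparison of the newest log-det increment against a detrended baseline; the window length, threshold, and history below are illustrative assumptions, not the paper's parameters:

```python
def should_solve_globally(gains, window=10, tau=2.0):
    """Trigger a global GN solve when the newest log-det information
    gain exceeds a moving-mean baseline by more than tau."""
    recent = gains[-window:]
    baseline = sum(recent) / len(recent)   # detrend with a moving mean
    return gains[-1] - baseline > tau

# Small routine updates, then a loop-closure-like spike in information
history = [0.1, 0.2, 0.15, 0.1, 3.5]
print(should_solve_globally(history))  # True
```

Small routine updates fall below the threshold and get cheap selective solves; an information spike (e.g., a loop closure) escalates to a full solve.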
6. Applications and Numerical Results
Convex Optimization (Konnov, 2016)
- Large-scale quadratic objectives: adaptive SPO required only 30–50% as many block-gradient calls as full CGM for comparable accuracy.
- Composite convex objectives (e.g., a smooth loss plus separable nonsmooth regularizers such as group norms): SPO matched or exceeded the efficiency of block-coordinate methods.
- Practical use-cases: Group-LASSO, block-structured regression, network equilibrium with elastic demands (path-flow decomposition), and resource allocation problems.
Incremental SLAM (Arablouei, 13 Jan 2026)
- On standard 2D pose-graph datasets, SPO with IGG reduces cumulative solve FLOPs by factors of 2–8 versus full incremental GN, with virtually identical trajectory error and normalized $\chi^2$.
- Prune/expand rules ensure that only variables with significant GN increments and their direct neighbors are updated and relinearized, exploiting locality in graph structure.
- Adjustable pruning and gating thresholds allow trading off computation against estimation accuracy.
7. Relation to Other Methods and Variants
SPO generalizes and strictly improves upon several existing block-decomposition and coordinate-update techniques:
- Full Conditional Gradient (Frank–Wolfe): Each step involves all blocks and full gradients—a computational bottleneck for large-scale problems.
- Coordinate or Block-Coordinate CG: Updates a single block randomly or in a fixed sequence without adaptivity to optimality gap sizes, lacking tolerance control and often requiring more iterations for convergence.
- SPO: Uniquely focuses on blocks with meaningful optimality (or residual) violations, dynamically adjusts the active set, and enables line-search-free operation when block-Lipschitz constants are available. It subsumes block-coordinate approaches as special cases, with significantly better empirical performance on large-scale, sparse, or graph-structured problems.
In summary, Selective Partial Optimization is a theoretically grounded, computationally efficient strategy for solving large separable optimization and estimation problems by focusing resources on the most critical blocks or variables at each iteration, enabling scalability and accuracy across multiple domains (Konnov, 2016, Arablouei, 13 Jan 2026).