Sequential Line Search (SLS)
- Sequential Line Search (SLS) is an optimization technique that employs one-dimensional searches to update variables, ensuring efficient convergence in both deterministic and stochastic settings.
- It is applied in blockwise optimization for group lasso and adaptive step-size selection in stochastic gradient descent, enhancing performance in high-dimensional, structured problems.
- SLS leverages spectral decompositions of block Gram matrices (in blockwise settings) and Armijo-type acceptance conditions (in stochastic settings) to provide exact or near-exact, computationally efficient updates that often outperform traditional coordinate descent methods.
Sequential Line Search (SLS) refers collectively to a class of optimization algorithms that utilize one-dimensional line searches, either deterministically or stochastically, as a core strategy for updating variables or parameter blocks within larger iterative schemes. SLS methods facilitate efficient, targeted updates by identifying optimal (or near-optimal) step sizes along chosen search directions, typically to minimize a nonsmooth or composite objective. These approaches are widely deployed in modern convex and stochastic optimization, particularly within penalized regression (e.g., group lasso) and stochastic gradient descent-type algorithms. Central theoretical and empirical results demonstrate that SLS variants can outperform traditional coordinate descent, blockwise descent with inexact searches, and global projection methods, especially in high-dimensional and structured statistical learning contexts (Foygel et al., 2010, Jiang et al., 2023).
1. SLS in Blockwise Optimization for Group Lasso
In the context of the group lasso—the regression problem where predictors are partitioned into groups and groupwise sparsity is induced using an $\ell_2$-norm penalty—SLS provides an exact blockwise optimization procedure. The classical group lasso estimator for $G$ groups is

$$\hat{\beta} = \arg\min_{\beta} \; \tfrac{1}{2} \Big\| y - \sum_{g=1}^{G} X_g \beta_g \Big\|_2^2 + \lambda \sum_{g=1}^{G} \|\beta_g\|_2,$$

with $X$ partitioned into blocks $X_1, \dots, X_G$ and blockwise coefficients $\beta_1, \dots, \beta_G$. For each group $g$, SLS solves the associated subproblem in $\beta_g$:

$$\min_{\beta_g} \; \tfrac{1}{2} \| r_g - X_g \beta_g \|_2^2 + \lambda \|\beta_g\|_2,$$

with the partial residual $r_g = y - \sum_{h \neq g} X_h \beta_h$ fixing the other groups. By leveraging the spectral decomposition $X_g^\top X_g = U \,\mathrm{diag}(\gamma_1, \dots, \gamma_{p_g})\, U^\top$ of the block Gram matrix and formulating a univariate equation in the radial variable $\nu = \|\beta_g\|_2 > 0$, the global minimum in the subspace can be computed via a one-dimensional line search, specifically solving

$$\sum_{j=1}^{p_g} \frac{b_j^2}{(\gamma_j \nu + \lambda)^2} = 1,$$

where $\gamma_j$ are the eigenvalues of $X_g^\top X_g$ and $b = U^\top X_g^\top r_g$ is a rotated gradient vector. If $\|X_g^\top r_g\|_2 \le \lambda$, the update is zero; otherwise, the unique root $\nu^\star$ yields the blockwise update $\beta_g = U\big(\mathrm{diag}(\gamma) + (\lambda/\nu^\star) I\big)^{-1} b$ (Foygel et al., 2010).
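The exact block update above reduces to a one-dimensional root-find once the block Gram matrix is diagonalized. A minimal NumPy/SciPy sketch of this procedure (function name illustrative, not from the original paper):

```python
import numpy as np
from scipy.optimize import brentq

def sls_block_update(X_g, r_g, lam):
    """One exact SLS update for a single group-lasso block.

    Solves min_b 0.5*||r_g - X_g b||^2 + lam*||b||_2 via a 1-D root-find
    in the radial variable nu = ||b||_2.
    """
    # One-time spectral decomposition of the block Gram matrix (cache in practice).
    gram = X_g.T @ X_g
    gammas, U = np.linalg.eigh(gram)   # eigenvalues gamma_j, eigenvectors U
    grad = X_g.T @ r_g
    b = U.T @ grad                     # rotated gradient vector

    # Zero update when the block gradient norm is at most the threshold.
    if np.linalg.norm(grad) <= lam:
        return np.zeros(X_g.shape[1])

    # Univariate equation: sum_j b_j^2 / (gamma_j*nu + lam)^2 = 1, nu > 0.
    def phi(nu):
        return np.sum(b**2 / (gammas * nu + lam)**2) - 1.0

    # phi is non-increasing in nu with phi(0+) > 0, so bracket the unique root.
    hi = 1.0
    while phi(hi) > 0:
        hi *= 2.0
    nu_star = brentq(phi, 1e-12, hi)

    # Recover the block coefficients from the optimal radius.
    return U @ (b / (gammas + lam / nu_star))
```

The returned vector satisfies the blockwise stationarity condition $-X_g^\top(r_g - X_g \beta_g) + \lambda \beta_g / \|\beta_g\|_2 = 0$ whenever it is nonzero.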
2. Stochastic Line-Search SLS in SGD and Adaptive Algorithms
Within stochastic optimization, SLS strategies are employed to dynamically select step sizes in stochastic gradient methods. At each iteration, a minibatch $S_k$ is sampled, and a trial step size $\eta$ is selected via backtracking line search to satisfy an Armijo-type condition on the sampled function:

$$f_{S_k}\big(x_k - \eta \nabla f_{S_k}(x_k)\big) \le f_{S_k}(x_k) - c\,\eta\,\|\nabla f_{S_k}(x_k)\|_2^2,$$

with $c \in (0,1)$ and a backtracking factor $\beta \in (0,1)$ controlling the reduction in $\eta$ if the condition fails. The procedure enforces monotonic decrease or other safeguards as needed, and the iterate is updated with the accepted step size $\eta_k$ (Jiang et al., 2023).
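A short, self-contained sketch of this backtracking procedure (parameter defaults and function names are illustrative):

```python
import numpy as np

def armijo_backtracking(f_batch, grad_batch, x, eta0=1.0, c=0.5, beta=0.7, max_iter=50):
    """Backtracking line search enforcing the sampled Armijo condition:
    f_S(x - eta*g) <= f_S(x) - c*eta*||g||^2, with f_S and g evaluated
    on the current minibatch."""
    g = grad_batch(x)
    fx = f_batch(x)
    eta = eta0
    for _ in range(max_iter):
        if f_batch(x - eta * g) <= fx - c * eta * np.dot(g, g):
            break
        eta *= beta  # shrink the trial step when the condition fails
    return x - eta * g, eta

# Minimal usage on a sampled quadratic f_S(x) = 0.5*||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))
b = rng.standard_normal(8)
f_S = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
g_S = lambda x: A.T @ (A @ x - b)
x0 = np.zeros(4)
x1, eta = armijo_backtracking(f_S, g_S, x0)
```

Because the accepted step satisfies the Armijo inequality, the sampled objective strictly decreases whenever the minibatch gradient is nonzero.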
Adaptive SLS variants such as AdaSLS further rescale accepted steps with an AdaGrad-norm-style denominator, facilitating parameter-free operation—no knowledge of the $L$-smoothness or strong convexity parameters is required. This yields theoretical $\mathcal{O}(1/T)$ convergence rates in convex interpolation regimes and robust behavior in noisy or non-interpolated settings.
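The AdaGrad-norm rescaling idea can be illustrated schematically. The sketch below is a simplification, not the exact AdaSLS formula from Jiang et al. (2023): it divides each accepted step by the square root of an accumulated weighted gradient-norm sum, so effective steps shrink automatically as noise accumulates.

```python
import numpy as np

class AdaGradNormScaler:
    """Schematic AdaGrad-norm rescaling of accepted line-search steps.

    Hedged sketch (not the paper's exact rule): the accepted Armijo step
    eta_k is divided by sqrt of an accumulated sum of eta_i * ||g_i||^2.
    """
    def __init__(self, eps=1e-10):
        self.acc = 0.0
        self.eps = eps

    def scale(self, eta_k, grad_k):
        self.acc += eta_k * float(np.dot(grad_k, grad_k))
        return eta_k / np.sqrt(self.acc + self.eps)
```

In a full AdaSLS-style loop, the Armijo-accepted step would pass through such a scaler before the iterate is updated; the accumulator makes the schedule interpolate between near-constant steps (interpolation regime) and decaying steps (noisy regime).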
3. Convergence Properties and Theoretical Guarantees
For blockwise SLS in group lasso, convergence is established via block-coordinate descent theory. Each sweep over all blocks produces a non-increasing objective, and the iterates converge to the unique solution under standard convexity assumptions. Finite-time error bounds are given in terms of current subgradient norms, and global optimality is ensured by strict convexity within blocks (Foygel et al., 2010).
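The monotone-descent property of blockwise sweeps is easy to verify numerically in the special case of orthonormal block designs ($X_g^\top X_g = I$), where the univariate search collapses to a closed-form group soft-threshold. A minimal sketch under that assumption (function names illustrative):

```python
import numpy as np

def group_lasso_sweep(blocks, y, beta, lam):
    """One full sweep of blockwise updates for the group lasso, assuming each
    block X_g has orthonormal columns, so the SLS subproblem has the
    closed-form solution beta_g = (1 - lam/||z||)_+ * z with z = X_g^T r_g."""
    for g, X_g in enumerate(blocks):
        # Partial residual with group g removed.
        r_g = y - sum(X_h @ beta[h] for h, X_h in enumerate(blocks) if h != g)
        z = X_g.T @ r_g
        norm_z = np.linalg.norm(z)
        beta[g] = np.zeros_like(beta[g]) if norm_z <= lam else (1 - lam / norm_z) * z
    return beta

def objective(blocks, y, beta, lam):
    """Group lasso objective: 0.5*||y - sum_g X_g beta_g||^2 + lam*sum_g ||beta_g||."""
    fit = y - sum(X_g @ beta[g] for g, X_g in enumerate(blocks))
    return 0.5 * fit @ fit + lam * sum(np.linalg.norm(b) for b in beta)
```

Running several sweeps from any starting point produces a non-increasing sequence of objective values, matching the block-coordinate-descent guarantee stated above.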
In stochastic settings, SLS-type procedures admit the following behaviors under standard smoothness and convexity:
- Interpolated, strongly convex: linear (geometric) convergence of the iterates.
- Interpolated, convex: $\mathcal{O}(1/T)$ convergence in function suboptimality.
- Non-interpolated (noisy): convergence to a neighborhood whose size depends on the gradient-noise variance $\sigma^2$. AdaSLS matches these rates while eliminating the need for tuning problem-dependent parameters (Jiang et al., 2023).
Variance-reduced SLS schemes (e.g., AdaSVRLS) achieve optimal $\mathcal{O}(n + 1/\varepsilon)$ gradient evaluation complexity in the convex setting via loopless SVRG-style estimators.
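The loopless SVRG estimator underlying such schemes replaces the inner loop of classical SVRG with a probabilistic snapshot refresh. The sketch below shows the estimator in isolation with a fixed step size standing in for the adaptive line search (the coupling with the search is the paper's contribution and is omitted here; names are illustrative):

```python
import numpy as np

def lsvrg(grad_i, full_grad, x0, n, steps=2000, lr=0.02, p=0.1, rng=None):
    """Loopless SVRG: variance-reduced stochastic gradients with a snapshot w
    refreshed with probability p each iteration (no explicit inner loop).

    grad_i(i, x): gradient of the i-th component function at x.
    full_grad(x): full (averaged) gradient at x.
    """
    rng = rng or np.random.default_rng(0)
    x, w = x0.copy(), x0.copy()
    mu = full_grad(w)                         # full gradient at the snapshot
    for _ in range(steps):
        i = rng.integers(n)
        g = grad_i(i, x) - grad_i(i, w) + mu  # unbiased, variance-reduced estimate
        x = x - lr * g
        if rng.random() < p:                  # loopless snapshot refresh
            w = x.copy()
            mu = full_grad(w)
    return x
```

On a smooth, strongly convex finite sum (e.g., least squares), the estimator's variance vanishes near the optimum, so the iterates converge without a decaying step-size schedule.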
4. Extensions: Sparse Group Lasso and Signed SLS
The SLS methodology generalizes seamlessly to sparse group lasso, where both a group-level penalty (weight $\lambda_2$) and an elementwise $\ell_1$ penalty (weight $\lambda_1$) are imposed:

$$\hat{\beta} = \arg\min_{\beta} \; \tfrac{1}{2} \| y - X \beta \|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \sum_{g=1}^{G} \|\beta_g\|_2.$$

In this context, the Signed Single Line Search (SSLS) algorithm is employed: the subproblem for each block $g$ requires identifying the active sign pattern and then solving a corresponding univariate equation conditional on the support. If the soft-thresholded block gradient norm is at most $\lambda_2$, the solution is zero; otherwise, a search over sign patterns and corresponding line searches produces the exact solution. Theoretical global convergence is established (Foygel et al., 2010).
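The zero test that gates the SSLS sign-pattern search is cheap to implement: soft-threshold the block gradient elementwise at $\lambda_1$, then compare its $\ell_2$-norm against $\lambda_2$. A minimal sketch (function name illustrative):

```python
import numpy as np

def sgl_block_is_zero(X_g, r_g, lam1, lam2):
    """Sparse-group-lasso zero test for block g: the block solution is zero
    iff the soft-thresholded block gradient has l2-norm at most lam2."""
    z = X_g.T @ r_g
    soft = np.sign(z) * np.maximum(np.abs(z) - lam1, 0.0)  # elementwise soft-threshold
    return np.linalg.norm(soft) <= lam2
```

Only blocks failing this test enter the more expensive search over sign patterns, which is why SSLS remains practical when most groups are inactive.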
5. Computational Considerations and Comparative Performance
Blockwise SLS for group lasso achieves a per-group update cost of $\mathcal{O}(p_g)$ per line-search evaluation after a one-time $\mathcal{O}(p_g^3)$ spectral decomposition of each block Gram matrix, with per-sweep residual calculations scaling linearly in the sample size $n$. By comparison:
- Inexact group-wise descent incurs higher line-search cost per block.
- Coordinate descent may erroneously stall at non-unique blockwise solutions.
- Gradient projection updates the entire parameter vector and may require expensive projections.
- Active-set methods offer speed for extremely sparse solutions but have higher active-set update costs.
SLS and its signed variant SSLS yield substantial empirical speed-ups—orders of magnitude in simulations where group sizes are moderate and within-group correlations are high—over alternative methods (Foygel et al., 2010). In stochastic optimization, the line-search overhead is minimal, adding only a few extra function evaluations per iteration on average, while ensuring robust adaptation to unknown smoothness or variance structure (Jiang et al., 2023).
| SLS Variant | Application Domain | Notable Properties/Results |
|---|---|---|
| Blockwise SLS | (Sparse) Group Lasso | Exact univariate update per block, global convergence |
| SSLS | Sparse Group Lasso | Signed pattern search, global convergence, robust sparsity |
| Stochastic SLS | SGD and AdaSLS | Armijo line search per minibatch, adaptive scaling |
| AdaSVRLS | Variance-Reduced SGD | Loopless SVRG, optimal $\mathcal{O}(n + 1/\varepsilon)$ complexity |
6. Practical Implementation and Empirical Observations
Empirical studies indicate that SLS methodologies deliver significant efficiency gains in high-dimensional penalized regression—particularly for groupwise-structured problems where groups are large or predictors are highly correlated within groups. SSLS enables practical sparse group lasso solutions for moderate group sizes in seconds on standard hardware (Foygel et al., 2010). In stochastic optimization, AdaSLS self-tunes between constant and decaying step sizes, matching or outperforming classic adaptive methods (SPS, AdaGrad), while variance-reduced extensions attain optimal complexity-class guarantees (Jiang et al., 2023).
Practical considerations include the setup of line-search parameters, numerically stable computation of blockwise spectral decompositions, and low additional overhead due to efficient inner-loop line search procedures.
7. Broader Impact, Variants, and Related Methods
SLS serves as a foundational approach within both deterministic and stochastic optimization for structured regularization and smooth convex learning. Key variants span blockwise search in composite-regularized models, stochastic Armijo step selection in SGD, and AdaGrad-style adaptivity for learning rate schedules. All maintain guarantees of monotonic descent, global convergence, and computational tractability in relevant regimes (Foygel et al., 2010, Jiang et al., 2023).
Potential avenues for further development include scaling SSLS to large group sizes via structured search strategies, extending adaptive SLS theory to nonconvex settings, and developing parallel and GPU-optimized implementations for large-scale machine learning tasks. A plausible implication is that SLS-based strategies can act as unifying schemes bridging deterministic blockwise and stochastic mini-batch optimization principles.