
Sequential Line Search (SLS)

Updated 1 April 2026
  • Sequential Line Search (SLS) is an optimization technique that employs one-dimensional searches to update variables, ensuring efficient convergence in both deterministic and stochastic settings.
  • It is applied in blockwise optimization for group lasso and adaptive step-size selection in stochastic gradient descent, enhancing performance in high-dimensional, structured problems.
  • SLS leverages spectral decomposition and Armijo-type conditions to provide precise, computationally efficient updates that often outperform traditional coordinate descent methods.

Sequential Line Search (SLS) refers collectively to a class of optimization algorithms that utilize one-dimensional line searches, either deterministically or stochastically, as a core strategy for updating variables or parameter blocks within larger iterative schemes. SLS methods facilitate efficient, targeted updates by identifying optimal (or near-optimal) step sizes along chosen search directions, typically to minimize a nonsmooth or composite objective. These approaches are widely deployed in modern convex and stochastic optimization, particularly within penalized regression (e.g., group lasso) and stochastic gradient descent-type algorithms. Central theoretical and empirical results demonstrate that SLS variants can outperform traditional coordinate descent, blockwise descent with inexact searches, and global projection methods, especially in high-dimensional and structured statistical learning contexts (Foygel et al., 2010, Jiang et al., 2023).

1. SLS in Blockwise Optimization for Group Lasso

In the context of the group lasso—the regression problem in which predictors are partitioned into groups and groupwise sparsity is induced by an ℓ₂-norm penalty on each group—SLS provides an exact blockwise optimization procedure. The classical group lasso estimator for G groups is

\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2}\|y - X\beta\|_2^2 + \lambda \sum_{g=1}^{G} w_g \|\beta^{(g)}\|_2,

with X partitioned into blocks X^{(g)} and blockwise coefficients β^{(g)}. For each group g, SLS solves the associated subproblem in α:

\min_{\alpha \in \mathbb{R}^{p_g}} \; Q(\alpha) = \frac{1}{2}\|r - X^{(g)}\alpha\|_2^2 + \lambda w_g \|\alpha\|_2,

with r the partial residual obtained by fixing all other groups. By leveraging the spectral decomposition X^{(g)\top} X^{(g)} = U\,\mathrm{diag}(d_1,\dots,d_{p_g})\,U^\top and formulating a univariate equation in the radial variable t = ‖α‖₂, the global minimum in the subspace can be computed via a one-dimensional line search, specifically solving

\sum_{i=1}^{p_g} \frac{g_i^2}{(d_i t + \lambda w_g)^2} = 1, \qquad t > 0,

where d_i are the eigenvalues of X^{(g)\top} X^{(g)} and g = U^\top X^{(g)\top} r is a rotated gradient vector. If ‖X^{(g)\top} r‖₂ ≤ λ w_g, the update is zero; otherwise, the unique root t^* yields the blockwise update via a_i = g_i t^* / (d_i t^* + λ w_g) in rotated coordinates, with α = U a (Foygel et al., 2010).
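The blockwise update can be sketched numerically. The following minimal Python illustration (our own sketch, not the authors' implementation; it assumes the block design matrix has full column rank and uses SciPy's brentq root finder for the one-dimensional search) computes an exact block minimizer:

```python
import numpy as np
from scipy.optimize import brentq

def sls_block_update(Xg, r, lam_wg):
    """Exact group-lasso block update via a one-dimensional root find.

    Minimizes 0.5*||r - Xg @ a||^2 + lam_wg*||a||_2 over the block's
    coefficients using the spectral decomposition of Xg.T @ Xg.
    Sketch only: assumes Xg has full column rank; names are ours.
    """
    grad = Xg.T @ r
    # Zero update when the block gradient lies inside the penalty ball.
    if np.linalg.norm(grad) <= lam_wg:
        return np.zeros(Xg.shape[1])

    d, U = np.linalg.eigh(Xg.T @ Xg)   # eigenvalues d_i, eigenvectors U
    g = U.T @ grad                     # rotated gradient vector

    # phi(t) = sum_i g_i^2 / (d_i*t + lam_wg)^2 - 1 is strictly
    # decreasing on t > 0 with phi(0) > 0; bracket a root, then solve.
    phi = lambda t: np.sum(g**2 / (d * t + lam_wg) ** 2) - 1.0
    t_hi = 1.0
    while phi(t_hi) > 0:
        t_hi *= 2.0
    t_star = brentq(phi, 0.0, t_hi)

    a = g * t_star / (d * t_star + lam_wg)  # minimizer in rotated coords
    return U @ a
```

The returned vector satisfies the block stationarity condition X^{(g)\top}X^{(g)}α + λw_g α/‖α‖ = X^{(g)\top}r, which provides a direct correctness check.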

2. Stochastic Line-Search SLS in SGD and Adaptive Algorithms

Within stochastic optimization, SLS strategies are employed to dynamically select step sizes in stochastic gradient methods. At each iteration k, a minibatch B_k is sampled, and a trial step size η_k is selected via backtracking line search to satisfy an Armijo-type condition on the sampled function:

f_{B_k}\!\left(x_k - \eta_k \nabla f_{B_k}(x_k)\right) \le f_{B_k}(x_k) - c\,\eta_k \|\nabla f_{B_k}(x_k)\|_2^2,

with c ∈ (0, 1) and a backtracking factor β ∈ (0, 1) controlling the reduction in η_k if the condition fails. The procedure enforces monotonic decrease or other safeguards as needed, and the iterate is updated with the found η_k (Jiang et al., 2023).
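The backtracking loop described above can be sketched as follows (a minimal illustration; the callbacks f_batch/grad_batch stand in for minibatch function and gradient evaluations, and the default constants are illustrative, not prescribed values):

```python
import numpy as np

def sgd_armijo(f_batch, grad_batch, x0, n_iters=100,
               eta0=1.0, c=0.1, beta=0.7, max_backtracks=50):
    """SGD with a stochastic Armijo backtracking line search (sketch).

    f_batch(k, x) / grad_batch(k, x) evaluate the objective and
    gradient on the minibatch sampled at iteration k.
    """
    x = x0.astype(float)
    for k in range(n_iters):
        g = grad_batch(k, x)
        fx = f_batch(k, x)
        g_norm2 = g @ g
        eta = eta0
        for _ in range(max_backtracks):
            # Armijo condition checked on the sampled function.
            if f_batch(k, x - eta * g) <= fx - c * eta * g_norm2:
                break
            eta *= beta        # shrink the trial step and retry
        x = x - eta * g
    return x
```

On a simple quadratic with full-batch evaluations, the initial trial step η₀ = 1 is accepted immediately and the iterate jumps to the minimizer.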

Adaptive SLS variants such as AdaSLS further rescale accepted steps with an AdaGrad-norm-style denominator, facilitating parameter-free operation: no knowledge of Lipschitz-smoothness or strong-convexity constants is required. This yields optimal-order convergence rates in convex interpolation regimes and robust behavior in noisy or non-interpolated settings.
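As a rough illustration of the AdaGrad-norm-style denominator (a simplified reading of the rescaling idea only; the line-search component is omitted for brevity and the exact AdaSLS update differs in details, see Jiang et al., 2023):

```python
import numpy as np

def adagrad_norm_rescaled_steps(grad, x0, n_iters=200, eta0=1.0, eps=1e-8):
    """Gradient steps divided by a running gradient-norm accumulator.

    Simplified sketch: in AdaSLS-style methods, eta0 would be the step
    accepted by the Armijo line search at each iteration; here it is a
    fixed constant for clarity.
    """
    x = x0.astype(float)
    acc = 0.0                       # running sum of squared gradient norms
    for _ in range(n_iters):
        g = grad(x)
        acc += g @ g
        # AdaGrad-norm rescaling: step shrinks as gradients accumulate.
        x = x - (eta0 / np.sqrt(acc + eps)) * g
    return x
```

The accumulator makes the effective step size self-decaying, which is the mechanism that removes the need for a hand-tuned schedule.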

3. Convergence Properties and Theoretical Guarantees

For blockwise SLS in group lasso, convergence is established via block-coordinate descent theory. Each sweep over all blocks produces a non-increasing objective, and the iterates converge to the unique solution under standard convexity assumptions. Finite-time error bounds are given in terms of current subgradient norms, and global optimality is ensured by strict convexity within blocks (Foygel et al., 2010).

In stochastic settings, SLS-type procedures admit the following convergence behaviors under standard smoothness and convexity assumptions:

  • Interpolated, strongly convex: linear (geometric) convergence.
  • Interpolated, convex: O(1/T) convergence in function-value suboptimality.
  • Non-interpolated (noisy): convergence to a neighborhood of the optimum whose size depends on the gradient-noise variance σ². AdaSLS matches these rates while eliminating the need for tuning problem-dependent parameters (Jiang et al., 2023).

Variance-reduced SLS schemes (e.g., AdaSVRLS) achieve optimal gradient-evaluation complexity via loopless SVRG-style estimators.
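The loopless SVRG-style estimator underlying such schemes can be sketched on a least-squares objective (an illustration of the estimator structure only, not the AdaSVRLS algorithm; step size and refresh probability are arbitrary demo values):

```python
import numpy as np

def lsvrg_gradient(A, b, i, x, w, mu):
    """Loopless-SVRG estimate of grad f(x) for
    f(x) = (1/n) * sum_i 0.5*(a_i @ x - b_i)^2,
    given an anchor w with stored full gradient mu = grad f(w)."""
    gi_x = A[i] * (A[i] @ x - b[i])
    gi_w = A[i] * (A[i] @ w - b[i])
    return gi_x - gi_w + mu          # unbiased: E_i[...] = grad f(x)

def lsvrg(A, b, eta=0.1, p=0.1, n_iters=3000, seed=1):
    """Loopless SVRG: refresh the anchor with probability p per step."""
    rng = np.random.default_rng(seed)
    n, dim = A.shape
    x = np.zeros(dim)
    w = x.copy()
    mu = A.T @ (A @ w - b) / n       # full gradient at the anchor
    for _ in range(n_iters):
        i = rng.integers(n)
        x = x - eta * lsvrg_gradient(A, b, i, x, w, mu)
        if rng.random() < p:         # probabilistic anchor refresh
            w = x.copy()
            mu = A.T @ (A @ w - b) / n
    return x
```

Replacing SVRG's inner loop with a coin-flip anchor refresh keeps the estimator unbiased while avoiding nested-loop bookkeeping; this is the "loopless" design choice referenced above.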

4. Extensions: Sparse Group Lasso and Signed SLS

The SLS methodology generalizes to the sparse group lasso, where both group-level (ℓ₂) and elementwise (ℓ₁) penalties are imposed:

\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2}\|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \sum_{g=1}^{G} w_g \|\beta^{(g)}\|_2.

In this context, the Signed Single Line Search (SSLS) algorithm is employed: the subproblem for each block g requires identifying the active sign pattern and then solving a corresponding univariate equation conditional on the support. If the soft-thresholded block gradient norm falls below λ₂ w_g, the solution is zero; otherwise, a search over sign patterns and corresponding line searches produces the exact solution. Theoretical global convergence is established (Foygel et al., 2010).
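The group-zero test stated above can be written directly; this sketch implements the standard sparse-group-lasso screening condition (notation λ₁, λ₂w_g is ours):

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding S(v, tau) = sign(v)*max(|v|-tau, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def block_is_zero(Xg, r, lam1, lam2_wg):
    """Sparse-group-lasso block zero test, as stated in the text:
    the block update is exactly zero iff ||S(Xg^T r, lam1)||_2 <= lam2_wg."""
    return np.linalg.norm(soft_threshold(Xg.T @ r, lam1)) <= lam2_wg
```

When the test fails, SSLS proceeds to the sign-pattern search; when it succeeds, the whole block can be skipped, which is what makes the screening cheap relative to solving the subproblem.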

5. Computational Considerations and Comparative Performance

Blockwise SLS for group lasso achieves a cheap per-group update (a univariate root find plus O(p_g) vector operations) after a one-time O(p_g^3) spectral decomposition per block, with per-sweep residual calculations scaling linearly in the problem size. By comparison:

  • Inexact group-wise descent incurs higher line-search cost per block.
  • Coordinate descent may erroneously stall at non-unique blockwise solutions.
  • Gradient projection updates the entire parameter vector and may require expensive projections.
  • Active-set methods offer speed for extremely sparse solutions but have higher active-set update costs.

SLS and its signed variant SSLS yield substantial empirical speed-ups—orders of magnitude in simulations where group sizes are moderate and within-group correlations high—over alternative methods (Foygel et al., 2010). In stochastic optimization, the line-search overhead is minimal, adding only a small (amortized constant) number of extra function evaluations per iteration, while ensuring robust adaptation to unknown smoothness or variance structure (Jiang et al., 2023).

SLS Variant    | Application Domain   | Notable Properties/Results
Blockwise SLS  | (Sparse) Group Lasso | Exact univariate update per block, global convergence
SSLS           | Sparse Group Lasso   | Signed pattern search, global convergence, robust sparsity
Stochastic SLS | SGD and AdaSLS       | Armijo line search per minibatch, adaptive scaling
AdaSVRLS       | Variance-reduced SGD | Loopless SVRG, optimal gradient complexity

6. Practical Implementation and Empirical Observations

Empirical studies indicate that SLS methodologies deliver significant efficiency gains in high-dimensional penalized regression—particularly for groupwise-structured problems where groups are large or predictors are highly correlated within groups. SSLS enables practical sparse group lasso solutions for moderate group sizes in seconds on standard hardware (Foygel et al., 2010). In stochastic optimization, AdaSLS self-tunes between constant and decaying step sizes, matching or outperforming classic adaptive methods (SPS, AdaGrad), while variance-reduced extensions attain optimal complexity-class guarantees (Jiang et al., 2023).

Practical considerations include the setup of line-search parameters, numerically stable computation of blockwise spectral decompositions, and low additional overhead due to efficient inner-loop line search procedures.

SLS serves as a foundational approach within both deterministic and stochastic optimization for structured regularization and smooth convex learning. Key variants span blockwise search in composite-regularized models, stochastic Armijo step selection in SGD, and AdaGrad-style adaptivity for learning rate schedules. All maintain guarantees of monotonic descent, global convergence, and computational tractability in relevant regimes (Foygel et al., 2010, Jiang et al., 2023).

Potential avenues for further development include scaling SSLS to large group sizes via structured search strategies, extending adaptive SLS theory to nonconvex settings, and developing parallel and GPU-optimized implementations for large-scale machine learning tasks. A plausible implication is that SLS-based strategies can act as unifying schemes bridging deterministic blockwise and stochastic mini-batch optimization principles.
