S-HLSP: Hierarchical Least-Squares Optimization
- S-HLSP is a structured optimization framework that solves lexicographic multi-level least-squares programs by hierarchically prioritizing objectives and constraints.
- It leverages advanced techniques such as hierarchical Newton/Gauss–Newton linearizations, adaptive step filters, and nullspace-based solvers to attain global convergence and computational efficiency.
- Its applications span robot control, trajectory optimization, and non-linear optimal control, where maintaining strict priority ordering is critical for performance.
Sequential Hierarchical Least-Squares Programming (S-HLSP) is a structured optimization framework for solving lexicographic (prioritized) multi-level least-squares programs, especially tailored for applications requiring strict prioritization of objectives and constraints, such as robot control, trajectory optimization, and prioritized non-linear optimal control. S-HLSP operates by stacking multiple least-squares subproblems in a strict hierarchy, ensuring that each lower-priority objective is optimized only over the set of solutions that preserve the optimal (in)feasibility achieved by all higher priorities. Advanced algorithmic components—including hierarchical Newton/Gauss–Newton linearizations, adaptive step filters, and efficient nullspace-based interior-point or ADMM solvers—enable S-HLSP to achieve both global convergence and high computational efficiency in large-scale and structured instances (Pfeiffer et al., 2024, Pfeiffer et al., 2023).
1. Formal Problem Statement and Lexicographic Structure
S-HLSP addresses hierarchical least-squares programs where levels of quadratic objectives are stacked lexicographically:

$$\begin{aligned}
\operatorname*{lex\,min}_{x,\,v_1,\ldots,\,v_p}\;&\Big( \tfrac{1}{2}\|v_{\mathbb{E}_1}\|^2,\; \tfrac{1}{2}\|v_{\mathbb{I}_1}\|^2,\;\ldots,\; \tfrac{1}{2}\|v_{\mathbb{E}_p}\|^2,\; \tfrac{1}{2}\|v_{\mathbb{I}_p}\|^2 \Big) \\
\text{subject to: }&
\begin{cases}
f_{\mathbb{E}_l}(x) = v_{\mathbb{E}_l},\\
f_{\mathbb{I}_l}(x) \leq v_{\mathbb{I}_l},\\
f_{\mathbb{E}_{\cup l-1}}(x) = v_{\mathbb{E}_{\cup l-1}}^*,\\
f_{\mathbb{I}_{\cup l-1}}(x) \leq v_{\mathbb{I}_{\cup l-1}}^*,
\end{cases}
\qquad l = 1, \ldots, p.
\end{aligned}$$

At each hierarchy level $l$, the solution preserves (within strict feasibility or optimal infeasibility) the best possible values $v_{\cup l-1}^*$ of the upper priorities, and seeks a least-squares minimization on the new slack variables $v_{\mathbb{E}_l}, v_{\mathbb{I}_l}$. This structure is critical for tasks such as task-prioritized motion generation in robotics or sequential convexification in non-linear optimal control (Pfeiffer et al., 2024, Pfeiffer et al., 2023).
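To make the lexicographic structure concrete, the following is a minimal sketch of hierarchical least squares for equality-only levels $A_l x = b_l$: each level is solved in the nullspace of all higher-priority levels, so lower priorities can never degrade higher ones. The nullspace is obtained here with a dense SVD for clarity; `hierarchical_lsq` and its tolerance are illustrative, not the solvers from the cited papers.

```python
import numpy as np

def hierarchical_lsq(levels, n):
    """Lexicographic least squares for equality tasks A_l x = b_l (sketch).

    Each level is minimized over the nullspace of all previously solved
    levels, so lower priorities cannot perturb higher-priority residuals.
    """
    x = np.zeros(n)
    Z = np.eye(n)                      # basis of directions still free
    for A, b in levels:
        AZ = A @ Z
        r = b - A @ x                  # residual of this level at current x
        z, *_ = np.linalg.lstsq(AZ, r, rcond=None)
        x = x + Z @ z                  # step stays in higher-priority nullspace
        # shrink the free subspace by the directions this level now fixes
        _, s, Vt = np.linalg.svd(AZ)
        rank = int((s > 1e-10).sum())
        Z = Z @ Vt[rank:].T
        if Z.shape[1] == 0:
            break
    return x

# Two conflicting levels in R^2: level 1 enforces x0 + x1 = 1 (priority),
# level 2 would like x0 = x1 = 0 (resolved in the leftover nullspace).
levels = [(np.array([[1.0, 1.0]]), np.array([1.0])),
          (np.eye(2), np.zeros(2))]
x = hierarchical_lsq(levels, 2)
print(x)  # ≈ [0.5, 0.5]: level 1 holds exactly, level 2 in least squares
```

The same nullspace recursion underlies the HLSP subsolvers discussed below; the full S-HLSP additionally handles inequalities via slacks and resolves the levels within a trust region.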
Linearizations are performed at each sequential outer iteration using hierarchical Newton or Gauss–Newton steps, reducing the original non-linear problem to a series of hierarchical linear least-squares subproblems (HLSP) on the step $\Delta x$.
2. Algorithmic Framework: Sequential HLSP and Step-Filters
Each S-HLSP iteration proceeds as follows:
- Outer iteration: Linearize the non-linear HLSP (NL-HLSP) about the current iterate $x$.
- Subproblem formation: Formulate a linear, lexicographic HLSP on the step and slack variables $(\Delta x, v)$, with blockwise structure reflecting the task prioritization.
- Hierarchical step-filter (HSF): For each priority $l$, maintain a filter of pairs $(h_l, \phi_l)$, where $h_l$ measures the constraint violation with respect to the higher priorities and $\phi_l$ the objective of level $l$. A new candidate step is accepted only if it is not dominated by any stored pair, i.e., it sufficiently improves either feasibility $h_l$ or optimality $\phi_l$; accepted pairs augment the filter.
- Trust region adaptation: The trust-region radius $\Delta_l$ at each level is dynamically adjusted according to step acceptance, balancing global convergence guarantees and efficient progress (Pfeiffer et al., 2023).
High-Level Algorithm (paraphrased from (Pfeiffer et al., 2023))
- Initialize $x$ and a large initial trust-region radius $\Delta$.
- For each level $l = 1, \ldots, p$:
- Solve HLSPs (with trust region) using subsolvers, applying HSF.
- Accept/reject steps, adapt $\Delta_l$, update the filter.
- Upon convergence at level $l$, propagate the feasible/optimal slacks $v_l^*$ to level $l+1$.
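The outer loop above can be sketched for one priority level as follows. The filter test and margin `gamma`, the helper names (`solve_hlsp`, `h_of`, `f_of`), and the radius-update constants are illustrative simplifications of the step-filter in the cited work, not its exact rules.

```python
import numpy as np

def filter_acceptable(filt, h, f, gamma=1e-4):
    """Fletcher–Leyffer-style filter test for one priority level (sketch).

    A candidate (h, f) = (constraint violation, objective) is accepted if
    it sufficiently improves h or f against every stored filter pair.
    """
    return all(h <= (1 - gamma) * hj or f <= fj - gamma * h
               for hj, fj in filt)

def trust_region_step(x, solve_hlsp, h_of, f_of, radius=1.0, iters=20):
    """One level of the S-HLSP outer loop: solve, filter, adapt radius."""
    filt = []
    for _ in range(iters):
        dx = solve_hlsp(x, radius)          # HLSP step within trust region
        h, f = h_of(x + dx), f_of(x + dx)
        if filter_acceptable(filt, h, f):
            filt.append((h, f))             # augment filter, accept step
            x = x + dx
            radius = min(2.0 * radius, 10.0)
        else:
            radius *= 0.5                   # reject: shrink trust region
    return x

# Toy usage: drive x toward 1 with radius-clipped steps; h ≡ 0 (feasible)
x = trust_region_step(np.array([0.0]),
                      lambda x, r: np.clip(np.array([1.0]) - x, -r, r),
                      lambda x: 0.0,
                      lambda x: float((x - 1.0) @ (x - 1.0)))
print(x)  # ≈ [1.0]
```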
3. Hierarchical Newton/Gauss–Newton and Threshold Adaptation
The core of S-HLSP is the choice of local model for each HLSP subproblem:
- Hierarchical Newton: Utilize full second-order (Lagrangian) information, regularized for stability.
- Hierarchical Gauss–Newton: Drop second-order terms, using only the first-order model $J_l^\top J_l$ built from the task Jacobians.
An adaptive per-level threshold governs the choice: if the current slack norm $\|v_l^*\|$ exceeds the threshold, Newton regularization is activated; otherwise, the Gauss–Newton approximation is used. This approach ensures numerical stability when infeasibilities remain, while avoiding unnecessary variable locking at lower levels when constraints are already nearly satisfied. The threshold is automatically adapted based on filter progress and rejection counts (Pfeiffer et al., 2024).
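The switching rule can be sketched as a per-level model selector. The threshold `nu`, the regularization constant, and the callback `hess_residuals` (which stands in for the second-order term $\sum_i r_i \nabla^2 f_i$) are illustrative assumptions, not the adaptive rule of the paper.

```python
import numpy as np

def level_hessian(J, r, hess_residuals, nu=1e-3):
    """Choose the local model for one priority level (sketch).

    If the level's slack norm ||r|| exceeds the threshold nu, add a
    regularized second-order term (hierarchical Newton); otherwise fall
    back to the plain Gauss-Newton approximation J^T J.
    """
    H_gn = J.T @ J
    if np.linalg.norm(r) > nu:
        H = H_gn + hess_residuals(r)      # Newton: keep curvature of residuals
        return H + 1e-8 * np.eye(H.shape[0])  # Tikhonov term for stability
    return H_gn                           # Gauss-Newton: first-order only

# Near-feasible level (tiny slack): Gauss-Newton model is used
H_small = level_hessian(np.eye(2), np.zeros(2), lambda r: np.eye(2))
# Infeasible level (large slack): regularized Newton model is used
H_large = level_hessian(np.eye(2), np.array([1.0, 0.0]), lambda r: np.eye(2))
```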
4. HLSP Solvers: Nullspace-Projected ADMM and Interior-Point Methods
Two major classes of algorithms efficiently solve the HLSP subproblems:
- NADM2 (Nullspace ADMM): Projects the HLSP onto the nullspace of the active higher-priority constraints using a computationally efficient turnback algorithm. The projected system is then solved via an alternating direction method of multipliers (ADMM) in the reduced nullspace coordinates, exploiting closed-form slack projections and dual updates.
- s-NIPM-HLSP (Sparse Reduced-Hessian IPM): Employs a reduced-Hessian interior-point method in nullspace coordinates for each subproblem, using the turnback algorithm to exploit the banded structure of Jacobians in large-scale optimal control.
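The ADMM pattern of the first solver class — a cached linear solve, a closed-form slack projection, and a dual update — can be sketched on a box-constrained least-squares toy. The problem, parameters (`rho`, iteration count), and function name are illustrative; the actual NADM2 iteration operates on the projected hierarchical subproblem.

```python
import numpy as np

def admm_box_lsq(A, b, lo, hi, rho=1.0, iters=200):
    """Scaled-form ADMM for min ||Ax - b||^2 s.t. lo <= x <= hi (sketch)."""
    n = A.shape[1]
    M = A.T @ A + rho * np.eye(n)      # formed once, reused every iteration
    Atb = A.T @ b
    x = z = u = np.zeros(n)
    for _ in range(iters):
        x = np.linalg.solve(M, Atb + rho * (z - u))  # primal: linear solve
        z = np.clip(x + u, lo, hi)                   # closed-form projection
        u = u + x - z                                # dual (scaled) update
    return z

# Toy: unconstrained optimum [2, -1] gets clipped into the box [0, 1]^2
A = np.eye(2)
b = np.array([2.0, -1.0])
x = admm_box_lsq(A, b, 0.0, 1.0)
print(x)  # ≈ [1, 0]
```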
Nullspace basis computation is critical and relies on either efficient block-banded LU factorization or the turnback algorithm, avoiding expensive rank-revealing factorizations. For systems with discretized dynamics (Euler or multiple-shooting), upper bounds on nullspace basis bandwidth ensure scalability (Pfeiffer et al., 2024, Pfeiffer et al., 2023).
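For intuition only, a generic (dense) nullspace basis can be computed via SVD; unlike the turnback algorithm described above, this baseline destroys bandedness and does not scale, which is precisely why turnback or block-banded LU is preferred. The toy Jacobian below encodes two steps of single-integrator dynamics $x_{k+1} = x_k + u_k$ over stacked variables $(x_0, u_0, x_1, u_1, x_2, u_2)$.

```python
import numpy as np

def nullspace_basis(A, tol=1e-10):
    """Dense nullspace basis via SVD (baseline, not the turnback method)."""
    _, s, Vt = np.linalg.svd(A)
    rank = int((s > tol * s[0]).sum()) if s.size else 0
    return Vt[rank:].T                 # columns span null(A)

# Block-banded Jacobian of the discretized dynamics x_{k+1} = x_k + u_k
A = np.array([[1., 1., -1., 0., 0., 0.],
              [0., 0., 1., 1., -1., 0.]])
Z = nullspace_basis(A)
print(A @ Z)   # ≈ 0: columns of Z lie in the nullspace of A
```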
Solver Summary
| Solver | HLSP Method | Nullspace Basis |
|---|---|---|
| NADM2 | ADMM + nullspace projection | Turnback algorithm |
| s-NIPM-HLSP | Reduced-Hessian IPM | LU / turnback |
NADM2 is especially effective for sparse, banded structured systems, and rapidly provides moderate-accuracy solutions. These can then warm-start higher-accuracy interior-point solvers.
5. Convergence Properties and Global Optimality
The S-HLSP methodology satisfies global convergence to a KKT point of the NL-HLSP under two key conditions:
- Filter radius condition: If, at each priority, the trust region is not expanded past the maximum filter-accepted radius, convergence is guaranteed.
- Filter initialization: If the filter at each level is initialized with strictly feasible front points, S-HLSP converges regardless of the initial trust-region radius.
No restoration or feasibility repair is required because the slack-based formulation ensures feasibility of every HLSP subproblem, even in the presence of infeasible higher-level objectives (Pfeiffer et al., 2023).
The adaptation of nullspace trust regions at each priority level ensures that while moving through the hierarchy, progress at each level does not violate the KKT set of higher priorities.
6. Computational Performance and Scalability
Extensive benchmarks on non-linear test functions, whole-body inverse kinematics, and multi-contact optimal control demonstrate the numerical efficacy and scalability of S-HLSP methods:
- For moderate to large-scale tasks (e.g., robot inverse kinematics and whole-body optimal control), NADM2 and s-NIPM-HLSP provide competitive or superior solve times compared to state-of-the-art commercial (H-MOSEK, Gurobi) and open-source (H-OSQP, H-PIQP) solvers, while achieving lower KKT residuals in hierarchical settings.
- Nullspace-projected ADMM yields fast moderate-accuracy solutions within milliseconds, while interior-point methods can reach high-accuracy KKT points in larger but still tractable runtimes.
- The turnback algorithm preserves sparsity and banded structure, enabling scalability to high-dimensional, multiple-shooting discretizations with low computational overhead per stage.
7. Warm-Starting and Combined Solver Strategies
An effective computational strategy is to use a first-order HLSP solver such as NADM2 to obtain a moderate-accuracy warm start, followed by a high-accuracy interior-point method (e.g., H-MOSEK) to rapidly reach tight KKT residuals. This two-stage approach reduces both the total number of outer sequential iterations and the number of expensive inner iterations in the high-precision solver. The benefit is especially pronounced in large-scale under-actuated systems and structured optimal control problems (Pfeiffer et al., 2024).
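The two-stage pattern can be illustrated on a plain least-squares toy: cheap first-order iterations (standing in for the ADMM warm start) to a coarse tolerance, then a second-order correction (standing in for the interior-point refinement) from that point. The function name, tolerances, and step rule are illustrative assumptions.

```python
import numpy as np

def solve_two_stage(A, b, coarse_tol=1e-2):
    """Warm-start pattern on min ||Ax - b||^2 (sketch, not NADM2/IPM)."""
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of gradient
    # Stage 1: first-order iterations to moderate accuracy only
    while np.linalg.norm(A.T @ (A @ x - b)) > coarse_tol:
        x -= (1.0 / L) * A.T @ (A @ x - b)
    # Stage 2: one Newton correction from the warm start (exact here,
    # since the toy objective is quadratic)
    dx = np.linalg.solve(A.T @ A, A.T @ (b - A @ x))
    return x + dx

A = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
x = solve_two_stage(A, b)
print(x)   # ≈ [0.5556, 2.2222], the least-squares solution
```

In the actual pipeline the second stage is an iterative interior-point solve whose iteration count shrinks when started from the first-stage point, rather than a single exact step.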
8. Applications and Extensions
S-HLSP is particularly suited for:
- Prioritized robot control, generating kinematically and dynamically feasible trajectories respecting task hierarchies (position, force, redundancy resolution).
- Discrete-time non-linear optimal control with complex multi-level constraints (multi-contact locomotion, time-optimal control).
- Sparse and structured system identification, where hierarchical recursive least squares formulations also emerge (Slavakis, 2018).
The methodology is extensible to stochastic and adaptive settings, where variants such as S-FM-HSDM and hierarchical RLS address noisy, online, or composite convex versions of S-HLSP.
References
- "Efficient Lexicographic Optimization for Prioritized Robot Control and Planning" (Pfeiffer et al., 2024)
- "Sequential Hierarchical Least-Squares Programming for Prioritized Non-Linear Optimal Control" (Pfeiffer et al., 2023)
- "The Stochastic Fejér-Monotone Hybrid Steepest Descent Method and the Hierarchical RLS" (Slavakis, 2018)