Hierarchical Alternating Least Squares

Updated 3 September 2025
  • Hierarchical ALS is a technique that decomposes structured low-rank problems into blockwise least-squares subproblems, enhancing computational efficiency.
  • The method leverages closed-form solutions and projections to enforce constraints, ensuring fast convergence and improved reconstruction accuracy.
  • Applied in NMF, tensor completion, and robotics, HALS offers scalable updates and robustness in high-dimensional signal processing and optimization.

The Hierarchical Alternating Least Squares (HALS) algorithm is a class of alternating optimization techniques designed for hierarchical or blockwise structured low-rank approximations and factorizations in matrix and tensor problems. HALS methods generalize the standard alternating least squares framework by decomposing the overall minimization task into a sequence of lower-dimensional, often closed-form, subproblems that are solved in a coordinated, possibly multi-level, sequence. These methods are applicable across a wide range of contexts including matrix and tensor completion, compressed sensing, nonnegative matrix factorization, structured low-rank approximation, signal processing, and robotics.

1. Algorithmic Structure of Hierarchical ALS

HALS algorithms alternate the minimization of the objective function over subsets (blocks) of variables, handling each block in a strictly convex or least-squares subproblem. Formally, for a general low-rank modeling problem such as reconstructing a matrix $X \in \mathbb{C}^{n \times p}$ or an order-$d$ tensor $\mathcal{A}$ from (possibly undersampled or noisy) measurements, the structural form is:

$$\min_{\{B^{(i)}\}} J(\{B^{(i)}\}) \quad \text{s.t. structural/low-rank constraints}$$

where the blocks $B^{(i)}$ can represent columns of a factor matrix, tensor cores in TT or Tucker decompositions, or other partitioned entities in hierarchical models. The key distinction in HALS compared to flat ALS is that updates are organized hierarchically or blockwise, and often exploit problem-specific structures.

Procedurally, each iteration consists of:

  • Fixing all but one block B(i)B^{(i)} (or a small group of blocks) and solving a local least-squares or nonnegative least-squares (NLS) problem for that block;
  • Applying a projection if additional structural constraints are present (e.g., Hankel, Toeplitz, positive semidefinite, or physical invariants such as in quaternion representations);
  • Cycling through all blocks in a prescribed order or using a potentially multi-level hierarchy.

This block-coordinate strategy underpins the “rank-1 residue iteration” variant for NMF and enables efficient parallelization and improved stability, as each blockwise subproblem is strictly convex and solvable in closed form (Chu et al., 2020, Pan, 22 Jul 2024).
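
To make the rank-1 residue iteration concrete, the following is a minimal NumPy sketch of HALS for NMF with $A \approx UV^T$ and nonnegative factors. The function name, initialization, and defaults are illustrative assumptions, not the reference implementation of Chu et al. (2020).

```python
import numpy as np

def hals_nmf(A, r, n_iters=200, eps=1e-12, seed=0):
    """Minimal HALS (rank-1 residue iteration) sketch for NMF: A ≈ U @ V.T."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.random((m, r))
    V = rng.random((n, r))
    for _ in range(n_iters):
        for l in range(r):
            # Residual with the l-th rank-1 term removed
            R_l = A - U @ V.T + np.outer(U[:, l], V[:, l])
            # Closed-form nonnegative LS updates for the l-th columns
            U[:, l] = np.maximum(R_l @ V[:, l], 0.0) / max(V[:, l] @ V[:, l], eps)
            V[:, l] = np.maximum(R_l.T @ U[:, l], 0.0) / max(U[:, l] @ U[:, l], eps)
    return U, V
```

Each inner step is a strictly convex nonnegative least-squares problem with a closed-form solution, which is what makes a full sweep cheap and numerically stable.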

2. Mathematical Foundations and Key Update Formulas

In the matrix setting, suppose a rank-$r$ factorization $X = LR$ is sought from measurements $y = \mathcal{A}(X) + n$; classical ALS then alternates:

$$\begin{aligned} \text{Fix } L: &\quad \operatorname{vec}(\hat{R}) = \left[A (I_p \otimes L)\right]^{\dagger} y \\ \text{Fix } R: &\quad \operatorname{vec}(\hat{L}) = \left[A (R^T \otimes I_n)\right]^{\dagger} y \end{aligned}$$

where $A$ is the matrix form of the sensing operator $\mathcal{A}$, $I_p$ is the identity, and $\dagger$ denotes the Moore–Penrose pseudoinverse (Zachariah et al., 2012). In a hierarchical extension, these minimizations are recursively decomposed, e.g., updating one column of $L$ or $R$ at a time (“rank-1 updates,” as in HALS for NMF (Chu et al., 2020)).
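
As an illustration of these two pseudoinverse updates, here is a minimal NumPy sketch that alternates them for recovering $X = LR$ from $y = A\,\operatorname{vec}(X) + n$ (column-major vectorization). The helper name, random initialization, and fixed iteration count are assumptions; this is not the procedure of Zachariah et al. (2012), which additionally applies the structural projections discussed below.

```python
import numpy as np

def als_matrix_recovery(A, y, n, p, r, n_iters=50, seed=0):
    """Plain ALS sketch for y ≈ A @ vec(X) with X = L @ R of rank r.
    A has shape (m, n*p) and acts on the column-major vectorization of X."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((n, r))
    R = rng.standard_normal((r, p))
    for _ in range(n_iters):
        # Fix L: vec(R_hat) = [A (I_p ⊗ L)]^† y
        R = (np.linalg.pinv(A @ np.kron(np.eye(p), L)) @ y).reshape((r, p), order="F")
        # Fix R: vec(L_hat) = [A (R^T ⊗ I_n)]^† y
        L = (np.linalg.pinv(A @ np.kron(R.T, np.eye(n))) @ y).reshape((n, r), order="F")
    return L, R
```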

For tensor train (TT) or hierarchical Tucker formats, the core update is:

$$\min_{G_s} \left\| \left(M - A^G\right)\big|_P \right\|_F^2$$

for each TT-core $G_s$, where $A^G$ is the current TT-approximation, $M$ is the sampling tensor, $\cdot|_P$ restricts to observed entries, and the explicit update involves orthogonalized unfoldings (Grasedyck et al., 2015):

$$G_s(j) = (G^{<s})^T Z_{(s)}(j)\, (G^{>s})^T \quad \forall j$$
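
The slice-wise character of this core update can be illustrated for a 3-way tensor with fully observed entries (no sampling mask), a deliberate simplification of the completion setting of Grasedyck et al. (2015); the helper name and core shapes below are assumptions.

```python
import numpy as np

def tt_refit_middle_core(T, G1, G2, G3):
    """Re-fit the middle TT-core of T ≈ G1 · G2 · G3 (3-way, fully observed).
    Shapes: G1 (1, n1, r1), G2 (r1, n2, r2), G3 (r2, n3, 1)."""
    n1, n2, n3 = T.shape
    L = G1.reshape(n1, -1)            # left interface  G^{<2}: (n1, r1)
    R = G3.reshape(-1, n3)            # right interface G^{>2}: (r2, n3)
    Lp, Rp = np.linalg.pinv(L), np.linalg.pinv(R)
    # Slice-wise least squares: G2(j) = (G^{<2})^† T(:, j, :) (G^{>2})^†;
    # with orthogonalized interfaces the pseudoinverses reduce to transposes.
    G2_new = np.empty_like(G2)
    for j in range(n2):
        G2_new[:, j, :] = Lp @ T[:, j, :] @ Rp
    return G2_new
```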

For nonnegative matrix/tensor factorization, blockwise NLS subproblems of arbitrary rank $k$ are solved via a recursive formula (Chu et al., 2020). When $k=1$, these reduce exactly to HALS, e.g.,

$$u_\ell = \operatorname*{argmin}_{u_\ell \geq 0} \|A - UV^T\|_F^2,$$

where only the $\ell$-th column of $U$ is updated, yielding the rank-1 residue iteration structure.
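
For concreteness, the closed-form solution of this rank-1 subproblem (a standard HALS identity written in the notation above, not quoted from a specific reference) is

$$R_\ell = A - \sum_{j \neq \ell} u_j v_j^T, \qquad u_\ell \leftarrow \frac{\big[R_\ell\, v_\ell\big]_+}{\|v_\ell\|_2^2},$$

where $[\cdot]_+$ denotes entrywise projection onto the nonnegative orthant and $u_j$, $v_j$ are the columns of $U$ and $V$; the update for $v_\ell$ is obtained symmetrically.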

If additional structure is imposed (e.g., positive semidefinite), after the unconstrained minimization, the solution is projected:

  • For linearly structured matrices: $X = \operatorname{mat}_{n,p}(S S^\dagger \operatorname{vec}(LR))$, where $S S^\dagger$ projects onto the admissible structure subspace;
  • For PSD matrices: $X = V_r \widetilde{\Lambda}_r V_r^*$, where only the top $r$ eigencomponents are retained (Zachariah et al., 2012).
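
As an example of the second projection, here is a short sketch that retains the top-$r$ nonnegative eigencomponents of a Hermitian estimate; the helper name and the explicit symmetrization step are assumptions for illustration, not code from Zachariah et al. (2012).

```python
import numpy as np

def project_psd_rank_r(X, r):
    """Project a (nearly) Hermitian estimate onto rank-r PSD matrices."""
    Xh = (X + X.conj().T) / 2            # symmetrize the estimate
    w, V = np.linalg.eigh(Xh)            # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:r]        # indices of the top-r eigenvalues
    w_top = np.clip(w[idx], 0.0, None)   # clip negatives to enforce PSD
    return (V[:, idx] * w_top) @ V[:, idx].conj().T
```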

3. Theoretical Convergence Analysis

HALS convergence is shaped by both the structure of the micro-update steps and the global coupling:

  • In the rank-1 tensor or matrix setting, detailed local convergence bounds (polynomial of order $N-1$ for $N$-way orthogonally decomposable CP tensors, linear for incoherently decomposable cases) are established using angular error metrics (Hu et al., 20 May 2025, Espig et al., 2015). For rank-1 approximation, Q-linear and Q-superlinear rates arise depending on “dominance conditions” in the multilinear interaction.
  • In hierarchical tensor decompositions, the overall contraction depends on the product of local contraction factors at each node/block. Under reasonable assumptions (boundedness, rank stability, uniqueness of minimizer in each block, proper initialization), similar descent properties and convergence rates hold for hierarchical ALS as in the flat case (Espig et al., 2015).
  • For block-coordinate updates in NLS subproblems, strict convexity on the feasible set ensures global convergence of the iterates to a stationary point (Pan, 22 Jul 2024).

Crucially, the local convergence rate for orthogonally decomposable problems is governed by

$$\epsilon_{k} \leq \left[c(\kappa, R)\,\epsilon_{k-1}\right]^{N-1},$$

with $\epsilon_k$ the maximal subspace angle error and $c(\kappa, R)$ capturing the scaling with respect to the weight condition number and rank (Hu et al., 20 May 2025). For a third-order tensor ($N = 3$), for instance, the bound is quadratic: $\epsilon_k \leq c(\kappa, R)^2\,\epsilon_{k-1}^2$.

4. Practical Implementation and Applications

HALS-type methods have been empirically validated and applied in various contexts:

  • Nonnegative Matrix and Tensor Factorization: Hierarchical updates in ARkNLS with $k=1$ (HALS) provide efficient NMF; closed-form updates for $k=2,3$ offer improved convergence and flexibility (Chu et al., 2020). For quaternion NMF (QNMF), hierarchical updates respect additional constraints relevant to color and polarization image processing; the update steps include physically meaningful projection operators ensuring nonnegativity or positive semidefiniteness of the source factors (Pan, 22 Jul 2024).
  • Low-Rank Matrix/Tensor Reconstruction: When a priori structure (Hankel, Toeplitz, PSD) is known, HALS/ALS with projection outperforms unconstrained solvers, achieving error rates close to the corresponding Cramér–Rao bounds (Zachariah et al., 2012).
  • High-Dimensional Tensor Completion: Hierarchical methods in TT/HT formats allow the decoupling of large-scale tensor updates into small-scale slice-wise least-squares problems, leading to $\mathcal{O}(d r^2 n)$ degrees of freedom for a $d$-way tensor of size $n^d$ and moderate rank $r$ (Grasedyck et al., 2015). Computational complexity per sweep is $\mathcal{O}(r^4 d\, \#P)$, where $\#P$ is the number of observed entries.
  • Robotics and Hierarchical Programming: Hierarchical least-squares programming problems with multiple priority levels are solved via block-coordinate optimization in both primal (nullspace-based) (Pfeiffer et al., 2021) and dual (ADMM-based) (Pfeiffer, 27 May 2025) settings, providing both efficiency and differentiability for integration into learning pipelines.
  • Signal Processing and Communications: In applications such as MIMO channel estimation (SALSA), sequential (hierarchical) ALS exploits tensor decompositions (Tucker, Kronecker) to achieve robust estimation in the presence of measurement noise and non-invertible design matrices (Gherekhloo et al., 2023).

5. Hierarchical Extensions and Recent Developments

The hierarchical organization of ALS has been motivated and generalized in several strands:

  • Multilevel or Blockwise Updates: Instead of alternating over the entire factor, updates may proceed recursively in trees or along the tensor network, e.g., Tensor Train or Hierarchical Tucker decompositions (Grasedyck et al., 2015, Espig et al., 2015). The key advantage is computational tractability and scalability for high-dimensional data.
  • Moving Subspace Correction: HALS can be interpreted as successively minimizing over moving subspaces, a perspective shared with multiplicative Schwarz methods, which generalizes standard block Jacobi/Gauss–Seidel methods (Oseledets et al., 2017). In nested factorization formats, each level or block corresponds to a tangent or nested subspace, and the overall contraction is the product of the local (possibly curvature-corrected) projection operators.
  • Acceleration via Preconditioning: Recent developments incorporate preconditioning and acceleration, such as nonlinearly preconditioned L-BFGS approaches, which “wrap” a quasi-Newton optimizer around the ALS update, yielding superior time-to-solution and robustness for ill-conditioned or hierarchical problems (Sterck et al., 2018).
  • Hybrid Strategies: For convergence improvement, hybridization of block updates and orthogonalization/coherence reduction steps (e.g., SVD-based reorthogonalization in ALS iterations) is effective in both theory and numerical performance (Hu et al., 20 May 2025).

6. Performance Evaluation and Comparison with Other Methods

Compared with classical ALS, Gauss–Newton, or Newton-like optimization, HALS-type methods exhibit the following characteristics:

  • Efficiency: Blockwise or hierarchical updates reduce both per-iteration complexity and memory/storage requirements, especially when per-block problems are small or can be parallelized.
  • Accuracy: Structured projections (e.g., onto Hankel, PSD, or nonnegative sets) markedly improve reconstruction quality, achieving near-optimal signal-to-reconstruction-error ratios, often within a small gap of the Cramér–Rao bound (e.g., 2.75 dB for Hankel or 0.77 dB for PSD structure at 10 dB SMNR) (Zachariah et al., 2012).
  • Convergence: The rate depends on problem structure: for orthogonally decomposable tensors of order $N$, contraction is polynomial of order $N-1$; for incoherently decomposable tensors, it is linear. Non-HALS methods (e.g., full ALS or Newton-based methods) may not scale or may become stuck in “swamps” for high-dimensional or highly structured problems (Hu et al., 20 May 2025, Singh et al., 2019).
  • Scalability and Parallelism: In high-dimensional tensor and matrix scenarios, hierarchical ALS/TT/HT variants break the curse of dimensionality by reducing updates to small-scale slices or nodes, allowing for robust parallel implementations (Grasedyck et al., 2015, Xiao et al., 2020).

7. Applications, Limitations, and Future Prospects

HALS methods are widely used in signal processing, machine learning, computational imaging, data mining, and robotics for tasks requiring scalable, robust, and structured low-rank factorizations. Their hierarchical structure is especially important for problems with inherent block organization, tensor network structure, or sequential priority constraints.

Potential limitations include:

  • Sensitivity to local minima in strongly nonconvex problems;
  • Requirement for careful block/level initialization to guarantee local convergence;
  • Possible slowdowns or numerical instabilities when the rank or number of priority levels is very large (alleviated by recent ADMM-based and nullspace projection techniques) (Pfeiffer, 27 May 2025, Pfeiffer et al., 2021).

Ongoing research directions emphasize:

  • Integration of differentiable hierarchical least-squares solvers into end-to-end learning architectures (Pfeiffer, 27 May 2025);
  • Adaptive block scheduling and hybridization with preconditioning/acceleration methods (Sterck et al., 2018);
  • Extension of theoretical convergence guarantees to broader classes of structured and nonconvex problems, including tensor decompositions and constrained optimization (Hu et al., 20 May 2025).

In summary, the hierarchical alternating least squares paradigm provides an efficient, versatile, and theoretically well-founded approach for high-dimensional structured optimization, with proven success spanning numerical linear algebra, multivariate statistics, machine learning, and engineering applications.