Low-rank Tucker Decomposition

Updated 25 May 2026

Low-rank Tucker Decomposition is a tensor factorization method that models high-dimensional data as the contraction of a compact core tensor with low-dimensional factor matrices.
It offers efficient multilinear parameterization by generalizing low-rank matrix factorization to higher-order arrays, with robust recovery via iterative and stochastic algorithms.
Applications span machine learning, signal processing, and scientific computing, utilizing methods such as HOSVD, randomized sketching, and structured regularization.

A low-rank Tucker decomposition represents a high-dimensional tensor as the contraction of a modest-sized core tensor with a collection of low-dimensional factor matrices, each corresponding to a particular mode. This provides an efficient multilinear parameterization that generalizes low-rank matrix factorization to higher-order arrays. Classical, stochastic, and randomized algorithms for low-rank Tucker recovery, as well as application-structured variants and regularized objectives, underlie much of modern tensor analysis in machine learning, signal processing, and scientific computing.

1. Problem Formulation and Tucker Model

Given a $d$-way tensor $X^* \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$, the low-rank Tucker model expresses $X^*$ as

$X^* = S \times_1 U^{(1)} \times_2 \cdots \times_d U^{(d)}$

where $S \in \mathbb{R}^{r_1 \times \cdots \times r_d}$ is the core tensor and $U^{(i)} \in \mathbb{R}^{n_i \times r_i}$ are the factor matrices. The vector $(r_1, \ldots, r_d)$ is the multilinear (Tucker) rank, where each $r_i = \mathrm{rank}(X^{\{i\}})$ with $X^{\{i\}}$ the mode-$i$ unfolding of $X^*$.
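The model can be illustrated with a minimal NumPy sketch. The helper names `unfold`, `mode_product`, and `tucker_to_tensor` are hypothetical, chosen for illustration, and the unfolding uses NumPy's row-major column ordering, which may differ from other conventions but does not affect the rank of the unfolding.

```python
import numpy as np

def unfold(X, mode):
    """Mode-`mode` unfolding: rows indexed by that mode, columns by all others."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_product(X, U, mode):
    """Mode product X x_mode U, where U has shape (new_dim, X.shape[mode])."""
    Y = np.tensordot(U, X, axes=(1, mode))  # the new axis comes out first
    return np.moveaxis(Y, 0, mode)          # move it back to position `mode`

def tucker_to_tensor(S, factors):
    """Contract the core S with one factor matrix per mode."""
    X = S
    for mode, U in enumerate(factors):
        X = mode_product(X, U, mode)
    return X

rng = np.random.default_rng(0)
dims, ranks = (6, 7, 8), (2, 3, 4)
S = rng.standard_normal(ranks)                                    # core tensor
factors = [rng.standard_normal((n, r)) for n, r in zip(dims, ranks)]
X = tucker_to_tensor(S, factors)

# The multilinear rank is the tuple of ranks of the mode unfoldings.
multilinear_rank = tuple(int(np.linalg.matrix_rank(unfold(X, i))) for i in range(X.ndim))
print(X.shape, multilinear_rank)
```

For generic random cores and factors, as drawn here, the computed multilinear rank coincides with the core dimensions.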

The low-Tucker-rank tensor recovery problem, a central focus in high-dimensional inverse problems and data analysis, is

$\min_{X} \; \frac{1}{2m} \| y - \mathcal{A}(X) \|_2^2 \quad \text{subject to} \quad \mathrm{rank}(X^{\{i\}}) \le r_i, \quad i = 1, \ldots, d,$

where the measurement vector $y \in \mathbb{R}^m$ of $X^*$ is observed via a linear operator $\mathcal{A}: \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d} \to \mathbb{R}^m$. The hard constraint on Tucker rank distinguishes this class from convex nuclear-norm surrogates or matrix unfoldings (Grotheer et al., 2019).

2. Deterministic and Stochastic Hard-Thresholding Algorithms

Iterative hard thresholding (IHT) algorithms for low-rank tensor recovery generalize established matrix techniques: each step alternates a gradient update of the least-squares loss with a projection onto the low-Tucker-rank manifold, operationally implemented by truncated higher-order SVD (HOSVD). The stochastic variant (StoTIHT) splits the measurements into batches and performs each gradient step using only a randomly sampled batch, yielding significant per-iteration speedup at the cost of gradient variance.

The StoTIHT procedure (Grotheer et al., 2019) is:

Randomly sample a batch of measurements;
Compute the partial gradient;
Take a stochastic gradient step (rescaled for unbiasedness);
Project onto rank-$(r_1, \ldots, r_d)$ tensors via
- For each mode $i$, compute the SVD of the mode-$i$ unfolding, truncating to rank $r_i$;
- Form the core by projecting onto the corresponding left singular vector subspaces.
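The steps above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the reference implementation of Grotheer et al.: the measurement operator is taken to be a dense Gaussian matrix acting on the vectorized tensor, batches are sampled uniformly, the batch gradient is normalized by the batch size so that it is an unbiased estimate of the full gradient, and the step size 0.5 and the oversampled problem size are chosen only to keep the demonstration stable. The function names are hypothetical.

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_product(X, U, mode):
    return np.moveaxis(np.tensordot(U, X, axes=(1, mode)), 0, mode)

def hosvd_truncate(X, ranks):
    """Approximate projection onto rank-(r_1, ..., r_d) tensors via truncated HOSVD."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])              # leading left singular vectors
    Y = X
    for mode, U in enumerate(factors):
        Y = mode_product(Y, U @ U.T, mode)    # project each mode onto span(U)
    return Y

def stotiht(A, y, dims, ranks, batch_size, step=0.5, iters=300, seed=0):
    """Stochastic tensor IHT sketch: batch gradient step, then HOSVD truncation."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    batches = np.array_split(rng.permutation(m), m // batch_size)
    X = np.zeros(dims)
    for _ in range(iters):
        idx = batches[rng.integers(len(batches))]          # sample a batch
        residual = y[idx] - A[idx] @ X.ravel()             # batch residual
        grad_step = (A[idx].T @ residual).reshape(dims) / len(idx)
        X = hosvd_truncate(X + step * grad_step, ranks)    # step, then project
    return X

# Synthetic low-Tucker-rank ground truth and Gaussian measurements (assumed setup).
rng = np.random.default_rng(1)
dims, ranks, m = (5, 5, 5), (2, 2, 2), 2000
X_true = rng.standard_normal(ranks)
for mode, (n, r) in enumerate(zip(dims, ranks)):
    X_true = mode_product(X_true, rng.standard_normal((n, r)), mode)
A = rng.standard_normal((m, X_true.size))
y = A @ X_true.ravel()

X_hat = stotiht(A, y, dims, ranks, batch_size=400)
rel_err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
print(f"relative error: {rel_err:.2e}")
```

Each iteration touches only one batch of rows of `A`, which is the source of the per-iteration saving; the HOSVD truncation is only a quasi-optimal projection, which is why the convergence analysis requires an approximate-optimality condition on it.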

Linear convergence in expectation is attained under a tensor restricted isometry property (TRIP) and an approximate-optimality condition on the projection. StoTIHT achieves an error floor and contraction factor characterized explicitly by the noise level and TRIP constant.

Empirical results indicate that smaller batch sizes reduce wall-clock time, with rapid convergence even at drastic subsampling, and that the method robustly recovers both synthetic and real data tensors at large scale (Grotheer et al., 2019).

3. Randomized and Sketching-Based Approaches

Randomized algorithms for Tucker decomposition exploit data compression and dimensionality reduction by sketching tensor unfoldings with random projections or sampling schemes. Notable variants include:

Single-mode or two-sided sketches per unfolding, followed by low-rank approximation and iterative truncation (Hashemi et al., 2023, Dong et al., 2023, Minster et al., 2019);
CUR-type tensor approximations using selected fibers and subtensors for robust projections and alternating minimization (Cai et al., 2023).
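A one-sided sketch per unfolding can be illustrated with a randomized, sequentially truncated HOSVD. The following is a minimal sketch assuming Gaussian test matrices and a fixed oversampling parameter; it is not the specific algorithm of any of the cited works, and the function name `randomized_sthosvd` is hypothetical.

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_product(X, U, mode):
    return np.moveaxis(np.tensordot(U, X, axes=(1, mode)), 0, mode)

def randomized_sthosvd(X, ranks, oversample=5, seed=0):
    """Sequentially truncated HOSVD with a randomized range finder per mode."""
    rng = np.random.default_rng(seed)
    core, factors = X, []
    for mode, r in enumerate(ranks):
        M = unfold(core, mode)
        k = min(r + oversample, *M.shape)                  # sketch width
        Omega = rng.standard_normal((M.shape[1], k))       # Gaussian test matrix
        Q, _ = np.linalg.qr(M @ Omega)                     # basis for the sketched range
        Ub, _, _ = np.linalg.svd(Q.T @ M, full_matrices=False)
        U = Q @ Ub[:, :r]                                  # truncate to rank r
        factors.append(U)
        core = mode_product(core, U.T, mode)               # shrink before the next mode
    return core, factors

# Exactly low-rank test tensor (assumed synthetic data).
rng = np.random.default_rng(2)
dims, ranks = (20, 21, 22), (3, 4, 5)
X = rng.standard_normal(ranks)
for mode, (n, r) in enumerate(zip(dims, ranks)):
    X = mode_product(X, rng.standard_normal((n, r)), mode)

core, factors = randomized_sthosvd(X, ranks)
X_hat = core
for mode, U in enumerate(factors):
    X_hat = mode_product(X_hat, U, mode)
rel_err = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
print(core.shape, f"{rel_err:.2e}")
```

Because each mode is compressed before the next is processed, later sketches operate on progressively smaller unfoldings, which is the main cost advantage of the sequential variant.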

Sketch-based ALS (alternating least squares) leverages the Kronecker structure of the core tensor update, recasting it as a (possibly regularized) ridge regression with a design matrix $U^{(1)} \otimes \cdots \otimes U^{(d)}$ formed via factor matrices. Ridge leverage scores or classic leverage scores guide adaptive row sampling, yielding guarantees of $(1+\varepsilon)$-approximation to the exact fit with sample complexity sublinear in the full tensor size (Fahrbach et al., 2021, Ma et al., 2021). Randomized range-finding and power iterations enhance spectral decay handling.
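The role of the Kronecker structure can be sketched as follows. The leverage scores of a Kronecker product are the products of the per-factor leverage scores, so rows can be sampled mode by mode without forming the design matrix. This is a minimal illustration using classic (unregularized) leverage scores on an assumed three-way synthetic problem; the sample count `s` and all names are illustrative choices, not taken from the cited works. The full Kronecker matrix is built here only to compare against the exact solution.

```python
import numpy as np

def leverage_scores(U):
    """Squared row norms of an orthonormal basis for range(U)."""
    Q, _ = np.linalg.qr(U)
    return np.sum(Q**2, axis=1)

rng = np.random.default_rng(0)
dims, ranks = (10, 10, 10), (2, 2, 2)
U1, U2, U3 = (rng.standard_normal((n, r)) for n, r in zip(dims, ranks))
G_true = rng.standard_normal(ranks)
X = np.einsum('abc,ia,jb,kc->ijk', G_true, U1, U2, U3)
X += 0.1 * rng.standard_normal(dims)                       # additive noise

# Per-mode leverage scores and the induced sampling distributions.
lev = [leverage_scores(U) for U in (U1, U2, U3)]
probs = [l / l.sum() for l in lev]

# Sample s row indices (i, j, k) from the product distribution.
s = 400
idx = [rng.choice(n, size=s, p=p) for n, p in zip(dims, probs)]
p_joint = probs[0][idx[0]] * probs[1][idx[1]] * probs[2][idx[2]]
w = 1.0 / np.sqrt(s * p_joint)                             # importance weights

# Sampled rows of the Kronecker design matrix, built without forming it in full.
K_s = np.einsum('sa,sb,sc->sabc', U1[idx[0]], U2[idx[1]], U3[idx[2]]).reshape(s, -1)
b_s = X[idx[0], idx[1], idx[2]]
g_sketch, *_ = np.linalg.lstsq(w[:, None] * K_s, w * b_s, rcond=None)

# Exact core update for comparison (feasible only at this small size).
K = np.kron(U1, np.kron(U2, U3))
g_exact, *_ = np.linalg.lstsq(K, X.ravel(), rcond=None)
res_exact = np.linalg.norm(K @ g_exact - X.ravel())
res_sketch = np.linalg.norm(K @ g_sketch - X.ravel())
print(f"exact residual {res_exact:.3f}, sketched residual {res_sketch:.3f}")
```

The sampled problem has `s` rows instead of the full $n_1 n_2 n_3$, and the sketched residual is expected to be within a small multiplicative factor of the exact one.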

Theoretical error bounds for randomized Slide/Sketch-STHOSVD and CUR techniques precisely track the singular value tail energies of each unfolding, with constants depending on oversampling and sketch dimensions. Randomized fiber sampling with range finding (mode-parallel HOSVD) enables strong scaling in high mode dimensions, with lower flop and memory costs than classical HOSVD (Iannacito et al., 22 Mar 2026).

4. Structured and Regularized Low-Rank Tucker Recovery

Modern formulations integrate additional priors for real-world interpretability and robust estimation:

Weighted nuclear-norm penalties on factor matrices, sparse cores, and Laplacian-based smoothness regularization yield models for joint global low-rank structure and local smoothness (Gong et al., 4 Aug 2025).
Proximal gradient, PALM, and ADMM-based solvers are applied to nonconvex but structured objectives, with theoretical convergence (to critical points under KL or semi-algebraicity) shown for the PALM/ProADM algorithms (Gong et al., 4 Aug 2025, Pan et al., 2020).
Structured sparsity (group log-sum) on the core supports automatic rank determination and model selection (Yang et al., 2015).
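Proximal solvers of this kind rely on closed-form proximal maps for the individual penalties. The sketch below shows two standard ones: entrywise soft-thresholding for an $\ell_1$ penalty on the core, and weighted singular value thresholding for a weighted nuclear norm on a factor matrix. The latter is the exact proximal map only under the assumption that the weights are non-decreasing along the descending singular values. The function names are hypothetical, and the specific penalties used in the cited works may differ.

```python
import numpy as np

def soft_threshold(G, tau):
    """Proximal map of tau * ||G||_1: shrink every entry toward zero by tau."""
    return np.sign(G) * np.maximum(np.abs(G) - tau, 0.0)

def weighted_svt(U, tau, weights):
    """Weighted singular value thresholding.

    Shrinks the i-th singular value by tau * weights[i]. Assumes the weights
    are non-decreasing, so larger singular values are penalized less.
    """
    P, s, Qt = np.linalg.svd(U, full_matrices=False)
    s_shrunk = np.maximum(s - tau * np.asarray(weights), 0.0)
    return (P * s_shrunk) @ Qt

rng = np.random.default_rng(0)
core = np.array([-3.0, -0.5, 0.2, 2.0])
print(soft_threshold(core, 1.0))            # small entries are set to zero

U = rng.standard_normal((6, 3))
weights = np.array([0.5, 1.0, 2.0])         # assumed non-decreasing weights
U_shrunk = weighted_svt(U, 0.3, weights)
print(np.linalg.svd(U, compute_uv=False))
print(np.linalg.svd(U_shrunk, compute_uv=False))
```

In a proximal gradient or PALM iteration, each block (core or factor) would take a gradient step on the smooth data-fit term and then apply the corresponding map.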

In robust settings, Tucker decomposition is coupled with hard thresholding for sparse-corruption separation (TRPCA), as in robust tensor CUR alternating projections, or with robust loss functions (e.g., $L_2$-E trimming) for outlier tolerance (Cai et al., 2023, Heng et al., 2022). $\ell_0$-type regularization (exact sparsity) on the core tensor is attainable in tensor regression via noise augmentation, producing exact model pruning and improved predictor identification (Yan et al., 2023).

5. Symmetric, Structured, and Application-Driven Tucker Decompositions

Symmetric subtensors and structured domains motivate specialized Tucker variants:

Symmetric moment tensor decomposition leverages the Grassmann and Stiefel manifold geometry, with projected gradient descent and higher-order eigenvalue approximations (HOEVD) achieving both statistical efficiency and scalable implementation (Jin et al., 2022).
O-minus (ring-plus-bridge) architectures provide more balanced and compact cores for multi-view clustering, outperforming standard Tucker and tensor ring decompositions in multi-view latent information capture (Lu et al., 2022).
Nonnegative and sparse Tucker decomposition, accelerated via preceding low-multilinear-rank approximation, enables efficient first-order methods, improved uniqueness (with sparsity), and interpretability in nonnegative data settings (Zhou et al., 2014).

In large-scale structured problems, implicit and streaming versions circumvent the explicit construction of massive moment tensors or unfoldings—e.g., via moment outer-product identities or online QR-retraction (Jin et al., 2022, Iannacito et al., 22 Mar 2026).

6. Geometry, Optimization, and Rank-Adaptivity

The set of low-Tucker-rank tensors forms a real-algebraic variety with a geometry richer than that of matrix varieties. Explicit tangent cone parametrizations reveal a multilayered block structure, with gradient-projection methods (GRAP) and HOSVD-based retractions ensuring provable convergence to stationary points. Adaptive algorithms detect and modify multilinear ranks during iteration, supporting automatic model selection and robust tensor completion without prior knowledge of true rank (Gao et al., 2023, Yang et al., 2015).
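A very simple form of rank detection can be sketched by thresholding the singular value energy of each unfolding. This is only an illustrative heuristic under an assumed relative energy tolerance `tol`; it is not the tangent-cone-based adaptive scheme of the cited works, and the function name is hypothetical.

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_product(X, U, mode):
    return np.moveaxis(np.tensordot(U, X, axes=(1, mode)), 0, mode)

def estimate_multilinear_rank(X, tol=1e-8):
    """Smallest r_i per mode whose leading singular values hold a (1 - tol) energy share."""
    ranks = []
    for mode in range(X.ndim):
        s = np.linalg.svd(unfold(X, mode), compute_uv=False)
        energy = np.cumsum(s**2) / np.sum(s**2)            # cumulative energy fraction
        r = int(np.searchsorted(energy, 1.0 - tol)) + 1
        ranks.append(min(r, len(s)))
    return tuple(ranks)

# Low-rank tensor plus a very small perturbation (assumed synthetic data).
rng = np.random.default_rng(3)
dims, true_ranks = (10, 11, 12), (2, 3, 4)
X = rng.standard_normal(true_ranks)
for mode, (n, r) in enumerate(zip(dims, true_ranks)):
    X = mode_product(X, rng.standard_normal((n, r)), mode)
X_noisy = X + 1e-8 * rng.standard_normal(dims)

estimated = estimate_multilinear_rank(X_noisy)
print(estimated)
```

The tolerance trades off sensitivity to noise against the risk of discarding weak but genuine components, which is one reason iterative rank-adaptive methods revisit the rank choice during optimization.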

Riemannian optimization on these varieties and associated manifolds underpins much of modern tensor optimization, supporting line search, retraction, and adaptation strategies crucial for practical performance and reliability.

7. Applications and Impact

Low-rank Tucker decomposition is foundational across high-dimensional data analysis, inverse problems, scientific computing, and machine learning. Empirical evidence demonstrates:

Superior accuracy and speed in tensor completion, denoising, and inpainting under high missingness and outlier regimes (Gong et al., 4 Aug 2025, Pan et al., 2020, Heng et al., 2022);
Robust traffic data imputation, image and video restoration, and multivariate regression with interpretable parameterizations and regularization (Gong et al., 4 Aug 2025, Pan et al., 2020, Yan et al., 2023);
Structured manifold estimation critical for moment tensor analysis, hyperspectral imaging, and multi-view clustering (Jin et al., 2022, Lu et al., 2022, Gao et al., 2023).

The continual development of scalable, stochastic, and regularized algorithms—anchored in the multilinear structure and geometry of the Tucker model—has expanded the range and reliability of tensor methods for large-scale, high-dimensional, and application-structured problems.