Truncated Tucker Decomposition
- Truncated Tucker Decomposition is a method for low-multilinear-rank approximation of tensors, factoring them into a core tensor and orthonormal factor matrices.
- It employs techniques such as the classic and sequentially truncated HOSVD (t-HOSVD and st-HOSVD) as well as randomized algorithms to minimize the Frobenius-norm reconstruction error.
- Advanced implementations include automatic rank selection and communication-optimized parallel algorithms for applications in scientific data compression and machine learning.
Truncated Tucker decomposition refers to computing a low-multilinear-rank approximation of an $N$-way tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, targeting prescribed ranks $(R_1, \dots, R_N)$, and writing $\mathcal{X} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)}$, where $\mathcal{G} \in \mathbb{R}^{R_1 \times \cdots \times R_N}$ is the core tensor and $U^{(n)} \in \mathbb{R}^{I_n \times R_n}$ are orthonormal factor matrices, one per mode. The problem amounts to minimizing the Frobenius-norm error under subspace (orthonormality) constraints on the factor matrices. Truncated Tucker decomposition underpins tensor compression, data analysis, and scientific simulation, and is foundational for scalable algorithms in computational multilinear algebra.
1. Mathematical Formulation and Objective
Let $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, and choose multilinear ranks $(R_1, \dots, R_N)$ with $R_n \le I_n$. The truncated Tucker model seeks

$$
\min_{\mathcal{G},\, U^{(1)},\dots,U^{(N)}} \left\| \mathcal{X} - \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_N U^{(N)} \right\|_F^2
\quad \text{subject to} \quad U^{(n)\top} U^{(n)} = I_{R_n}.
$$

Here, $\times_n$ denotes the mode-$n$ tensor-times-matrix (TTM) product. For fixed orthonormal factors, the optimal core is

$$
\mathcal{G} = \mathcal{X} \times_1 U^{(1)\top} \times_2 U^{(2)\top} \cdots \times_N U^{(N)\top}.
$$

The approximation error then satisfies

$$
\left\| \mathcal{X} - \mathcal{G} \times_1 U^{(1)} \cdots \times_N U^{(N)} \right\|_F^2 = \|\mathcal{X}\|_F^2 - \|\mathcal{G}\|_F^2,
$$

so minimizing the error amounts to choosing each factor matrix to span the best $R_n$-dimensional subspace revealed by the mode-$n$ unfolding $X_{(n)}$. The method generalizes the matrix SVD to higher-order tensors.
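As a concrete illustration, the following NumPy sketch (the ttm helper and variable names are illustrative, not taken from any cited package) forms the optimal core for fixed orthonormal factors via mode-wise TTM products and checks the error identity above on a random tensor.

```python
import numpy as np

def ttm(X, U, n):
    """Mode-n tensor-times-matrix product X x_n U: contract mode n of X with the columns of U."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 30, 40))
ranks = (5, 6, 7)

# Orthonormal factors: leading left singular vectors of each mode unfolding (the classic t-HOSVD choice).
factors = []
for n, Rn in enumerate(ranks):
    Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)   # mode-n unfolding X_(n)
    U, _, _ = np.linalg.svd(Xn, full_matrices=False)
    factors.append(U[:, :Rn])

# Optimal core for fixed factors: G = X x_1 U1^T x_2 U2^T x_3 U3^T.
G = X
for n, U in enumerate(factors):
    G = ttm(G, U.T, n)

# Reconstruction X_hat = G x_1 U1 x_2 U2 x_3 U3.
X_hat = G
for n, U in enumerate(factors):
    X_hat = ttm(X_hat, U, n)

# With orthonormal factors, ||X - X_hat||_F^2 = ||X||_F^2 - ||G||_F^2.
lhs = np.linalg.norm(X - X_hat) ** 2
rhs = np.linalg.norm(X) ** 2 - np.linalg.norm(G) ** 2
print(np.isclose(lhs, rhs))   # True
```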
2. Classical and Flexible st-HOSVD Algorithms
Truncated Tucker decompositions are typically computed via higher-order singular value decomposition (HOSVD) and variants:
- Classic t-HOSVD: For each mode $n$, unfold $\mathcal{X}$ into $X_{(n)}$, compute its leading $R_n$ left singular vectors to form $U^{(n)}$, then assemble the core $\mathcal{G} = \mathcal{X} \times_1 U^{(1)\top} \cdots \times_N U^{(N)\top}$ using all factor matrices together.
- Sequentially truncated HOSVD (st-HOSVD): Sequentially project and truncate the running core tensor along each mode, shrinking dimensions successively and yielding lower storage and computational cost.
- Mode-wise flexible st-HOSVD: a-Tucker (Li et al., 2020) generalizes st-HOSVD by allowing either an eigendecomposition or alternating least squares (ALS) to compute each $U^{(n)}$, with the solver chosen adaptively per mode.
Pseudocode (after Li et al., 2020), with a runnable NumPy sketch following the block:

```
Input: tensor X ∈ ℝ^{I₁×…×I_N}, ranks (R₁,…,R_N)
Y ← X
for n = 1 to N do
    method ← algorithmSelector(n, Iₙ, Rₙ, ...)
    if method == EIG then
        S ← Gram(Y, mode=n)              # S = Y_{(n)} · Y_{(n)}ᵀ
        U⁽ⁿ⁾ ← leading Rₙ eigenvectors of S
        Y ← Y ×ₙ U⁽ⁿ⁾ᵀ
    else                                 # ALS solver
        (L, Rtensor) ← ALS_solve(Y, mode=n, Rₙ)   # Y ≈ Rtensor ×ₙ L
        (Q, R) ← QR(L)                   # orthonormalize the ALS factor
        U⁽ⁿ⁾ ← Q
        Y ← Rtensor ×ₙ R                 # fold the triangular factor into the running core
    end if
end for
G ← Y
Output: core G, factor matrices U⁽¹⁾, …, U⁽ᴺ⁾
```
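A minimal, runnable NumPy version of the EIG branch of this loop (a sketch, not the a-Tucker implementation; the ALS branch and the algorithmSelector heuristic are omitted):

```python
import numpy as np

def ttm(X, U, n):
    """Mode-n tensor-times-matrix product X x_n U."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def st_hosvd(X, ranks):
    """Sequentially truncated HOSVD via mode-wise Gram eigendecompositions."""
    Y = X
    factors = []
    for n, Rn in enumerate(ranks):
        Yn = np.moveaxis(Y, n, 0).reshape(Y.shape[n], -1)   # unfolding of the running core
        S = Yn @ Yn.T                                       # Gram matrix S = Y_(n) Y_(n)^T
        _, V = np.linalg.eigh(S)                            # eigenvalues in ascending order
        U = V[:, ::-1][:, :Rn]                              # leading Rn eigenvectors
        factors.append(U)
        Y = ttm(Y, U.T, n)                                  # truncate mode n of the running core
    return Y, factors                                       # core G and factors U^(1..N)

# Usage:
# core, Us = st_hosvd(np.random.default_rng(1).standard_normal((20, 30, 40)), (5, 6, 7))
```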
3. Avoiding Explicit Matricization and Accelerated Implementations
Most implementations construct explicit matrix unfoldings and leverage high-performance GEMM routines, but this introduces data-conversion and memory overhead. a-Tucker instead performs matricization-free tensor contractions by recasting TTMs and Gram-matrix products as nested loops mapped onto high-performance BLAS kernels, reducing both the memory footprint and unnecessary transpositions (Li et al., 2020). This approach eliminates the need to store large intermediate unfoldings, yielding execution-time improvements of 4%–386% and peak-memory savings of 4%–45% compared to explicit unfolding (Li et al., 2020). The idea is illustrated by the sketch below.
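As an illustration of the interface only (not a-Tucker's BLAS-level kernels; NumPy may still reshape internally), the Gram matrix used in the EIG branch can be obtained by a single tensor contraction without explicitly forming the mode-n unfolding:

```python
import numpy as np

def gram_no_unfolding(Y, n):
    """S = Y_(n) @ Y_(n).T computed by contracting all modes except n,
    without materializing the explicit mode-n unfolding."""
    axes = [i for i in range(Y.ndim) if i != n]
    return np.tensordot(Y, Y, axes=(axes, axes))

# Sanity check against the explicit unfolding.
Y = np.random.default_rng(2).standard_normal((8, 9, 10))
Yn = np.moveaxis(Y, 1, 0).reshape(9, -1)
print(np.allclose(gram_no_unfolding(Y, 1), Yn @ Yn.T))   # True
```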
High-performance computation is further enabled by the intrinsic parallelism of ALS-based methods (Xiao et al., 2020) and by communication-optimized schemes such as TuckerMPI (Ballard et al., 2019):
- TuckerMPI: deploys a parallel, block-distributed algorithm for st-HOSVD/HOOI. Optimized tensor-times-matrix and Gram kernel implementations yield scalable compression of terabyte-scale tensors; the basic reduce-over-blocks communication pattern is sketched after this list.
- Randomized algorithms: randomized range estimators using sketching, power iterations, and Kronecker-structured random matrices (Minster et al., 2022, Che et al., 2023, Che et al., 2025, Hashemi et al., 2023) further accelerate mode-wise truncation and provide probabilistic error guarantees.
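A minimal sketch of that communication pattern, assuming mpi4py and a tensor block-distributed along a mode other than $n$ (this mirrors the generic sum-of-local-Grams idea, not TuckerMPI's actual data layout or kernels):

```python
import numpy as np
from mpi4py import MPI

def distributed_gram(Y_local, n):
    """Each rank contracts its local block into a partial Gram matrix; an Allreduce
    sums the contributions so every rank holds S = Y_(n) Y_(n)^T of the full tensor."""
    axes = [i for i in range(Y_local.ndim) if i != n]
    S_local = np.tensordot(Y_local, Y_local, axes=(axes, axes))
    S = np.empty_like(S_local)
    MPI.COMM_WORLD.Allreduce(S_local, S, op=MPI.SUM)
    return S
```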
4. Randomized and Adaptive Algorithms
Randomized truncation methods replace expensive SVDs with sketch-based range finding, yielding substantial speedups with only a marginal increase in approximation error:
- Randomized (st-)HOSVD: Each mode-$n$ subspace can be computed via randomized power iteration, adaptive shifted iterations, and/or approximate matrix multiplication, replacing the full deterministic SVD of each unfolding with sketch-sized computations (Che et al., 2023, Che et al., 2025).
- Probabilistic error bounds: For randomized methods, the expected reconstruction error is controlled by the optimal rank-truncated residual plus scaling factors reflecting the sketching parameters, as in (Hashemi et al., 2023, Che et al., 2025, Minster et al., 2022).
- Single-mode sketching: RTSMS (Hashemi et al., 2023) performs randomized subspace estimation for only one mode at a time using small Gaussian/structured sketches, and solves the resulting large-scale least-squares problems efficiently via randomized subsampling and refinement.
Parameter selection (oversampling of up to about 10, at most two power iterations, and adaptive stopping using per-vector error criteria) strikes a practical tradeoff between accuracy and speed (Che et al., 2023, Che et al., 2025); a range-finder sketch with these knobs follows.
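A generic randomized range finder with these knobs, applied to a mode unfolding, might look as follows (a sketch of the standard Gaussian-sketch-plus-power-iteration construction; function and parameter names are illustrative, not taken from the cited papers):

```python
import numpy as np

def randomized_range(A, rank, oversample=10, power_iters=2, rng=None):
    """Orthonormal basis Q with A ~ Q Q^T A, built from a Gaussian sketch with
    oversampling and a few re-orthonormalized power iterations."""
    rng = np.random.default_rng() if rng is None else rng
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)
    for _ in range(power_iters):
        Q, _ = np.linalg.qr(A.T @ Q)   # QR after each application keeps the basis well-conditioned
        Q, _ = np.linalg.qr(A @ Q)
    return Q

# In a randomized (st-)HOSVD step, U^(n) can then be formed as Q @ W, where W holds the
# leading R_n left singular vectors of the small matrix Q.T @ Xn (Xn: mode-n unfolding).
```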
5. Automatic Rank Selection and Adaptive HOOI
Classical Tucker computation assumes a fixed multilinear rank. However, adaptive algorithms infer the minimal rank necessary to meet a prescribed error tolerance:
- Rank-adaptive HOOI: At each mode update, the minimal truncation rank is selected as the smallest integer such that the tail sum of the discarded singular values drops below the residual tolerance, guaranteeing that the final approximation meets the prescribed error tolerance (Xiao et al., 2021); see the sketch after this list.
- Group sparsity techniques: For incomplete data, log-sum penalties over core tensor fibers induce structured sparsity, driving many fibers to zero and yielding an automatically truncated core and minimal ranks (Yang et al., 2015).
- Monotonic convergence: Rank-adaptive HOOI ensures monotonic decrease of ranks and the error, converging in finitely many steps with provable local optimality.
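A sketch of such a per-mode truncation rule, assuming the tolerance is expressed as a squared-error budget split evenly across the $N$ modes (a common convention, not necessarily the exact rule of the cited papers):

```python
import numpy as np

def adaptive_rank(sigma, tol_sq):
    """Smallest R such that the discarded tail energy sum_{i>R} sigma_i^2 <= tol_sq,
    for singular values sigma sorted in descending order."""
    energies = sigma ** 2
    # tail[R] = sum of energies from index R onward; tail[len(sigma)] = 0.
    tail = np.concatenate([np.cumsum(energies[::-1])[::-1], [0.0]])
    return int(np.argmax(tail <= tol_sq))

# Usage inside a rank-adaptive mode update, splitting a relative tolerance eps across N modes:
# tol_sq = (eps * np.linalg.norm(X)) ** 2 / X.ndim
# sigma = np.linalg.svd(Xn, compute_uv=False)   # singular values of the mode-n unfolding
# Rn = adaptive_rank(sigma, tol_sq)
```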
6. Computational and Statistical Guarantees
- Complexity: ALS-based methods avoid explicit SVDs and the associated intermediate data explosion, lowering the per-mode cost (Xiao et al., 2020). Randomized approaches reduce the cost further when the sketch size is small relative to the mode dimensions (Minster et al., 2022, Che et al., 2023).
- Approximation error: Deterministic algorithms provide a Frobenius-norm error no larger than the sum of the post-truncation singular-value tails over the modes; randomized algorithms yield similar guarantees up to small multiplicative factors (Che et al., 2025). The deterministic bound is stated after this list.
- Optimization landscape: Under exact multilinear rank, the nonconvex Tucker objective exhibits no spurious local minima; local search algorithms (SGD, perturbed Newton, ALS) provably find global optima (Frandsen et al., 2020).
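For reference, the deterministic tail bound referenced above can be written, for both truncated HOSVD and st-HOSVD with $\sigma_i(X_{(n)})$ the singular values of the mode-$n$ unfolding, as

$$
\left\| \mathcal{X} - \hat{\mathcal{X}} \right\|_F^2
\;\le\; \sum_{n=1}^{N} \sum_{i > R_n} \sigma_i^2\!\left(X_{(n)}\right)
\;\le\; N \min_{\operatorname{rank}_{\mathrm{ml}}(\mathcal{Y}) \le (R_1,\dots,R_N)} \left\| \mathcal{X} - \mathcal{Y} \right\|_F^2,
$$

which yields the familiar $\sqrt{N}$ quasi-optimality factor relative to the best multilinear rank-$(R_1,\dots,R_N)$ approximation.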
7. Practical Performance and Applications
Empirical studies consistently show 1–2 orders of magnitude acceleration in truncated Tucker decomposition using ALS and randomized sketching over classic SVD-based HOSVD, while matching or closely tracking the optimal reconstruction error (Li et al., 2020, Xiao et al., 2020, Minster et al., 2022, Hashemi et al., 2023):
- a-Tucker achieves 22.9× CPU and 2.8× GPU speedup over eigen-based st-HOSVD, and is faster in 93%–94% of instances (Li et al., 2020).
- Large-scale parallel codes (TuckerMPI) compress multi-terabyte scientific datasets by large factors, supporting reconstruction and analysis at workstation scale (Ballard et al., 2019).
- Randomized and parallel methods attain substantial speedups on distributed architectures, with negligible accuracy loss (Minster et al., 2022).
- Automatic rank selection methods precisely recover the true rank and error threshold, reducing model complexity without user intervention (Xiao et al., 2021, Yang et al., 2015).
These advances underpin applications in scientific data compression, hyperspectral imaging, video and volumetric data, and high-dimensional machine learning.