Low-Rank Tensor Decompositions

Updated 22 August 2025
  • Low-rank tensor decompositions approximate high-dimensional data arrays by exploiting multilinear structure, drastically reducing the number of parameters needed to represent them.
  • Alternating maximization and Newton-type methods offer practical trade-offs between per-iteration cost and convergence speed when computing quality approximations.
  • CUR-based and sampling strategies provide scalable solutions for massive tensors, balancing computational efficiency with approximation accuracy.

Low-rank tensor decompositions are foundational tools for representing high-dimensional data arrays in condensed formats by exploiting linear or multilinear structure. They are central in data compression, dimension reduction, signal and image processing, genomics, and many areas of scientific computing and machine learning. Unlike the matrix (order-2 tensor) case, where the singular value decomposition provides an optimal low-rank approximation, higher-order (order $d > 2$) tensors require more nuanced methods with distinct computational, theoretical, and practical challenges.

1. Fundamentals of Low-Rank Tensor Approximation

The prototypical goal is to approximate a given $d$-mode tensor $T \in \mathbb{R}^{n_1 \times n_2 \times \dots \times n_d}$ by a tensor with far fewer parameters. In the matrix case ($d = 2$), the best rank-$k$ approximation is efficiently solvable via the singular value decomposition (SVD): $A = \sum_{i=1}^{k} \sigma_i u_i v_i^\top$, where the $\sigma_i$ are singular values and $u_i$, $v_i$ are the corresponding singular vectors.
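For the matrix case, the truncated SVD is available directly in standard linear algebra libraries. The following minimal NumPy sketch (the helper name best_rank_k is illustrative, not taken from the paper) assembles the best rank-$k$ approximation from the leading singular triplets:

```python
import numpy as np

def best_rank_k(A: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of A (optimal in Frobenius and spectral norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k leading singular triplets.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Illustrative use: a noisy matrix whose signal part has exact rank 5.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))
A_noisy = A + 1e-3 * rng.standard_normal(A.shape)
print(np.linalg.norm(A_noisy - best_rank_k(A_noisy, 5)))  # small residual, at the noise level
```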

However, for $d > 2$ there is no canonical higher-order SVD. Instead, a common approach is the best \emph{multilinear rank} $(r_1, \dots, r_d)$ approximation: find subspaces $U_i \subset \mathbb{R}^{n_i}$ with $\dim U_i = r_i$ that maximize the $\ell_2$ norm of the projection of $T$ onto the tensor product subspace $\otimes_{i=1}^d U_i$. The problem can be formulated as

$$\min_{U_i} \| T - P_{U_1 \otimes \dots \otimes U_d}(T)\|^2 = \|T\|^2 - \max_{U_i} \| P_{U_1 \otimes \dots \otimes U_d}(T)\|^2,$$

where $P_{U_1 \otimes \dots \otimes U_d}$ denotes the orthogonal projection onto $\otimes_{i=1}^d U_i$; minimizing the approximation error is therefore equivalent to maximizing the norm of the projection (formula (1) in (Friedland et al., 2014)).
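To make the objective concrete, the sketch below evaluates $\|P_{U_1 \otimes \dots \otimes U_d}(T)\|^2$ as the squared norm of the core tensor $T \times_1 U_1^\top \cdots \times_d U_d^\top$, assuming each $U_i$ is stored as a matrix with $r_i$ orthonormal columns (the helper names mode_product and projected_norm_sq are illustrative):

```python
import numpy as np

def mode_product(T, M, mode):
    """Multiply tensor T along the given mode by the matrix M."""
    T = np.moveaxis(T, mode, 0)
    out = M @ T.reshape(T.shape[0], -1)
    return np.moveaxis(out.reshape((M.shape[0],) + T.shape[1:]), 0, mode)

def projected_norm_sq(T, subspaces):
    """||P_{U_1 x ... x U_d}(T)||^2 for orthonormal bases U_i (stored column-wise)."""
    core = T
    for mode, U in enumerate(subspaces):
        core = mode_product(core, U.T, mode)  # coordinates of the projection
    return float(np.sum(core ** 2))
```

By the identity above, maximizing this quantity over the subspaces is equivalent to minimizing the approximation error.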

2. Key Algorithms: Alternating Maximization and the Newton Method

Alternating Maximization Method (AMM)

The dominant practical method is Alternating Maximization (AMM), in which all subspaces except one, say $U_i$, are held fixed, $U_i$ is updated optimally, and the roles are rotated cyclically. Formally, the $U_i$ update step involves solving an eigenvalue problem for $A_i = \sum_{j_\ell = 1,\ \ell \ne i}^{r_\ell} \bigl[T \times (\otimes_{\ell \ne i} u_{j_\ell, \ell})\bigr]\bigl[T \times (\otimes_{\ell \ne i} u_{j_\ell, \ell})\bigr]^\top$, with the leading $r_i$ eigenvectors forming the updated $U_i$ (see formula (2) in (Friedland et al., 2014)).

AMM iterates the map $F: \Psi \to \Psi$ on the product of Grassmannians, $F(U_1, \dots, U_d) = (F_1(U), \dots, F_d(U))$, where $F_i$ is the maximizer of the objective with all other subspaces held fixed. This process converges, usually to a stationary point that may be only locally optimal due to nonconvexity (see (Friedland et al., 2014)).
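A minimal sketch of one alternating sweep, written in the equivalent form where the leading eigenvectors of $A_i$ are obtained as leading left singular vectors of the mode-$i$ unfolding of a partially contracted tensor (function names are illustrative, and this is a generic HOOI-style formulation rather than the paper's reference implementation):

```python
import numpy as np

def mode_product(T, M, mode):
    """Multiply tensor T along the given mode by the matrix M (as in the sketch above)."""
    T = np.moveaxis(T, mode, 0)
    out = M @ T.reshape(T.shape[0], -1)
    return np.moveaxis(out.reshape((M.shape[0],) + T.shape[1:]), 0, mode)

def amm_sweep(T, subspaces):
    """One cyclic sweep: update each orthonormal basis U_i with the others held fixed."""
    d = T.ndim
    for i in range(d):
        # Contract T with all fixed subspaces except mode i.
        G = T
        for j in range(d):
            if j != i:
                G = mode_product(G, subspaces[j].T, j)
        # The leading r_i eigenvectors of A_i = G_(i) G_(i)^T are the leading
        # left singular vectors of the mode-i unfolding of G.
        G_i = np.moveaxis(G, i, 0).reshape(G.shape[i], -1)
        r_i = subspaces[i].shape[1]
        U, _, _ = np.linalg.svd(G_i, full_matrices=False)
        subspaces[i] = U[:, :r_i]
    return subspaces

# Illustrative use: random 20 x 30 x 40 tensor, target multilinear rank (3, 3, 3).
rng = np.random.default_rng(0)
T = rng.standard_normal((20, 30, 40))
subspaces = [np.linalg.qr(rng.standard_normal((n, 3)))[0] for n in T.shape]
for _ in range(20):
    subspaces = amm_sweep(T, subspaces)
```

In practice the sweeps are repeated until the projected norm (or the subspaces themselves) stops changing up to a tolerance.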

Newton Method for Fixed Points

The Newton method is proposed to accelerate local convergence once the iterates are near a fixed point of $F$. In a local Euclidean parameterization (coordinate chart), the iteration takes the form $x^{(\ell)} = x^{(\ell-1)} - \bigl[I - DF(x^{(\ell-1)})\bigr]^{-1} \bigl(x^{(\ell-1)} - F(x^{(\ell-1)})\bigr)$ (formula (5) in (Friedland et al., 2014)), where $DF$ is the derivative of $F$. For rank-1 cases or highly structured subsets (such as products of spheres), explicit fixed-point equations and Jacobians can be written down.

The Newton method requires assembling the Jacobian and solving a linear system of dimension $L = \sum_{i=1}^d (n_i - r_i) r_i$, resulting in a higher per-iteration cost than AMM. However, the number of iterations is often significantly reduced.
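As a generic illustration of the fixed-point acceleration (not the paper's implementation), the sketch below performs one coordinate-chart Newton step for $x = F(x)$, approximating the Jacobian $DF$ by forward differences; the paper instead works with explicit Jacobians:

```python
import numpy as np

def newton_fixed_point_step(F, x, eps=1e-6):
    """One Newton step for the fixed-point equation x = F(x):
        x_new = x - (I - DF(x))^{-1} (x - F(x)).
    DF is approximated by forward differences, purely for illustration."""
    n = x.size
    Fx = F(x)
    DF = np.empty((n, n))
    for k in range(n):
        e = np.zeros(n)
        e[k] = eps
        DF[:, k] = (F(x + e) - Fx) / eps
    return x - np.linalg.solve(np.eye(n) - DF, x - Fx)

# Illustrative use on the scalar fixed-point problem x = cos(x).
x = np.array([0.5])
for _ in range(6):
    x = newton_fixed_point_step(np.cos, x)
print(x)  # approximately 0.739085
```

Each step costs one Jacobian evaluation plus a dense linear solve in dimension $L$, which is the per-iteration overhead mentioned above.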

3. CUR Approximations and Sampling-Based Strategies

For large matrices and higher-order tensors, direct SVD or AMM may be computationally prohibitive. The CUR decomposition, originally from the matrix literature, approximates a matrix by selecting a small number of rows and columns and reconstructs using the intersection submatrix. This concept generalizes to tensors by matricizing (unfolding) the tensor in one or more modes, selecting important slices or fibers, and computing the pseudo-inverse on these small submatrices (see formula (11) in (Friedland et al., 2014)).

Variants for tensors involve selecting index sets for each mode, forming the subtensor (core), and reconstructing via projections. CUR-approximations become particularly valuable when only partial access to the data is possible or for extremely large datasets.
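A minimal matrix CUR sketch in the intersection-submatrix form described above; the uniform random choice of index sets and the function name cur_approximation are purely illustrative:

```python
import numpy as np

def cur_approximation(A, row_idx, col_idx):
    """CUR approximation A ~ C @ U @ R with U the pseudo-inverse of the
    intersection submatrix W = A[rows, cols]."""
    C = A[:, col_idx]                   # selected columns
    R = A[row_idx, :]                   # selected rows
    W = A[np.ix_(row_idx, col_idx)]     # intersection submatrix
    U = np.linalg.pinv(W, rcond=1e-10)  # truncate tiny singular values of W
    return C @ U @ R

# Illustrative use: sample 12 rows and 12 columns of an exactly rank-6 matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 6)) @ rng.standard_normal((6, 400))
rows = rng.choice(500, size=12, replace=False)
cols = rng.choice(400, size=12, replace=False)
err = np.linalg.norm(A - cur_approximation(A, rows, cols)) / np.linalg.norm(A)
print(err)  # near machine precision when the sampled rows/columns span A's row/column spaces
```

For tensors, the same construction is applied to one or more unfoldings, with fibers or slices playing the role of rows and columns.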

4. Numerical Comparison and Performance

Empirical results demonstrate:

  • AMM and its variants (such as MAMM and 2AMM/2AMMV) have relatively low computational cost per iteration but may converge slowly, with a risk of stalling at suboptimal stationary points.
  • Newton methods require more resources per step due to Jacobian evaluation and inversion but often converge in fewer iterations, especially in the vicinity of stationary points.
  • For small target multilinear ranks (e.g., rank-1 or $(2,2,2)$), Newton-type methods are highly competitive.
  • For large tensors and moderate ranks, alternating methods—specifically, the 2AMMV modification—can outperform Newton methods, particularly when exploiting parallelism.
  • Compared to Grassmann-manifold-based Newton methods (see work by Savas–Lim and [ES09]), the coordinate Newton approach is easier to implement for arbitrary tensor order and remains highly parallelizable, albeit sometimes marginally slower per Newton step.

5. Implementation Aspects, Complexity, and Limitations

The main computational bottlenecks arise from:

  • The eigenvalue problem in updating subspaces ($O(\prod_i r_i)$ per update).
  • Jacobian assembly and inversion for the Newton method (with cost scaling as $L^3$).
  • Tensor contractions required to compute projections and matrix multiplications in each iteration.

Alternating methods are suitable for large-scale problems where per-iteration cost must be minimized. Newton-based methods should be reserved for accelerating convergence when close to a fixed point or when high accuracy is required in moderate-sized problems.

CUR decomposition and sampling-based methods are advantageous when working with tensors that cannot be stored or accessed in their entirety, since they require only a subset of tensor entries and computations. Their effectiveness depends on the quality of the selected samples, with error controlled via the choice of submatrices/fibers.

6. Summary Table: Method and Properties

| Method | Convergence Speed | Per-Iteration Cost | Parallelizability | Suitability |
|---|---|---|---|---|
| AMM/MAMM | Slow to moderate | Low | High | Very large tensors, quick first pass |
| Newton-1/2 | Fast (locally) | Moderate to high | High | Moderate/small tensors, high-accuracy regimes |
| CUR-based | Very fast | Low | High | Massive tensors, limited access/data compression |

7. Conclusions and Research Directions

Low-rank tensor decompositions generalize ideas from matrix theory, but introduce unique algorithmic and theoretical questions. This synthesis, as in (Friedland et al., 2014), illustrates several key points:

  • The best multilinear rank approximation is the central principle for higher-order tensor compression.
  • AMM is a simple, flexible, and general-purpose tool, suitable for large-scale problems where moderate local optimality suffices.
  • Newton-based fixed-point methods offer significantly improved local convergence for higher accuracy requirements, but with nontrivial computational overhead.
  • CUR-type and sampling-based decompositions are vital for large-scale applications, offering practical tradeoffs, albeit with slightly worse approximation guarantees compared to SVD-based methods.
  • Further advances involve robust stopping criteria, improved initialization, and hybrids combining alternating and Newton steps, as well as methods tailored for sparse, incomplete, or streaming tensor data.

The landscape described is shaped by intrinsic geometric properties (product Grassmannians), practical computational tradeoffs, and the increasing demand for efficient decompositions in scientific computing, data analytics, and beyond (Friedland et al., 2014).
