Tensor Network Decomposition
- Tensor network decomposition is a family of multilinear factorization techniques that approximates high-order tensors by contracting networks of lower-dimensional cores, such as those in the Tensor Train and Tensor Ring models.
- It leverages structured models and canonical forms to mitigate the curse of dimensionality while balancing computational efficiency with expressive power.
- Applied in data compression, neural network optimization, and quantum simulations, these decompositions enable robust and scalable multi-dimensional data analysis across diverse domains.
Tensor network decomposition refers to a broad family of multilinear factorization techniques in which a high-order tensor is approximated by a composition of smaller, interconnected tensors—typically arranged in a network structure that reflects computational or physical constraints. These constructions originated in quantum physics (notably as matrix product states and projected entangled pair states), but have become central in signal processing, machine learning, and large-scale optimization due to their ability to mitigate the curse of dimensionality and to efficiently represent, analyze, or compress structured data.
1. Mathematical Structure and Canonical Forms
Tensor network decomposition describes an array as a contraction (over shared indices) of a network of low-order tensors. Let $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ denote the target tensor and $\mathcal{G}_1, \ldots, \mathcal{G}_N$ the core tensors, each with lower dimensionality than $\mathcal{X}$. Examples of canonical forms include:
- Tensor Train (TT)/Matrix Product State (MPS):
$\mathcal{X}(i_1, i_2, \ldots, i_N) = G_1(i_1)\, G_2(i_2) \cdots G_N(i_N)$,
where $G_k(i_k) \in \mathbb{R}^{r_{k-1} \times r_k}$ are matrices and $r_0 = r_N = 1$.
- Tensor Ring (TR):
$\mathcal{X}(i_1, i_2, \ldots, i_N) = \operatorname{Tr}\!\big(G_1(i_1)\, G_2(i_2) \cdots G_N(i_N)\big)$,
where $G_k(i_k) \in \mathbb{R}^{r_k \times r_{k+1}}$ and $r_{N+1} = r_1$. All TR-ranks are free and the trace "closes" the network.
- Fully-Connected Tensor Network (FCTN)/Tensor Star:
$\mathcal{X}(i_1, \ldots, i_N) = \sum_{r_{1,2},\, r_{1,3},\, \ldots,\, r_{N-1,N}} \prod_{k=1}^{N} \mathcal{G}_k\big(r_{1,k}, \ldots, r_{k-1,k},\, i_k,\, r_{k,k+1}, \ldots, r_{k,N}\big)$
Each pair of modes is coupled by a shared index $r_{j,k}$, and the contraction is performed via multi-summation not just locally but globally, allowing direct correlations between any pair of modes.
The core idea is to approximate the original tensor as $\mathcal{X} \approx \mathcal{C}\big(\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_N\big)$, where $\mathcal{C}(\cdot)$ denotes contraction over the shared indices; the specific contraction pattern and core tensor structures define the decomposition (TT, TR, FCTN, Tensor Star, etc.).
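As a concrete illustration of such a contraction, the following minimal NumPy sketch reconstructs an order-4 tensor from TT cores with `np.einsum`; the mode sizes and ranks are illustrative assumptions, not values from the cited papers.

```python
import numpy as np

# Illustrative TT cores for an order-4 tensor of shape (4, 5, 6, 7),
# with TT-ranks (1, 3, 3, 3, 1); core k has shape (r_{k-1}, I_k, r_k).
ranks = [1, 3, 3, 3, 1]
dims = [4, 5, 6, 7]
cores = [np.random.randn(ranks[k], dims[k], ranks[k + 1]) for k in range(4)]

# Contract the chain over the shared bond indices a, b, c and squeeze the
# trivial boundary bonds (r_0 = r_4 = 1).
X = np.einsum('uia,ajb,bkc,clv->uijklv', *cores).squeeze()
print(X.shape)  # (4, 5, 6, 7)
```

The same pattern, with different index wiring in the einsum string, realizes TR, FCTN, or TS contractions.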
2. Principal Families and Their Properties
Tensor Train (TT):
The TT structure (or MPS in quantum physics) links third-order cores in a linear chain. Its primary benefit is storage complexity of $\mathcal{O}(N I r^2)$, linear in the tensor order $N$ (with $I$ the mode size and $r$ the maximal TT-rank), together with contraction costs that are polynomial in $I$ and $r$. However, TT is sensitive to permutation of tensor dimensions, as the sequential contractions cause strong dependency on variable ordering (Zhao et al., 2016).
Tensor Ring (TR):
TR decomposition generalizes TT by removing the boundary rank constraint $r_0 = r_N = 1$ and connecting the first and last cores via the trace. This circular symmetry grants permutation invariance under cyclic shifts of the tensor modes. All TR-ranks are free, and the model can be interpreted as a linear combination of TT decompositions with different cyclical orderings. This mechanism yields enhanced representational capacity and often more compact parameterizations compared to TT (Zhao et al., 2016).
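The cyclic-shift invariance can be verified numerically. A minimal NumPy sketch (mode sizes and TR-ranks are illustrative assumptions) builds a tensor from TR cores and checks that cyclically shifting the cores yields the cyclically permuted tensor:

```python
import numpy as np

dims, ranks = [4, 5, 6, 7], [3, 2, 4, 2]  # ring closure: r_5 = r_1
cores = [np.random.randn(ranks[k], dims[k], ranks[(k + 1) % 4]) for k in range(4)]

def tr_full(cores):
    # The repeated bond index 'a' implements the trace that closes the ring.
    return np.einsum('aib,bjc,ckd,dla->ijkl', *cores)

X = tr_full(cores)
X_shift = tr_full(cores[1:] + cores[:1])  # cyclic shift of the cores
print(np.allclose(X_shift, np.transpose(X, (1, 2, 3, 0))))  # True
```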
Fully-Connected Tensor Network (FCTN):
In FCTN, each mode of the tensor is directly connected to every other, allowing the factorization to capture global mode-to-mode correlations. The representation is invariant to permutations of the input modes (up to a corresponding reordering of the cores), and the global structure often leads to better preservation of intrinsic tensor properties, albeit with significant parameter growth as a function of order (Zheng et al., 2021, Yang et al., 2022).
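For intuition, in an order-3 FCTN every pair of modes shares its own rank index; a minimal NumPy sketch (mode sizes and pairwise ranks are illustrative assumptions):

```python
import numpy as np

# Order-3 FCTN: core G_k carries the physical index i_k plus one rank index
# per other mode, so every pair of modes is directly coupled.
I1, I2, I3 = 4, 5, 6
r12, r13, r23 = 2, 3, 2
G1 = np.random.randn(I1, r12, r13)   # indices (i1, r12, r13)
G2 = np.random.randn(r12, I2, r23)   # indices (r12, i2, r23)
G3 = np.random.randn(r13, r23, I3)   # indices (r13, r23, i3)

# Global multi-summation over all pairwise bonds r12 (a), r13 (b), r23 (c).
X = np.einsum('iab,ajc,bck->ijk', G1, G2, G3)
print(X.shape)  # (4, 5, 6)
```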
Hierarchical Tucker (HT):
HT decomposes higher-order tensors recursively along a (binary) dimension tree whose depth grows logarithmically in the order for balanced trees, and it provides better gradient behavior when compressing balanced weight matrices in neural networks (Wu et al., 2020).
Tensor Star (TS):
TS decomposition introduces a star-shaped network with order-3 factor tensors and order-4 core tensors connected in a ring. Each core tensor mediates interactions between all factor tensors, enabling direct mode-to-mode dependencies. TS decomposition controls both the curse of dimensionality and the curse of ranks by decoupling the local component ranks, with storage complexity that remains linear in the number of modes for uniform mode size and ranks (Zhou et al., 15 Mar 2024).
Generalized Models:
Other forms, such as the Semi-Tensor Product (STP) based networks, further relax the dimension matching constraints on connections between cores or modes, yielding significantly higher compression ratios in neural networks with comparable or even improved performance (Zhao et al., 2021).
3. Optimization Algorithms and Practical Computation
Numerous algorithms exist for extracting core tensors in tensor network decompositions:
- Sequential SVD (e.g., TR-SVD, TT-SVD):
Based on sequential unfolding and truncated SVD, providing a non-iterative, efficient route to low-rank approximations, but sensitive to contraction ordering. Applied in TT and TR decompositions (Zhao et al., 2016); a minimal sketch follows this list.
- Alternating Least Squares (ALS) and Block-Wise ALS:
Iteratively updates one core (or a block of cores) at a time, holding all others fixed and solving least-squares subproblems. Block updates (e.g., double- or triple-core) allow more flexible rank adaptation and higher accuracy, as in TT and TR (Phan et al., 2016, Zhao et al., 2016).
- Adaptive Rank Algorithms:
Automatically adjust the local ranks during ALS optimization, increasing complexity only where needed (Zhao et al., 2016).
- Proximal Alternating Minimization (PAM) and SVD-based Methods for Generalized FCTN/LMTN:
Used to solve tensor completion and higher-order FCTN models with latent matrices, offering convergence guarantees and significantly faster convergence than standard ALS (Yang et al., 2022).
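The sequential SVD route in the first item above can be sketched compactly. The following is a minimal TT-SVD in NumPy with a fixed maximal rank (error-driven truncation is the other common variant); it is an illustrative sketch under these assumptions, not the reference implementation of any cited paper.

```python
import numpy as np

def tt_svd(X, max_rank):
    """Minimal TT-SVD: sequential unfolding + truncated SVD.

    Returns cores G_k of shape (r_{k-1}, I_k, r_k) with r_0 = r_N = 1.
    """
    dims = X.shape
    cores, r_prev = [], 1
    C = X.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, len(S))                         # truncate the rank
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        C = (S[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, dims[-1], 1))
    return cores

# Usage: decompose a random order-4 tensor and measure the relative error.
X = np.random.randn(4, 5, 6, 7)
cores = tt_svd(X, max_rank=8)
X_hat = np.einsum('uia,ajb,bkc,clv->uijklv', *cores).squeeze()
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```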
For generalized loss-function models (e.g., Generalized CP, GCP), first-order optimization methods (such as L-BFGS-B) are enabled by closed-form gradient computation via the MTTKRP kernel, which remains compatible with most classical CP implementations (Hong et al., 2018).
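As an illustration of the MTTKRP kernel that such gradient computations rely on, a minimal NumPy sketch for a third-order tensor (shapes and rank are illustrative assumptions):

```python
import numpy as np

# Mode-1 MTTKRP for an order-3 tensor X with factor matrices B and C:
# it computes X_(1) (C ⊙ B), the building block of CP/GCP gradients.
I, J, K, R = 4, 5, 6, 3
X = np.random.randn(I, J, K)
B, C = np.random.randn(J, R), np.random.randn(K, R)

# Khatri-Rao-free formulation via einsum.
M1 = np.einsum('ijk,jr,kr->ir', X, B, C)

# Check against the unfolded-matrix definition (row-major unfolding, with the
# Khatri-Rao product ordered to match).
kr = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)
print(np.allclose(M1, X.reshape(I, J * K) @ kr))  # True
```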
4. Applications Across Domains
Tensor network decompositions are employed for:
- Large-Scale Data Compression:
Representation of multiway data (signals, images, hyperspectral cubes, videos) with minimal parameters, preserving essential multi-mode correlations (Zhao et al., 2016, Zheng et al., 2021, Zhou et al., 15 Mar 2024).
- Neural Network Compression:
Substitution of dense weight tensors in CNNs, RNNs, and Transformers with decomposed forms (TT, TR, HT, FCTN, etc.), enabling orders-of-magnitude compression and acceleration with minimal loss—and in some cases, modest accuracy gains (Liu et al., 2023, Wu et al., 2020, Pan et al., 2021, Zhao et al., 2021); a TT-layer sketch follows this list.
- Feature Extraction and Classification:
Core tensors derived from decompositions used as robust and discriminative features for classification (image/video recognition, action detection) (Zhao et al., 2016).
- Matrix and Tensor Completion:
Filling in missing values in high-dimensional arrays, leveraging low-rank tensor structure for robust recovery, as in image/video inpainting, hyperspectral super-resolution, and time-series anomaly detection (Yang et al., 2022, Zheng et al., 2021, Streit et al., 2020).
- Quantum Simulations and Path Integrals:
Efficient simulation of open quantum systems and spin chains by mapping path integrals and reduced density matrices to 2D tensor networks with spatial-temporal decompositions (MPS/MPO) (Bose et al., 2021).
- Privacy-Preserving Distributed Computation:
Representation and dispersal of large data as randomized or encrypted TN cores across non-colluding servers, leveraging the non-uniqueness and non-interpretable nature of decomposed blocks (Ong et al., 2021, Ong et al., 2018).
- Adaptive Loss Modeling in Data Fusion:
The GCP framework supports arbitrary elementwise losses matching the data distribution (e.g., binary, count, positive, or robust losses), broadening applicability to real-world and non-Gaussian settings (Hong et al., 2018).
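As an illustration of the neural network compression item above, the following sketch treats a dense 512 x 512 layer as an (8 x 8 x 8) x (8 x 8 x 8) TT-matrix with three cores; the shapes, ranks, and the helper name `tt_linear` are illustrative assumptions, not the API of any cited toolkit.

```python
import numpy as np

# A dense 512x512 weight matrix stored as a TT-matrix with input/output
# mode factorizations (8, 8, 8) x (8, 8, 8) and TT-ranks (1, 4, 4, 1).
m = n = (8, 8, 8)
r = (1, 4, 4, 1)
# Core k has shape (r_{k-1}, m_k, n_k, r_k).
cores = [np.random.randn(r[k], m[k], n[k], r[k + 1]) * 0.1 for k in range(3)]

def tt_linear(x, cores):
    """Compute y = W x with W held in TT-matrix format (illustrative sketch)."""
    t = x.reshape(8, 8, 8)                 # fold the input vector into modes
    # Contract the input modes (j, l, n) against the cores; a, b are bonds.
    y = np.einsum('ija,aklb,bmn,jln->ikm',
                  cores[0].squeeze(0), cores[1], cores[2].squeeze(-1), t)
    return y.reshape(-1)                   # unfold back to a 512-vector

x = np.random.randn(512)
print(tt_linear(x, cores).shape)           # (512,)

# Parameter count: dense vs. TT-factorized.
print(512 * 512, sum(c.size for c in cores))  # 262144 vs. 1536
```

Here the dense layer's 262,144 parameters are replaced by 1,536 core parameters, which is the mechanism behind the orders-of-magnitude compression reported for such layers.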
5. Theoretical Advantages, Limitations, and Innovations
Representational Efficiency:
Storage complexities are reduced from exponential to linear or low-polynomial in tensor order for chain and ring structures (Zhao et al., 2016, Zhou et al., 15 Mar 2024), with block or star couplings further mitigating rank bottlenecks. Fully-connected and star topologies (FCTN, TS) achieve direct mode correlations and transpositional invariance, at the expense of a greater number of small cores but without exponential growth (Zhou et al., 15 Mar 2024).
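A worked count makes the reduction concrete (the sizes below are an illustrative assumption):

```latex
\underbrace{I^{N}}_{\text{dense}} = 10^{10}
\quad \text{vs.} \quad
\underbrace{\sum_{k=1}^{N} r_{k-1}\, I\, r_k \le N I r^{2}}_{\text{TT}}
  = 10 \cdot 10 \cdot 20^{2} = 4 \times 10^{4}
\qquad (N = 10,\; I = 10,\; r = 20).
```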
Permutation Invariance:
Cyclic (TR, TS) and fully-connected (FCTN) networks eliminate the dependence on the order of tensor modes—an issue for sequential models. This invariance simplifies modeling when the relative importance/order of data modes is not known a priori, or is expected to shift.
Flexible Rank Assignment:
Recent designs (e.g., TS) allow different latent ranks on different mode connections, addressing variable correlation strengths and heterogeneous data dimensions and thereby mitigating the so-called "curse of ranks" (Zhou et al., 15 Mar 2024).
Implicit Regularization and Nonlinearity:
Deep generative networks (e.g., DeepTensor) as priors for factor tensors confer implicit regularization, enable robust recovery under non-Gaussian noise, and permit decomposition models that go well beyond classical multilinear factorization (Saragadam et al., 2022, Liu et al., 2016).
Scalability and Algorithmic Adaptation:
ALS, sequential SVD, and block-update schemes enable decompositions for extremely high-dimensional and incomplete data due to their tractable per-iteration costs and scalable gradient computations, especially when underpinned by efficient MTTKRP or TT-contraction kernels (Phan et al., 2016, Hong et al., 2018).
Limitations:
FCTN and related fully-connected models still suffer from parameter growth as a power of tensor order, limiting their practicality for extremely high orders unless mitigated with latent dimension reduction (as in LMTN) (Yang et al., 2022). Optimization landscapes are non-convex, and the choice of rank parameters remains application-dependent. Some models (GCP, DeepTensor) provide partial remedies via adaptive losses or implicit priors.
6. Key Mathematical Formulations
Decomposition | Core Formula | Invariance Property |
---|---|---|
TT | $\mathcal{X}(i_1,\ldots,i_N) = G_1(i_1)\, G_2(i_2) \cdots G_N(i_N)$ | Sequential (mode order-dependent) |
TR | $\mathcal{X}(i_1,\ldots,i_N) = \operatorname{Tr}\!\big(G_1(i_1)\, G_2(i_2) \cdots G_N(i_N)\big)$ | Permutation-invariant under circular shifts |
FCTN/TS | See below | Full mode permutation invariance or transpositional invariance |
GCP | CP model fitted under an elementwise loss $\sum_{i_1,\ldots,i_N} f\big(x_{i_1 \cdots i_N}, m_{i_1 \cdots i_N}\big)$ | Data-type-adaptive, not topology-based |
LMTN | FCTN with latent-matrix dimension reduction | Retains key FCTN invariance, reduces parameter count |
The TS decomposition (Zhou et al., 15 Mar 2024) represents an order-$N$ tensor by contracting $N$ order-3 factor tensors with $N$ order-4 core tensors arranged in a ring, summing over the latent indices shared between each factor and its neighboring cores (which are cyclically connected).
7. Outlook and Research Directions
Active research areas include: control of curse of ranks and curse of dimensionality via star, hybrid and semi-tensor product structures (Zhou et al., 15 Mar 2024, Zhao et al., 2021); adaptive algorithm design for large-scale, higher-order, and incomplete tensors (Yang et al., 2022); implicit regularization via deep network-based priors (Saragadam et al., 2022, Liu et al., 2016); data privacy and federated learning via randomized or distributed tensor cores (Ong et al., 2021, Ong et al., 2018); and the development of efficient software toolkits for practical deployment (e.g., TedNet (Pan et al., 2021)). Generalization to non-Gaussian loss frameworks (GCP (Hong et al., 2018)) and interpretable factor discovery in application domains (remote sensing, biomedical imaging, quantum simulation) underscore the versatility of tensor network decompositions.
These advances continue to expand the landscape of multiway data analysis, offering principled frameworks for model reduction, efficient computation, privacy, and statistical inference across the spectrum of modern scientific and engineering problems.