Tucker Decomposition in Multilinear Models
- Tucker decomposition is a multilinear algebra method that factorizes high-order tensors into a core tensor and mode-specific factor matrices.
- It enables flexible low-rank representations, supporting model compression, the capture of multiway correlations, and improved image-classification performance.
- The method offers depth efficiency, with an exponential CP-rank separation from shallow networks, at the cost of a core whose size grows exponentially in the tensor order.
Tucker decomposition is a central methodology in multilinear algebra that factorizes a high-order tensor into a small core tensor and mode-specific factor matrices, yielding flexible low-rank representations especially adapted to multidimensional data analysis and machine learning. The decomposition is widely applied for model compression, capturing multiway correlations, and constructing expressive architectures in computer vision, statistics, and signal processing.
1. Formal Definition and Algebraic Structure
Given an $N$-way tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, its Tucker decomposition of multilinear rank $(R_1, R_2, \ldots, R_N)$ is
$$\mathcal{X} = \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)},$$
with core tensor $\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times \cdots \times R_N}$ and factor matrices $A^{(n)} \in \mathbb{R}^{I_n \times R_n}$ for $n = 1, \ldots, N$ (Liu et al., 2019). Elementwise, this reads
$$x_{i_1 i_2 \cdots i_N} = \sum_{r_1=1}^{R_1} \cdots \sum_{r_N=1}^{R_N} g_{r_1 r_2 \cdots r_N}\, a^{(1)}_{i_1 r_1} a^{(2)}_{i_2 r_2} \cdots a^{(N)}_{i_N r_N},$$
providing a highly expressive multilinear model. Tucker generalizes the matrix SVD to higher orders and is strictly more flexible than the CP decomposition, which corresponds to the special case of a superdiagonal core.
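To make the algebra concrete, the following NumPy sketch (not taken from Liu et al., 2019; function names such as `tucker_reconstruct` and `hosvd` are illustrative) reconstructs a tensor from a core and factor matrices via mode-$n$ products and computes a truncated higher-order SVD, one standard way to obtain a Tucker approximation.

```python
import numpy as np

def tucker_reconstruct(core, factors):
    """Reconstruct a full tensor from a Tucker core and per-mode factor matrices.

    core:    array of shape (R_1, ..., R_N)
    factors: list of N matrices, factors[n] has shape (I_n, R_n)
    """
    x = core
    for n, a in enumerate(factors):
        # Mode-n product: contract the n-th axis of the current tensor with A^{(n)}.
        x = np.tensordot(a, x, axes=(1, n))   # the new I_n axis appears first
        x = np.moveaxis(x, 0, n)              # move it back to position n
    return x

def hosvd(x, ranks):
    """Truncated higher-order SVD: a standard way to compute a Tucker approximation."""
    factors = []
    for n, r in enumerate(ranks):
        unfold = np.moveaxis(x, n, 0).reshape(x.shape[n], -1)  # mode-n unfolding
        u, _, _ = np.linalg.svd(unfold, full_matrices=False)
        factors.append(u[:, :r])                               # leading R_n left singular vectors
    core = x
    for n, a in enumerate(factors):
        core = np.tensordot(a.T, core, axes=(1, n))            # project each mode onto its basis
        core = np.moveaxis(core, 0, n)
    return core, factors

# Example: decompose and reconstruct a random 3-way tensor.
x = np.random.randn(8, 9, 10)
core, factors = hosvd(x, ranks=(4, 4, 4))
x_hat = tucker_reconstruct(core, factors)
print("relative error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```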
2. Tucker-Decomposition Network Architecture
The Tucker network formalism models a multi-patch input (e.g., an object $X$ composed of $N$ subpatches $\mathbf{x}_1, \ldots, \mathbf{x}_N$) as follows (Liu et al., 2019):
- Representation layer: $f_\theta(\mathbf{x}_i) \in \mathbb{R}^M$ (e.g., a ReLU-activated convolution), giving $M$ features per patch.
- Mode-wise projection: for each mode $n$, project the patch features via $A^{(n)} \in \mathbb{R}^{M \times R_n}$ to $\mathbf{v}^{(n)} = (A^{(n)})^\top f_\theta(\mathbf{x}_n) \in \mathbb{R}^{R_n}$.
- Core-tensor layer: the score of class $y$ is the full contraction $h_y(X) = \mathcal{G}^{y} \times_1 \mathbf{v}^{(1)} \times_2 \cdots \times_N \mathbf{v}^{(N)}$, with one class-wise $N$-way core $\mathcal{G}^{y} \in \mathbb{R}^{R_1 \times \cdots \times R_N}$ per class.
- Depth: one representation layer, $N$ mode-wise projections, a product-pooling (contraction) layer, and a final linear classification; a minimal sketch of this forward pass appears below.
This network is strictly deeper than shallow CP architectures and can be recursively deepened by hierarchical nesting.
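The sketch below walks through the forward pass just described, reusing the notation of Section 1 ($N$ patches, $M$ features, uniform rank $R$, $Y$ classes). It is an illustration under those assumptions, not the reference implementation: the representation layer is stood in for by a plain ReLU, and all names and hyperparameters are hypothetical.

```python
import numpy as np

def tucker_network_scores(patches, repr_fn, factors, cores):
    """Forward pass of a Tucker-network classifier (illustrative sketch).

    patches : list of N raw input patches
    repr_fn : representation layer mapping a patch to an M-dimensional feature vector
    factors : list of N projection matrices, factors[n] has shape (M, R_n)
    cores   : array of shape (Y, R_1, ..., R_N), one N-way core per class
    """
    # Mode-wise projection: one R_n-dimensional vector per patch.
    v = [factors[n].T @ repr_fn(p) for n, p in enumerate(patches)]

    # Product pooling: contract every mode of the class-wise cores
    # against the corresponding projected feature vector.
    scores = cores
    for vec in v:
        # Axis 0 is always the class axis; axis 1 is the current leading core mode.
        scores = np.tensordot(scores, vec, axes=(1, 0))
    return scores  # shape (Y,): one score per class

# Toy usage: N=3 patches, M=16 features, uniform rank R=4, Y=10 classes.
rng = np.random.default_rng(0)
N, M, R, Y = 3, 16, 4, 10
repr_fn = lambda p: np.maximum(0.0, p)              # stand-in ReLU representation layer
patches = [rng.standard_normal(M) for _ in range(N)]
factors = [rng.standard_normal((M, R)) for _ in range(N)]
cores = rng.standard_normal((Y,) + (R,) * N)
print(tucker_network_scores(patches, repr_fn, factors, cores))
```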
3. Expressive Power: Depth Separation Theorem
The main theorem states that for an $N$-way tensor generated by a Tucker model with uniform rank $R$, the CP-rank is generically exponentially larger: for $N$ even it is at least $R^{N/2}$, and for $N$ odd at least $R^{(N-1)/2}$ (Liu et al., 2019). More precisely, emulating a Tucker block with a single-layer CP (shallow) network requires exponentially many neurons,
$$\mathrm{CP\text{-}rank}(\mathcal{X}) \;\geq\; \operatorname{rank}\!\big([\mathcal{G}]\big) \;=\; R^{\lfloor N/2 \rfloor} \quad \text{generically},$$
where $[\mathcal{G}]$ is the balanced matricization of the core $\mathcal{G}$.
Proof sketch: the rank of any matricization lower-bounds the CP-rank. When the mode-wise factors have full column rank, matricizing the Tucker tensor preserves the rank of the corresponding matricization of the core, and a generic core attains the maximal rank $R^{\lfloor N/2 \rfloor}$ outside a set of measure zero. The exponential gap therefore holds generically and fails only for degenerate cores.
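The genericity claim is easy to check numerically. The snippet below (an illustration under the stated full-rank and genericity assumptions, not code from the paper) builds a random Tucker tensor of order $N=4$ with uniform rank $R=3$ and verifies that its balanced matricization has rank $R^{N/2}$, which lower-bounds the CP-rank.

```python
import numpy as np

# Numerical check: for a random core and full-column-rank factors, the balanced
# matricization of the Tucker tensor generically has rank R**(N//2).
rng = np.random.default_rng(1)
N, R, I = 4, 3, 5                                  # order, uniform rank, mode dimension

core = rng.standard_normal((R,) * N)
factors = [rng.standard_normal((I, R)) for _ in range(N)]

x = core
for n, a in enumerate(factors):                    # mode-n products, as in Section 1
    x = np.moveaxis(np.tensordot(a, x, axes=(1, n)), 0, n)

# Balanced matricization: the first N//2 modes index rows, the rest index columns.
mat = x.reshape(I ** (N // 2), -1)
print("matricization rank:", np.linalg.matrix_rank(mat))   # expected: R**(N//2) = 9
print("lower bound R^(N//2):", R ** (N // 2))
```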
4. Comparison to Hierarchical Tensor Networks
Hierarchical Tucker (HT) formats arrange the $N$ modes in a binary-tree structure, using small cores at internal nodes and shared factor matrices at the leaves. Theoretically, any HT decomposition of order $N$ can be rewritten as a Tucker decomposition and vice versa. Tucker networks are as "rich" as HT for the same leaf rank, but HT distributes parameters across many small cores, reducing the core cost from the $R^N$ of a single Tucker core to polynomial quantities (Liu et al., 2019). Tucker's single-block simplicity is advantageous, but HT may be preferable for large $N$ because of this parameter scaling.
5. Parameter and Computational Complexity
In the notation of Section 2 ($N$ modes, $M$ representation features per patch, uniform projection rank $R$, $Y$ classes, and CP rank $Z$ for the shallow network), the parameter counts scale as:
- Shallow CP: $O\!\big(Z(NM + Y)\big)$ for the $Z$ rank-1 terms' mode-wise projections and output weights.
- Tucker: $O\!\big(NMR + Y R^{N}\big)$, dominated by the class-wise cores.
- HT: $O\!\big(NMR + N R^{3} + Y R^{2}\big)$, since the binary tree replaces the single large core with $O(N)$ third-order cores.
Contraction of the projected (outer-product) feature vectors against the cores is handled by a specialized product-pooling layer (Liu et al., 2019).
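The scalings above can be tabulated directly. The helper functions below are hypothetical and encode only the counts implied by the architecture description, so exact constants may differ from those reported by Liu et al. (2019); the example makes the $R^N$ growth of the Tucker core visible.

```python
# Parameter-count scalings for the three architecture families (illustrative).
def shallow_cp_params(N, M, Z, Y):
    return Z * (N * M + Y)          # Z rank-1 terms: per-mode projections + output weights

def tucker_params(N, M, R, Y):
    return N * M * R + Y * R ** N   # N projection matrices + one N-way core per class

def ht_params(N, M, R, Y):
    # N leaf matrices, ~N third-order internal cores, and an output core at the root.
    return N * M * R + (N - 1) * R ** 3 + Y * R ** 2

N, M, R, Z, Y = 8, 64, 4, 32, 10
print("shallow CP:", shallow_cp_params(N, M, Z, Y))
print("Tucker:    ", tucker_params(N, M, R, Y))
print("HT:        ", ht_params(N, M, R, Y))
```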
6. Empirical Validation in Image Classification
Experiments use TensorFlow, batch normalization, and SGD on the MNIST and CIFAR-10 datasets. Tucker, HT, and shallow CP architectures are matched for parameter count on each dataset (23K parameters on CIFAR-10). Metrics include test accuracy, parameter sensitivity, and convergence behavior.
Key results (Liu et al., 2019):
- Tucker networks outperform both HT and shallow CP networks in training and test accuracy.
- On MNIST, Tucker reaches its best test accuracy faster than the alternatives and attains a higher final value.
- For CIFAR-10, best Tucker networks consistently exceed HT and shallow by several percentage points at fixed model sizes.
- The optimal Tucker rank varies by dataset, but the best Tucker configuration leads in every setting tested.
7. Practical Implications, Advantages, and Limitations
Advantages:
- Mode-wise rank control enables flexible model size–expressivity tradeoffs.
- Depth efficiency: exponential separation from shallow CP; theoretical guarantees.
- Single-block architecture is easier to operationalize in deep learning frameworks.
- Empirical superiority at constrained parameter budgets in computer vision.
Limitations:
- Core size grows as $R^{N}$ (more generally $\prod_n R_n$); practical only when $N$ is moderate and $R$ is small.
- Lack of spatial sharing may hamper hierarchical feature extraction in very deep nets.
- The final product-pooling step incurs computational overhead for large $N$ and $R$.
Overall, Tucker-Decomposition Networks occupy an intermediate point between CP and HT designs, providing depth efficiency, parameter flexibility, and practical performance (Liu et al., 2019). Future extensions may further explore sparsity, sharing mechanisms, or hierarchical interleaving to extend Tucker principles to larger-scale architectures.