Four-Dimensional Tucker Interaction Tensor
- The four-dimensional Tucker interaction tensor is a multilinear, low-rank structure that captures latent interactions across four distinct modes.
- It employs a Tucker decomposition framework with mode-specific embeddings and a core tensor to enable dimensionality reduction and interpretability.
- Its optimized construction and regularization techniques yield enhanced performance in tasks like turbulence regression, neuroimaging, and multilayer network analysis.
A four-dimensional Tucker interaction tensor is a multilinear, low-rank tensorial structure central to contemporary multidimensional data analysis, capturing higher-order interactions in applications such as turbulence regression, neuroimaging, multilayer networks, and structured neural models. In its canonical form, it represents latent structured relationships spanning four discrete (or continuous) axes, allowing efficient parameterization, dimensionality reduction, and interpretability of complex joint dependencies. The construction, optimization, and theoretical properties of such a tensor are foundational to the class of Tucker decomposition models and their modern extensions, including nonnegative/orthogonal variants and neural-based interaction layers.
1. Mathematical Definition and Core Construction
A four-dimensional Tucker interaction tensor is a four-way tensor capturing structured interactions among four separate modes (indices or variables). It is generated from a set of mode-wise embedding or basis vectors. For discrete mode indices $(i_1, i_2, i_3, i_4)$ and corresponding embedding lookup matrices $E^{(1)}, \dots, E^{(4)}$, the interaction tensor is

$$\mathcal{T} = e^{(1)}_{i_1} \circ e^{(2)}_{i_2} \circ e^{(3)}_{i_3} \circ e^{(4)}_{i_4},$$

where $\circ$ denotes the outer product and each $e^{(m)}_{i_m}$ is a mode-specific embedding vector in $\mathbb{R}^{d_m}$ (Fan et al., 5 Dec 2025). The tensor thus occupies $d_1 d_2 d_3 d_4$ degrees of freedom, structurally encoding all multiway latent interactions between the modes.
A general four-way Tucker decomposition of a data tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3 \times n_4}$ is given by

$$\mathcal{X} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)} \times_4 U^{(4)},$$

with $\mathcal{G} \in \mathbb{R}^{d_1 \times d_2 \times d_3 \times d_4}$ as the Tucker core and $U^{(m)} \in \mathbb{R}^{n_m \times d_m}$ as the low-dimensional mode factor matrices. In the context of neural models, this process is often collapsed via embedding lookup and neural contraction (vectorization followed by a linear map) (Fan et al., 5 Dec 2025).
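The following numpy sketch illustrates both constructions; the sizes, variable names, and random data are illustrative assumptions rather than values from the cited work. It builds the rank-one interaction tensor from four embedding lookups and then a full Tucker reconstruction from a core and factor matrices.

```python
import numpy as np

# Illustrative sizes (assumptions): n_m discrete levels and embedding
# dimension d_m per mode.
n = (8, 10, 10, 10)   # number of bins per mode
d = (4, 3, 3, 3)      # mode-specific embedding dimensions (Tucker ranks)

rng = np.random.default_rng(0)
E = [rng.normal(size=(n_m, d_m)) for n_m, d_m in zip(n, d)]  # embedding tables

# Rank-one interaction tensor for one observed index tuple (i1, i2, i3, i4).
idx = (2, 5, 1, 7)
e = [E[m][idx[m]] for m in range(4)]          # mode-specific embedding vectors
T = np.einsum('a,b,c,d->abcd', *e)            # four-way outer product
assert T.shape == d                           # d1 x d2 x d3 x d4 degrees of freedom

# General Tucker reconstruction: core contracted with a factor matrix per mode.
G = rng.normal(size=d)                                  # Tucker core
X_hat = np.einsum('abcd,ia,jb,kc,ld->ijkl', G, *E)      # mode products
print(X_hat.shape)                                      # (8, 10, 10, 10)
```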
2. Model Motivation and Data Discretization
The four-dimensional Tucker interaction tensor is essential for modeling joint dependencies in settings where data naturally vary over four axes. For instance, in turbulence regression, the four modes correspond to discretized bins of elevation and of the three wind velocity components. Raw continuous data are standardized and discretized modewise using quantile binning to equip the tensor with integer-valued indices suitable for embedding-based constructions,

$$i_m = Q(x_m), \qquad m = 1, \dots, 4,$$

where $Q(\cdot)$ denotes quantile-bin discretization of the standardized variable $x_m$ (Fan et al., 5 Dec 2025). This preprocessing transforms continuous fields into a form amenable to lookup and interaction tensor construction, facilitating scalable representation learning and downstream optimization.
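A minimal numpy sketch of this preprocessing step is shown below, assuming standardized inputs, ten quantile bins per mode, and placeholder variable names (z, u, v, w) for elevation and the wind components.

```python
import numpy as np

def quantile_bin(x, n_bins):
    """Map a continuous 1-D array to integer bin indices via empirical quantiles."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])  # interior edges
    return np.digitize(x, edges)  # integer indices in {0, ..., n_bins - 1}

# Placeholder standardized fields (assumptions): elevation z and wind components u, v, w.
rng = np.random.default_rng(1)
z, u, v, w = (rng.normal(size=1000) for _ in range(4))
indices = np.stack([quantile_bin(a, n_bins=10) for a in (z, u, v, w)], axis=1)
print(indices.shape, indices.min(), indices.max())  # (1000, 4) 0 9
```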
3. Low-Rank Tucker Factorization and Neural Implementation
The classical Tucker factorization posits that the observed tensor can be efficiently approximated via contraction of a small core tensor and mode-specific factor matrices. The neural formulation leverages embeddings:
- Each mode index (after discretization) is mapped via an embedding matrix,
- The four resulting vectors are combined via an outer product,
- The tensor is vectorized and contracted via a learned linear map—equivalent to flattening the Tucker core,
- The prediction is regularized and passed through a nonlinearity (typically a sigmoid for regression tasks constrained to a normalized range) (Fan et al., 5 Dec 2025).
The essential computation is

$$\hat{y} = \sigma\!\big( w^{\top} \operatorname{vec}\big( e^{(1)}_{i_1} \circ e^{(2)}_{i_2} \circ e^{(3)}_{i_3} \circ e^{(4)}_{i_4} \big) + b \big),$$

with $w = \operatorname{vec}(\mathcal{G})$ the vectorized core and $\sigma$ an appropriate activation. This mechanism, termed a "Tucker interaction layer," generalizes to arbitrary order and allows end-to-end learning with stochastic optimization (e.g., the Adam optimizer with $\ell_2$ regularization on weights and embeddings).
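A minimal forward-pass sketch of such a layer, written in plain numpy rather than a deep-learning framework, is given below; the function name, sizes, and random parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tucker_interaction_forward(idx, E, w, b):
    """Sketch of a Tucker interaction layer forward pass.

    idx : tuple of four integer mode indices (after discretization)
    E   : list of four embedding matrices, E[m] of shape (n_m, d_m)
    w   : weight vector of length d1*d2*d3*d4 (plays the role of the vectorized core)
    b   : scalar bias
    """
    e = [E[m][idx[m]] for m in range(4)]   # embedding lookups
    T = np.einsum('a,b,c,d->abcd', *e)     # four-way outer product
    return sigmoid(w @ T.ravel() + b)      # vectorize and contract with the core

# Tiny usage example with random parameters (illustrative only).
rng = np.random.default_rng(2)
d = (4, 3, 3, 3)
E = [rng.normal(size=(10, d_m)) for d_m in d]
w = rng.normal(size=int(np.prod(d)))
print(tucker_interaction_forward((1, 2, 3, 4), E, w, b=0.0))
```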
4. Optimization, Regularization, and Complexity
Learning the four-dimensional Tucker model involves minimizing a regularized objective over observed entries, incorporating both mean squared error on targets and penalties on latent parameters,

$$\mathcal{L} = \frac{1}{|\Omega|} \sum_{(i_1, i_2, i_3, i_4) \in \Omega} \big( y_{i_1 i_2 i_3 i_4} - \hat{y}_{i_1 i_2 i_3 i_4} \big)^2 + \lambda_1 \lVert w \rVert_2^2 + \lambda_2 \sum_{m=1}^{4} \lVert E^{(m)} \rVert_F^2,$$

where $\Omega$ indexes the set of observed tuples, and $\lambda_1$ and $\lambda_2$ are regularization hyperparameters (Fan et al., 5 Dec 2025). Training is typically performed via mini-batch stochastic optimization with early stopping.
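A sketch of this objective is shown below; the penalty weights and their default values are hypothetical, and in practice mini-batch gradients of this loss would be passed to an optimizer such as Adam.

```python
import numpy as np

def regularized_loss(preds, targets, w, E, lam_w=1e-4, lam_e=1e-4):
    """Mean squared error over observed tuples plus squared-norm penalties (sketch).

    preds, targets : predictions and targets over the observed set Omega
    w              : vectorized-core weight vector
    E              : list of embedding matrices
    lam_w, lam_e   : regularization hyperparameters (illustrative values)
    """
    mse = np.mean((preds - targets) ** 2)
    penalty = lam_w * np.sum(w ** 2) + lam_e * sum(np.sum(Em ** 2) for Em in E)
    return mse + penalty

# Illustrative call with random placeholders.
rng = np.random.default_rng(4)
preds, targets = rng.uniform(size=100), rng.uniform(size=100)
w = rng.normal(size=108)
E = [rng.normal(size=(10, d_m)) for d_m in (4, 3, 3, 3)]
print(regularized_loss(preds, targets, w, E))
```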
The primary computational burden lies in the size of the core and the embeddings:
- Embedding lookups scale as $O(d_m)$ per mode,
- The outer product produces a size-$d_1 d_2 d_3 d_4$ tensor, which (for moderate $d_m$) is tractable,
- Regularization counteracts overfitting and stabilizes learning, especially in highly sparse or high-dimensional regimes.
5. Theoretical and Empirical Properties
The four-dimensional Tucker interaction tensor inherits the theoretical properties of Tucker decomposition:
- It is statistically identifiable up to invertible linear transforms per mode (gauge indeterminacy), resolved by orthogonality or normalization constraints (Li et al., 2013).
- With nonnegativity and/or sparsity in the core, uniqueness can be strengthened, and the model becomes more interpretable, yielding "parts-based" representations (Zhou et al., 2014, Pan et al., 2019).
- Empirically, moderate Tucker ranks suffice for accurate recovery and prediction, with the rank selection driven by cross-validation or BIC-type criteria (Fan et al., 5 Dec 2025, Li et al., 2013).
Empirical results for the four-dimensional interaction tensor in turbulence regression show substantial improvements in error (MAE, RMSE) and coefficient of determination ($R^2$) over alternatives such as MLPs and pairwise/triplewise interaction models, demonstrating the practical advantage of capturing all fourth-order latent interactions (Fan et al., 5 Dec 2025).
6. Comparison with Alternative Factorization Models
The four-dimensional Tucker model generalizes and extends alternative tensor factorization approaches:
- CP (CANDECOMP/PARAFAC) decomposition constrains the core to a super-diagonal, enforcing a single shared rank across modes and limiting flexibility (Li et al., 2013).
- Tucker allows for mode-specific ranks and a full interaction core, offering superior parsimony, fit, and empirical performance when the intrinsic cross-mode complexity varies.
- Neural variants ("Tucker interaction layers") further enhance expressivity, integrating learned embeddings and nonlinearities.
Extensions such as orthogonal and nonnegative Tucker decompositions (ONTD, NNTuck) introduce additional constraints to promote interpretability, uniqueness, and statistical consistency, with specific algorithms (e.g., ADMM, multiplicative updates) to guarantee convergence (Pan et al., 2019, Aguiar et al., 2022).
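A simple parameter count makes the CP/Tucker contrast above concrete; the sizes below are illustrative assumptions. CP must apply its single shared rank to every mode, whereas Tucker pays for a full core but can assign a smaller rank to modes with less intrinsic complexity.

```python
import numpy as np

def cp_params(n, r):
    """CP: one n_m x r factor per mode plus r superdiagonal core entries."""
    return sum(n_m * r for n_m in n) + r

def tucker_params(n, d):
    """Tucker: one n_m x d_m factor per mode plus a full d1*d2*d3*d4 core."""
    return sum(n_m * d_m for n_m, d_m in zip(n, d)) + int(np.prod(d))

n = (8, 10, 10, 10)
print(cp_params(n, r=6))                  # shared rank forced on every mode
print(tucker_params(n, d=(6, 2, 2, 2)))   # ranks adapted per mode
```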
7. Applications and Practical Usage
The four-dimensional Tucker interaction tensor is applied in diverse domains:
- Turbulence regression: Captures all fourth-order interactions among discretized elevation and the three wind components, outperforming both shallow and deep baselines on wind-field radar data (Fan et al., 5 Dec 2025).
- Neuroimaging: Encodes complex relationships among multidimensional brain image covariates within a supervised GLM framework, yielding scalable and theoretically sound regression estimators (Li et al., 2013).
- Structured neural models: Enables the construction of highly expressive, parameter-efficient recursive architectures for trees and graphs by representing multiway child-parent interactions (Castellana et al., 2020).
- Multilayer networks: Models higher-order interdependencies among layers in multilayer adjacency tensors, generalizing stochastic block models via nonnegative Tucker decomposition (Aguiar et al., 2022).
Practical rank selection leverages both heuristic (singular value decay) and formal (cross-validation, BIC) strategies. Implementation exploits tensor-matrix product sparsity, parallelization, and, when needed, embedding-based downsizing and regularization for stability.
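One way to apply the singular-value-decay heuristic mentioned above is to inspect the singular values of each mode unfolding, as in the sketch below; the data and sizes are illustrative assumptions, not a procedure taken from the cited papers.

```python
import numpy as np

def mode_unfolding_singular_values(X, mode):
    """Singular values of the mode-n unfolding of a 4-way tensor."""
    Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)  # mode-n matricization
    return np.linalg.svd(Xm, compute_uv=False)

# Inspect the decay per mode and take the "elbow" as a candidate Tucker rank.
rng = np.random.default_rng(3)
X = rng.normal(size=(8, 10, 10, 10))
for m in range(4):
    s = mode_unfolding_singular_values(X, m)
    print(m, np.round(s[:5] / s[0], 3))
```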
The four-dimensional Tucker interaction tensor formalizes all multi-mode latent interactions in computationally and statistically efficient models, underpinning key advances in multilinear regression, deep structured learning, compressed sensing, and interpretable latent factor discovery across a range of high-dimensional data analysis domains (Li et al., 2013, Fan et al., 5 Dec 2025, Zhou et al., 2014, Pan et al., 2019, Castellana et al., 2020, Aguiar et al., 2022).