Bayesian Robust Tensor Ring (BRTR)
- BRTR is a fully Bayesian method applied in tensor ring models, leveraging hierarchical priors for robust tensor completion and automatic rank determination.
- It integrates heavy-tailed Student‑t priors with variational Bayesian inference, enabling effective recovery even under extreme missingness and corruption.
- The adaptive rank pruning mechanism in BRTR offers efficient, data-driven tensor decomposition, as validated across imaging, video, and hyperspectral applications.
The Bayesian Robust Tensor Ring (BRTR) is a fully Bayesian framework for robust tensor completion and factorization under the tensor ring (TR) model. It integrates hierarchical probabilistic priors and variational inference to enable simultaneous, automatic determination of tensor ring rank, robust estimation in the presence of outliers and missing data, and avoidance of manual hyperparameter tuning. BRTR advances beyond prior TR-based approaches—whose recovery performance often degrades under extreme missingness or corruption and which require manual selection of model ranks—by incorporating heavy-tailed priors and relevance-driven rank pruning. Core technical contributions include a generative model for incomplete and corrupted multiway arrays using TR decomposition, a fully tractable variational Bayesian learning algorithm, and robust empirical performance across wide-ranging imaging, video, and scientific data domains (Long et al., 2020, Huang et al., 2022).
1. Probabilistic Generative Model
In the BRTR framework, observed entries of an $N$-th order tensor $\Y \in \mathbb{R}^{I_1\times\cdots\times I_N}$ are modeled as the noisy sum of a low-rank TR tensor $\L$ and a sparse outlier tensor $\S$, with Gaussian noise $\M$ supported on the observed indices $\Omega$. The model is
$\Y_{i_1\cdots i_N} = \L_{i_1\cdots i_N} + \S_{i_1\cdots i_N} + \M_{i_1\cdots i_N},\qquad \M_{i_1\cdots i_N}\sim \mathcal{N}(0, \tau^{-1}),\; (i_1, \ldots, i_N) \in \Omega.$
The low-rank component $\L$ is parameterized in TR form by core tensors $\Z^{(n)} \in \mathbb{R}^{R_{n-1}\times I_n\times R_n}$, $n = 1,\ldots,N$, with $R_0 = R_N$:
$\L_{i_1\cdots i_N} = \mathrm{Tr} \left(\Z^{(1)}(i_1) \Z^{(2)}(i_2) \cdots \Z^{(N)}(i_N)\right)$
where $\Z^{(n)}(i_n) \in \mathbb{R}^{R_{n-1}\times R_n}$ denotes the $i_n$-th lateral slice of $\Z^{(n)}$. The likelihood for observed entries becomes
$p\left(\mathcal{P}_\Omega(\Y)\mid\{\Z^{(n)}\}, \S, \tau\right) = \prod_{(i_1,\ldots,i_N)\in\Omega} \mathcal{N} \left(\Y_{i_1\cdots i_N}\mid \mathrm{Tr}\left(\prod_{n=1}^N \Z^{(n)}(i_n)\right) + \S_{i_1\cdots i_N},\ \tau^{-1}\right).$
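To make the generative model concrete, the following NumPy sketch evaluates one entry of the TR contraction and simulates a single observed entry; the core shapes, variable names, and hyperparameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def tr_entry(cores, idx):
    """Evaluate L_{i1...iN} = Tr( Z^(1)(i1) Z^(2)(i2) ... Z^(N)(iN) ).

    Each core has shape (rank_in, mode_size, rank_out), with the first and
    last rank equal (ring closure); idx is a tuple (i_1, ..., i_N)."""
    prod = np.eye(cores[0].shape[0])           # start from an R x R identity
    for Z, i in zip(cores, idx):
        prod = prod @ Z[:, i, :]               # multiply in the i-th lateral slice
    return np.trace(prod)

# Toy 3rd-order example with TR ranks (2, 3, 4) closing back to 2.
rng = np.random.default_rng(0)
cores = [rng.standard_normal(s) for s in [(2, 5, 3), (3, 6, 4), (4, 7, 2)]]

tau = 100.0                                    # noise precision
idx = (1, 2, 3)                                # an observed index in Omega
low_rank = tr_entry(cores, idx)                # L_{i1 i2 i3}
sparse = 0.0                                   # S_{i1 i2 i3} (zero for most entries)
noise = rng.normal(0.0, tau ** -0.5)           # Gaussian noise with variance 1/tau
y_obs = low_rank + sparse + noise              # observed entry Y_{i1 i2 i3}
print(y_obs)
```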
Hierarchical priors are assigned to all factor and noise parameters. Each entry of $\Z^{(n)}$ is zero-mean Gaussian with rank-wise factorized precision governed by Gamma hyperpriors; $\S$ has an entrywise zero-mean Gaussian prior with Gamma-distributed precision $\eta$; the noise precision $\tau$ is Gamma-distributed. The marginal prior on each core-factor entry is a heavy-tailed Student-$t$ due to this Gaussian–Gamma hierarchy, enabling sparsity and robustness (Long et al., 2020, Huang et al., 2022).
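The Gaussian-Gamma hierarchy can be simulated directly to see the heavy-tailed Student-$t$ marginal it induces; the hyperparameters `a0`, `b0` below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
a0, b0 = 1.0, 1.0                 # illustrative Gamma hyperparameters (shape, rate)
n = 100_000

# Hierarchy: precision lambda ~ Gamma(a0, rate=b0), entry x | lambda ~ N(0, 1/lambda).
lam = rng.gamma(shape=a0, scale=1.0 / b0, size=n)
x = rng.normal(0.0, lam ** -0.5)

# Marginally, x follows a Student-t with 2*a0 degrees of freedom, so its tails
# carry far more mass than a standard Gaussian of comparable scale.
print("P(|x| > 4) under the hierarchy :", np.mean(np.abs(x) > 4))
print("P(|x| > 4) under a Gaussian    :", np.mean(np.abs(rng.standard_normal(n)) > 4))
```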
2. Joint Distribution and Posterior Formulation
The complete latent variable set comprises $\mathcal{Z} = \{\{\Z^{(n)}\}_{n=1}^N, \{u^{(n)}\}_{n=1}^N, \S, \eta, \tau\}$, with all priors and generative mechanisms defined as above. The full joint density is
$p(\Y, \mathcal{Z}) = p(\Y \mid \{\Z^{(n)}\}, \S, \tau)\, \left[\prod_{n=1}^N p(\Z^{(n)} \mid u^{(n-1)}, u^{(n)})\, p(u^{(n)})\right] p(\S \mid \eta)\, p(\eta)\, p(\tau).$
Posterior inference, $p(\mathcal{Z} \mid \Y)$, is intractable in closed form due to the intricate dependencies and the nonconjugacy induced by TR contraction and Student-$t$ marginalization. BRTR employs a mean-field variational Bayesian approximation to the posterior by maximizing the evidence lower bound (ELBO) (Huang et al., 2022).
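Concretely, the mean-field distribution $q(\mathcal{Z})$ is chosen to maximize the standard evidence lower bound (a well-known identity of variational inference, stated here for completeness):
$\mathcal{L}(q) = \mathbb{E}_{q(\mathcal{Z})}\left[\ln p(\Y, \mathcal{Z})\right] - \mathbb{E}_{q(\mathcal{Z})}\left[\ln q(\mathcal{Z})\right] \le \ln p(\Y),$
so that maximizing $\mathcal{L}(q)$ is equivalent to minimizing the Kullback-Leibler divergence from $q(\mathcal{Z})$ to the exact posterior $p(\mathcal{Z} \mid \Y)$.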
3. Variational Bayesian Inference Mechanism
A mean-field approximation factorizes the posterior as
$q(\mathcal{Z}) = \prod_{n=1}^N q(\Z^{(n)})\, \prod_{n=1}^N q(u^{(n)})\, q(\S)\, q(\eta)\, q(\tau).$
Each term admits a closed-form update:
- $q(\Z^{(n)})$: Each lateral slice is Gaussian or matrix-normal, with posterior precision comprising observed-data contributions ($\tau$-weighted covariance of the complementary cores) and relevance contributions ($u^{(n-1)}$-, $u^{(n)}$-weighted prior).
- $q(u^{(n)})$: Each TR rank-precision remains Gamma; updates depend on the trace of the corresponding factor covariances plus the number of observed slices.
- $q(\S)$, $q(\eta)$: Sparse entries and their precisions each have Gaussian-Gamma updates conditioned on current estimates of the low-rank and noise parameters.
- $q(\tau)$: The global noise precision is Gamma, with parameters set by the residuals of observed entries minus the low-rank and sparse reconstructions (a simplified sketch of this update is given below).
The algorithm iterates through component updates until the ELBO converges. An explicit “relevance-driven” rank adaptation is implemented at each step as detailed below (Huang et al., 2022, Long et al., 2020).
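As a concrete example of one closed-form update, the sketch below computes approximate Gamma posterior parameters for the noise precision $\tau$ from observed-entry residuals. The hyperparameter names `c0`, `d0` are assumptions, and the expected second-moment (covariance) corrections of the full variational update are omitted for brevity.

```python
import numpy as np

def update_noise_precision(Y, L_hat, S_hat, mask, c0=1e-6, d0=1e-6):
    """Simplified variational update for q(tau) = Gamma(c_N, rate=d_N).

    Y      : data tensor (values outside the mask are ignored)
    L_hat  : current posterior mean of the low-rank TR reconstruction
    S_hat  : current posterior mean of the sparse outlier tensor
    mask   : boolean tensor, True on the observed indices Omega
    c0, d0 : assumed Gamma hyperparameters of the prior p(tau)

    A full update would also add expected second-moment (covariance) terms of
    the factors and of S; they are dropped here to keep the sketch short."""
    resid = (Y - L_hat - S_hat)[mask]
    c_N = c0 + 0.5 * resid.size               # shape: half the number of observations
    d_N = d0 + 0.5 * np.sum(resid ** 2)       # rate: half the residual energy
    return c_N, d_N                           # posterior mean E[tau] = c_N / d_N

# Toy usage with placeholder reconstructions:
rng = np.random.default_rng(2)
Y = rng.standard_normal((4, 5, 6))
mask = rng.random(Y.shape) < 0.7
c_N, d_N = update_noise_precision(Y, np.zeros_like(Y), np.zeros_like(Y), mask)
print("E[tau] =", c_N / d_N)
```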
4. Automatic Tensor Ring Rank Pruning
BRTR performs fully Bayesian automatic model selection via component-wise precision hyperpriors. The expected precision $\mathbb{E}[u_r^{(n)}]$ for each rank component $r$ increases as its associated factor variance shrinks under the posterior. Once $\mathbb{E}[u_r^{(n)}]$ exceeds a predefined (large) threshold, the corresponding component is pruned: the $r$-th slice along the shared rank dimension is deleted from both $\Z^{(n)}$ and $\Z^{(n+1)}$. This procedure yields adaptive, data-driven TR ranks without manual or cross-validated setting (Huang et al., 2022, Long et al., 2020); a schematic of the pruning step follows the table below.
| Component | Prior Type | Pruning Mechanism |
|---|---|---|
| TR core-tensor slices | Gaussian-Gamma (Student-$t$) | Prune component $r$ if $\mathbb{E}[u_r^{(n)}]$ exceeds the threshold |
| Sparse outlier ($\S$) components | Gaussian-Gamma | Shrunk by the posterior precision $\eta$ |
| Noise ($\tau$) | Gamma | Not pruned; posterior adapts |
Automatic rank pruning under BRTR provides TR decompositions of minimum necessary dimension for the observed signal structure, improving generalization (Long et al., 2020).
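A minimal sketch of the pruning step is given below; variable names and the threshold value are hypothetical, and only the slice-deletion bookkeeping is shown.

```python
import numpy as np

def prune_tr_rank(cores, u_mean, n, threshold=1e6):
    """Prune rank components on the bond shared by cores[n] and cores[(n+1) % N].

    cores  : list of N core tensors; the trailing rank dimension of cores[n]
             and the leading rank dimension of cores[(n+1) % N] coincide.
    u_mean : list of arrays; u_mean[n][r] is the posterior mean precision
             E[u_r^(n)] attached to rank index r of that bond.
    Components whose expected precision exceeds `threshold` are removed from
    both adjacent cores and from u_mean[n]."""
    keep = u_mean[n] < threshold
    n_next = (n + 1) % len(cores)
    cores[n] = cores[n][:, :, keep]
    cores[n_next] = cores[n_next][keep, :, :]
    u_mean[n] = u_mean[n][keep]
    return cores, u_mean

# Toy usage: prune the bond between cores 1 and 2 of a 3rd-order model.
rng = np.random.default_rng(4)
cores = [rng.standard_normal(s) for s in [(2, 5, 3), (3, 6, 4), (4, 7, 2)]]
u_mean = [np.array([1.0, 2.0, 1e9]),
          np.array([1.0, 1e9, 3.0, 1.0]),
          np.array([1.0, 2.0])]
cores, u_mean = prune_tr_rank(cores, u_mean, n=1, threshold=1e6)
print([c.shape for c in cores])   # the rank shared by cores 1 and 2 drops from 4 to 3
```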
5. Robustness, Outlier Modeling, and Statistical Properties
Marginalizing the precision priors, core-factor entries are governed by a Student-$t$ prior, providing heavy-tailed behavior: the density decays only polynomially (as $|x|^{-(\nu+1)}$ for $\nu$ degrees of freedom), inducing both strong shrinkage of small coefficients (sparsity) and tolerance for occasional large entries (outliers). This dual behavior enables reliable modeling of low-rank structure even when $\Y$ is contaminated by large-magnitude corruptions and/or missing data. Sparse outlier entries are estimated alongside the low-rank factors, so structured outliers (e.g., moving objects in video separation) are captured in $\S$ while background structure is assigned to $\L$. The Student-$t$ prior admits local “spikes” absorbed into factors without biasing the global fit (Huang et al., 2022, Long et al., 2020).
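To illustrate the heavy-tail contrast numerically, the following snippet compares standard Student-$t$ and Gaussian log-densities at increasingly large arguments; it is purely illustrative and not part of the BRTR algorithm.

```python
import numpy as np
from math import lgamma, log, pi

def student_t_logpdf(x, nu):
    """Log-density of a standard Student-t with nu degrees of freedom."""
    const = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return const - 0.5 * (nu + 1) * np.log1p(x ** 2 / nu)

def gaussian_logpdf(x):
    """Log-density of a standard Gaussian."""
    return -0.5 * np.log(2 * np.pi) - 0.5 * x ** 2

# The Gaussian penalty grows quadratically in x, the Student-t penalty only
# logarithmically, so a heavy-tailed prior tolerates occasional large entries.
for x in (1.0, 5.0, 20.0):
    print(f"x = {x:5.1f}   t(nu=3): {student_t_logpdf(x, 3.0):9.2f}"
          f"   Gaussian: {gaussian_logpdf(x):9.2f}")
```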
6. Algorithmic Complexity and Practical Implementation
The computational complexity per variational iteration grows with the tensor order $N$, the number of observed entries $|\Omega|$, and the (maximum) TR rank $R$. The most expensive steps arise from the core-tensor updates, whose cost scales poorly for high tensor orders or large TR ranks. Implementations thus benefit from parallelization, crude initializations, and potential extensions to stochastic or spectral-compressed inference schemes. Pruning and relevance detection impart further efficiency by adaptively shrinking the model size during inference (Huang et al., 2022).
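For intuition on how model size scales, the following helper (a generic TR parameter count, not a BRTR-specific quantity) shows that the number of core parameters grows quadratically in the TR rank and linearly in the order and mode sizes.

```python
def tr_param_count(dims, ranks):
    """Number of free parameters in a TR decomposition.

    dims  : mode sizes (I_1, ..., I_N)
    ranks : TR ranks (R_0, R_1, ..., R_N) with R_0 == R_N (ring closure).
    Core n holds R_{n-1} * I_n * R_n parameters, so storage and update cost
    grow quadratically in the TR rank and linearly in the order and mode sizes."""
    return sum(ranks[n] * dims[n] * ranks[n + 1] for n in range(len(dims)))

# A 4th-order 50 x 50 x 50 x 50 tensor with uniform TR rank 10:
print(tr_param_count((50, 50, 50, 50), (10, 10, 10, 10, 10)))  # 4 * (10*50*10) = 20000
```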
7. Empirical Evaluation and Limitations
Extensive experiments on synthetic data, color images, hyperspectral cubes, face datasets, and surveillance videos validate BRTR’s empirical performance. Highlights include:
- On synthetic tensors with varying missing rates (MR) and sparse corruption, BRTR accurately recovers both the low-rank structure and the outlier support, and correctly identifies the true underlying TR rank, outperforming RTRC and BRTF in rank estimation error (REE) and recovery error (RSE).
- On image completion (MR up to 90%, sparse-corruption rate SR up to 10%), BRTR achieves the lowest RSE and highest PSNR among all compared methods across several natural images (both metrics are sketched after this list).
- On hyperspectral denoising, BRTR dominates in RSE/PSNR metrics under varying missing and corruption rates.
- On YaleB face recovery and video background separation tasks, BRTR cleanly separates low-rank (background) and sparse (foreground) components, with graceful performance degradation as corruption increases (Huang et al., 2022, Long et al., 2020).
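For reference, the two recovery metrics quoted above can be computed as follows; the formulation is standard, and the `peak` value assumed for PSNR depends on the data scaling.

```python
import numpy as np

def rse(X_hat, X_true):
    """RSE: Frobenius norm of the error relative to the ground truth."""
    return np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)

def psnr(X_hat, X_true, peak=1.0):
    """Peak signal-to-noise ratio in dB, assuming data scaled to [0, peak]."""
    mse = np.mean((X_hat - X_true) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy usage on a synthetic "image" tensor:
rng = np.random.default_rng(3)
X = rng.random((32, 32, 3))
X_hat = np.clip(X + 0.05 * rng.standard_normal(X.shape), 0.0, 1.0)
print(f"RSE = {rse(X_hat, X):.4f},  PSNR = {psnr(X_hat, X):.2f} dB")
```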
Empirical comparisons demonstrate resilience to mixed missing and corrupted data, precision in rank adaptation, and overall robustness. Limitations include unfavorable scaling for large-scale, high-order tensors, sensitivity to initialization under variational inference (local optima), and the use of fully factorized priors for $\S$, suggesting that explicitly modeling spatial/spectral correlations may be beneficial. Pruning thresholds must be set sufficiently large to avoid premature component deletion (Huang et al., 2022).
For technical details and algorithmic pseudocode, see (Long et al., 2020) and (Huang et al., 2022).