Bayesian ALS for Tensor Decomposition

Updated 16 July 2025
  • Bayesian ALS is a probabilistic method for tensor decomposition that computes posterior distributions for each factor, enabling uncertainty quantification.
  • It uses sequential block coordinate updates to integrate prior knowledge and noise modeling, leading to more robust low-rank approximations.
  • The algorithm leverages tensor train formats and the unscented transform to efficiently propagate uncertainty, making it suitable for streaming and large-scale data.

A Bayesian ALS (Alternating Least Squares) algorithm is a probabilistic generalization of the classic ALS method for low-rank tensor approximation, recast within a Bayesian inference framework. Rather than seeking only point estimates for tensor decomposition components, the Bayesian version computes posterior distributions for each factor, thereby enabling uncertainty quantification and principled incorporation of prior knowledge and noise. This approach is particularly important in high-dimensional, noisy, or ill-posed multiway data analysis tasks and leverages modern tensor network formats such as the tensor train (TT) to maintain scalability (Menzen et al., 2020).

1. Bayesian Reformulation of Alternating Least Squares

The Bayesian ALS algorithm models the problem of low-rank tensor decomposition by associating each decomposition component with a Gaussian prior distribution. Consider a tensor $\mathcal{Y}$ with a low-rank representation via a tensor train decomposition:

\mathcal{Y} \approx f_T(g_1, g_2, \dots, g_N),

where $g_i$ denotes the $i$-th decomposition factor. For each $g_i$, a prior is assumed:

p(g_i) = \mathcal{N}(m_i^0, P_i^0),

where $m_i^0$ and $P_i^0$ encode the prior mean and covariance, respectively. Measurement noise in the observed data $y$ is modeled as Gaussian.

With independence assumed between factors, the joint posterior given the observed data $y$ is:

p(g_1, \dots, g_N \mid y) \propto p(y \mid g_1, \dots, g_N) \prod_{i = 1}^N p(g_i).

Due to the multilinear nature of the tensor representation, the likelihood conditional on $g_n$ (holding the other factors fixed) is linear in $g_n$:

m_y = U_{(-n)} g_n,

where $U_{(-n)}$ is constructed from all the other factors.

The conditional posterior for gng_n (with Gaussian likelihood and prior) is also Gaussian:

p(g_n \mid \{g_i\}_{i \neq n}, y) = \mathcal{N}(m_n^+, P_n^+),

with updated mean and covariance given by

m_n^+ = \left[(P_n^0)^{-1} + U_{(-n)}^\top U_{(-n)}/\sigma^2\right]^{-1} \left(U_{(-n)}^\top y/\sigma^2 + (P_n^0)^{-1} m_n^0\right),

P_n^+ = \left[(P_n^0)^{-1} + U_{(-n)}^\top U_{(-n)}/\sigma^2\right]^{-1}.

When the prior is uninformative ($(P_n^0)^{-1} \to 0$), these equations reduce to the standard ALS normal equations.
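As a concrete illustration, here is a minimal NumPy sketch of this conditional update. It assumes $U_{(-n)}$ has already been assembled as a dense matrix; function and variable names are illustrative, not from the paper.

```python
import numpy as np

def bayesian_als_factor_update(U_neg_n, y, m0, P0, sigma):
    """Conditional Gaussian update for a single factor g_n.

    Implements
        P_n^+ = [(P_n^0)^{-1} + U^T U / sigma^2]^{-1}
        m_n^+ = P_n^+ (U^T y / sigma^2 + (P_n^0)^{-1} m_n^0)
    with U = U_{(-n)}. Explicit inverses are used for clarity;
    a Cholesky-based solve is preferable numerically.
    """
    P0_inv = np.linalg.inv(P0)
    precision = P0_inv + U_neg_n.T @ U_neg_n / sigma**2
    P_post = np.linalg.inv(precision)
    m_post = P_post @ (U_neg_n.T @ y / sigma**2 + P0_inv @ m0)
    return m_post, P_post
```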

2. Probabilistic Interpretation and Sequential Updates

Each update in the Bayesian ALS algorithm produces not only a mean estimate of a factor but also its covariance, providing an explicit measure of uncertainty. The algorithm proceeds via block coordinate descent:

  • At each step, hold all but one factor fixed.
  • Update the posterior mean and covariance for the current factor using the above equations.
  • Iterate until convergence.

This strategy allows seamless integration of measurement noise and prior information. When confident prior knowledge exists (small $P_n^0$), updates are regularized toward the prior; with noninformative priors (large $P_n^0$), data-driven updates dominate.

This block-wise Bayesian update framework supports recursive updating and is well suited to sequential or streaming data scenarios (Menzen et al., 2020).
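Building on the update sketch above, the sweep itself can be outlined as follows. Here `build_U_minus_n` is a hypothetical helper standing in for the TT-specific contraction of all factors except factor $n$; its construction is omitted.

```python
def bayesian_als_sweep(y, factors, priors, sigma, n_sweeps=10):
    """Block coordinate updates over all factors.

    `factors` holds the current mean estimates; `priors` is a list of
    (m0, P0) pairs, one per factor. `build_U_minus_n` is an assumed
    helper, not shown, that contracts all factors except factor n
    into the matrix U_{(-n)}.
    """
    covariances = [None] * len(factors)
    for _ in range(n_sweeps):
        for n in range(len(factors)):
            U = build_U_minus_n(factors, n)  # assumed helper
            m0, P0 = priors[n]
            factors[n], covariances[n] = bayesian_als_factor_update(
                U, y, m0, P0, sigma)
    return factors, covariances
```

In practice one would monitor the change in the posterior means (or in the reconstruction) across sweeps and stop once it falls below a tolerance, rather than running a fixed number of sweeps.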

3. Uncertainty Propagation: Unscented Transform in Tensor Train Format

A central challenge in Bayesian tensor estimation is propagating uncertainty from the distribution of factors to the reconstructed tensor, as the mapping is nonlinear. The algorithm applies the unscented transform (UT) within the TT format:

  1. Gather the posterior means and covariances of all factors into a stacked mean $m$ and a block-diagonal covariance $P$.
  2. Generate $2M + 1$ sigma points $\{x^{(i)}\}$,

x^{(0)} = m, \quad x^{(i)} = m + \sqrt{M + \lambda}\, [\sqrt{P}]_i, \quad x^{(i + M)} = m - \sqrt{M + \lambda}\, [\sqrt{P}]_i, \qquad i = 1, \dots, M,

with weights $w_i^m$, $w_i^P$ as per the UT specification.

  3. For each sigma point, reconstruct a tensor via the TT mapping $S^{(i)} = f_T(x^{(i)})$.
  4. Compute the mean and covariance of the reconstruction:

m_{UT} = \sum_{i = 0}^{2M} w_i^m S^{(i)}, \quad P_{UT} = \sum_{i = 0}^{2M} w_i^P \left(S^{(i)} - m_{UT}\right)\left(S^{(i)} - m_{UT}\right)^\top.

This formulation allows efficient estimation of the mean and covariance of the entire low-rank approximation, without explicitly materializing the full (often gigantic) covariance matrix.
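The following NumPy/SciPy sketch illustrates these four steps for a generic map `f_T` on a small, dense problem. Note that this toy version does form the full output covariance, which the TT-format variant described above deliberately avoids; the UT scaling constants ($\alpha$, $\beta$, $\kappa$) are conventional defaults, not values from the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def unscented_moments(m, P, f_T, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate the Gaussian (m, P) through a nonlinear map f_T.

    `f_T` maps a stacked factor vector to the vectorized
    reconstructed tensor. Standard scaled-UT weights are used.
    """
    M = m.size
    lam = alpha**2 * (M + kappa) - M
    S = sqrtm((M + lam) * P).real  # columns give the sigma-point offsets

    # 2M + 1 sigma points: the mean plus symmetric perturbations.
    sigma_pts = [m] + [m + S[:, i] for i in range(M)] \
                    + [m - S[:, i] for i in range(M)]

    # Mean and covariance weights.
    wm = np.full(2 * M + 1, 1.0 / (2.0 * (M + lam)))
    wc = wm.copy()
    wm[0] = lam / (M + lam)
    wc[0] = lam / (M + lam) + (1.0 - alpha**2 + beta)

    Y = np.stack([f_T(x) for x in sigma_pts])  # propagate each point
    m_ut = wm @ Y
    P_ut = (wc[:, None] * (Y - m_ut)).T @ (Y - m_ut)
    return m_ut, P_ut
```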

4. Practical Benefits: Noise, Priors, and Scalability

The Bayesian ALS framework naturally incorporates noise modeling and prior information. This is valuable in applications with noisy data, where uncertainty quantification is critical for interpretability and robustness. The algorithm supports recursive updating, enabling application to streaming or time-varying data. Its exploitation of the tensor train (TT) representation ensures scalability; the main computational costs are associated with:

  • Matrix inversions (manageable for moderate factor dimensions),
  • Sigma point generation and propagation in TT format.

By propagating posterior covariances through the TT mapping, the approach avoids storage and computation overhead associated with full-tensor covariance matrices.
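As a back-of-the-envelope illustration of this scalability argument (the sizes below are hypothetical, not taken from the paper):

```python
# Hypothetical sizes for illustration only.
N, I, r = 4, 10, 5                # number of modes, mode size, TT rank
M = N * I * r * r                 # rough TT parameter count (boundary ranks ignored)
sigma_points = 2 * M + 1          # sigma points to propagate through f_T
full_cov_entries = (I ** N) ** 2  # entries of the full-tensor covariance

print(M, sigma_points, full_cov_entries)  # 1000 2001 100000000
```

Even in this small example, propagating a couple of thousand sigma points is far cheaper than storing or factorizing a covariance with $10^8$ entries.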

5. Applications and Implications

The Bayesian ALS algorithm is applicable wherever low-rank tensor decompositions are used and uncertainty quantification is needed. Key domains include:

  • Image and signal processing: providing both low-rank approximations and confidence intervals, demonstrated, for example, by reconstructing noisy image data and reporting corresponding uncertainty measures (Menzen et al., 2020).
  • Online or recursive estimation for time-varying or streaming tensors, owing to the sequential nature of Bayesian updates.
  • Large-scale multiway data analysis in machine learning, neuroscience, and system identification, where interpretability and prior information play a crucial role.

Uncertainty quantification from this method can guide stopping criteria, inform downstream analyses, and enable risk-aware decision making.

6. Comparison with Standard ALS

In the limit of noninformative priors and in the absence of noise, the Bayesian ALS update equations revert to those of standard ALS. However, the Bayesian approach distinguishes itself by:

  • Quantifying uncertainty associated with each decomposition component.
  • Allowing modelers to encode domain expertise through prior distributions.
  • Facilitating principled noise handling and recursive updating.
  • Offering estimates of the overall uncertainty in the reconstructed tensor, rather than just point estimates.

These features address several well-known limitations of classical ALS, especially for noisy, under-determined, or high-dimensional tensor approximation problems.

7. Summary

The Bayesian ALS algorithm generalizes alternating least squares tensor decomposition to a fully probabilistic framework. Through sequential block-wise Bayesian updates of factor distributions and uncertainty propagation via the unscented transform in tensor train format, the approach provides point estimates, uncertainty quantification, and accommodation of both noise and prior domain knowledge. Its recursive update structure and compatibility with TT formats render it practical for modern large-scale, noisy, and time-evolving multiway data problems, marking a significant conceptual and practical advance over traditional ALS-based low-rank approximation methods (Menzen et al., 2020).
