Bayesian ALS for Tensor Decomposition

Updated 16 July 2025
  • Bayesian ALS is a probabilistic method for tensor decomposition that computes posterior distributions for each factor, enabling uncertainty quantification.
  • It uses sequential block coordinate updates to integrate prior knowledge and noise modeling, leading to more robust low-rank approximations.
  • The algorithm leverages tensor train formats and the unscented transform to efficiently propagate uncertainty, making it suitable for streaming and large-scale data.

A Bayesian ALS (Alternating Least Squares) algorithm is a probabilistic generalization of the classic ALS method for low-rank tensor approximation, recast within a Bayesian inference framework. Rather than seeking only point estimates for tensor decomposition components, the Bayesian version computes posterior distributions for each factor, thereby enabling uncertainty quantification and principled incorporation of prior knowledge and noise. This approach is particularly important in high-dimensional, noisy, or ill-posed multiway data analysis tasks and leverages modern tensor network formats such as the tensor train (TT) to maintain scalability (Menzen et al., 2020).

1. Bayesian Reformulation of Alternating Least Squares

The Bayesian ALS algorithm models the problem of low-rank tensor decomposition by associating each decomposition component with a Gaussian prior distribution. Consider a tensor $\mathcal{Y}$ with a low-rank representation via a tensor train decomposition:

\mathcal{Y} \approx f_T(g_1, g_2, \dots, g_N),

where $g_i$ denotes the $i$-th decomposition factor. For each $g_i$, a prior is assumed:

p(g_i) = \mathcal{N}(m_i^0, P_i^0),

where $m_i^0$ and $P_i^0$ encode the prior mean and covariance, respectively. Measurement noise in the observed data $y$ is modeled as Gaussian.

With independence assumed between factors, the joint posterior given the observed data $y$ is:

p(g_1, \dots, g_N \mid y) \propto p(y \mid g_1, \dots, g_N) \prod_{i = 1}^N p(g_i).

Due to the multilinear nature of the tensor representation, the likelihood conditional on $g_n$ (holding the other factors fixed) is linear in $g_n$:

m_y = U_{(-n)} g_n,

where $U_{(-n)}$ is constructed from all the other factors.

The conditional posterior for gng_n (with Gaussian likelihood and prior) is also Gaussian:

p(g_n \mid \{g_i\}_{i \neq n}, y) = \mathcal{N}(m_n^+, P_n^+),

with updated mean and covariance given by

m_n^+ = \left[(P_n^0)^{-1} + U_{(-n)}^\top U_{(-n)}/\sigma^2\right]^{-1} \left(U_{(-n)}^\top y/\sigma^2 + (P_n^0)^{-1} m_n^0\right),

P_n^+ = \left[(P_n^0)^{-1} + U_{(-n)}^\top U_{(-n)}/\sigma^2\right]^{-1}.

When the prior is uninformative ($(P_n^0)^{-1} \to 0$), these equations reduce to the standard ALS normal equations.
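As a concrete illustration, here is a minimal NumPy sketch of this conditional update. It assumes $U_{(-n)}$ has already been assembled as a dense matrix; function and variable names are illustrative, not from the paper.

```python
import numpy as np

def bayesian_als_factor_update(U_neg_n, y, m0, P0, sigma):
    """Conditional Gaussian update for a single factor g_n.

    Implements
        P_n^+ = [(P_n^0)^{-1} + U^T U / sigma^2]^{-1}
        m_n^+ = P_n^+ (U^T y / sigma^2 + (P_n^0)^{-1} m_n^0)
    with U = U_{(-n)}. Explicit inverses are used for clarity;
    a Cholesky-based solve is preferable numerically.
    """
    P0_inv = np.linalg.inv(P0)
    precision = P0_inv + U_neg_n.T @ U_neg_n / sigma**2
    P_post = np.linalg.inv(precision)
    m_post = P_post @ (U_neg_n.T @ y / sigma**2 + P0_inv @ m0)
    return m_post, P_post
```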

2. Probabilistic Interpretation and Sequential Updates

Each update in the Bayesian ALS algorithm produces not only a mean estimate of a factor but also its covariance, providing an explicit measure of uncertainty. The algorithm proceeds via block coordinate descent:

  • At each step, hold all but one factor fixed.
  • Update the posterior mean and covariance for the current factor using the above equations.
  • Iterate until convergence.

This strategy allows seamless integration of measurement noise and prior information. When confident prior knowledge exists (small $P_n^0$), updates are regularized toward the prior; with noninformative priors (large $P_n^0$), data-driven updates dominate.

This block-wise Bayesian update framework supports recursive updating and is well suited to sequential or streaming data scenarios (Menzen et al., 2020).
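Building on the update sketch above, the sweep itself can be outlined as follows. Here `build_U_minus_n` is a hypothetical helper standing in for the TT-specific contraction of all factors except factor $n$; its construction is omitted.

```python
def bayesian_als_sweep(y, factors, priors, sigma, n_sweeps=10):
    """Block coordinate updates over all factors.

    `factors` holds the current mean estimates; `priors` is a list of
    (m0, P0) pairs, one per factor. `build_U_minus_n` is an assumed
    helper, not shown, that contracts all factors except factor n
    into the matrix U_{(-n)}.
    """
    covariances = [None] * len(factors)
    for _ in range(n_sweeps):
        for n in range(len(factors)):
            U = build_U_minus_n(factors, n)  # assumed helper
            m0, P0 = priors[n]
            factors[n], covariances[n] = bayesian_als_factor_update(
                U, y, m0, P0, sigma)
    return factors, covariances
```

In practice one would monitor the change in the posterior means (or in the reconstruction) across sweeps and stop once it falls below a tolerance, rather than running a fixed number of sweeps.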

3. Uncertainty Propagation: Unscented Transform in Tensor Train Format

A central challenge in Bayesian tensor estimation is propagating uncertainty from the distribution of factors to the reconstructed tensor, as the mapping is nonlinear. The algorithm applies the unscented transform (UT) within the TT format:

  1. Gather the posterior means and covariances of all factors into a stacked mean $m$ and a block-diagonal covariance $P$.
  2. Generate $2M + 1$ sigma points $\{x^{(i)}\}$,

x^{(0)} = m, \quad x^{(i)} = m + \sqrt{M + \lambda}\, [\sqrt{P}]_i, \quad x^{(i + M)} = m - \sqrt{M + \lambda}\, [\sqrt{P}]_i, \qquad i = 1, \dots, M,

with weights $w_i^m$, $w_i^P$ as per the UT specification.

  3. For each sigma point, reconstruct a tensor via the TT mapping $S^{(i)} = f_T(x^{(i)})$.
  4. Compute the mean and covariance of the reconstruction:

m_{UT} = \sum_{i = 0}^{2M} w_i^m S^{(i)}, \quad P_{UT} = \sum_{i = 0}^{2M} w_i^P \left(S^{(i)} - m_{UT}\right)\left(S^{(i)} - m_{UT}\right)^\top.

This formulation allows efficient estimation of the mean and covariance of the entire low-rank approximation, without explicitly materializing the full (often gigantic) covariance matrix.
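The following NumPy/SciPy sketch illustrates these four steps for a generic map `f_T` on a small, dense problem. Note that this toy version does form the full output covariance, which the TT-format variant described above deliberately avoids; the UT scaling constants ($\alpha$, $\beta$, $\kappa$) are conventional defaults, not values from the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def unscented_moments(m, P, f_T, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate the Gaussian (m, P) through a nonlinear map f_T.

    `f_T` maps a stacked factor vector to the vectorized
    reconstructed tensor. Standard scaled-UT weights are used.
    """
    M = m.size
    lam = alpha**2 * (M + kappa) - M
    S = sqrtm((M + lam) * P).real  # columns give the sigma-point offsets

    # 2M + 1 sigma points: the mean plus symmetric perturbations.
    sigma_pts = [m] + [m + S[:, i] for i in range(M)] \
                    + [m - S[:, i] for i in range(M)]

    # Mean and covariance weights.
    wm = np.full(2 * M + 1, 1.0 / (2.0 * (M + lam)))
    wc = wm.copy()
    wm[0] = lam / (M + lam)
    wc[0] = lam / (M + lam) + (1.0 - alpha**2 + beta)

    Y = np.stack([f_T(x) for x in sigma_pts])  # propagate each point
    m_ut = wm @ Y
    P_ut = (wc[:, None] * (Y - m_ut)).T @ (Y - m_ut)
    return m_ut, P_ut
```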

4. Practical Benefits: Noise, Priors, and Scalability

The Bayesian ALS framework naturally incorporates noise modeling and prior information. This is valuable in applications with noisy data, where uncertainty quantification is critical for interpretability and robustness. The algorithm supports recursive updating, enabling application to streaming or time-varying data. Its exploitation of the tensor train (TT) representation ensures scalability; the main computational costs are associated with:

  • Matrix inversions (manageable for moderate factor dimensions),
  • Sigma point generation and propagation in TT format.

By propagating posterior covariances through the TT mapping, the approach avoids storage and computation overhead associated with full-tensor covariance matrices.
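As a back-of-the-envelope illustration of this scalability argument (the sizes below are hypothetical, not taken from the paper):

```python
# Hypothetical sizes for illustration only.
N, I, r = 4, 10, 5                # number of modes, mode size, TT rank
M = N * I * r * r                 # rough TT parameter count (boundary ranks ignored)
sigma_points = 2 * M + 1          # sigma points to propagate through f_T
full_cov_entries = (I ** N) ** 2  # entries of the full-tensor covariance

print(M, sigma_points, full_cov_entries)  # 1000 2001 100000000
```

Even in this small example, propagating a couple of thousand sigma points is far cheaper than storing or factorizing a covariance with $10^8$ entries.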

5. Applications and Implications

The Bayesian ALS algorithm is applicable wherever low-rank tensor decompositions are used and uncertainty quantification is needed. Key domains include:

  • Image and signal processing: providing both low-rank approximations and confidence intervals, demonstrated, for example, by reconstructing noisy image data and reporting corresponding uncertainty measures (Menzen et al., 2020).
  • Online or recursive estimation for time-varying or streaming tensors, owing to the sequential nature of Bayesian updates.
  • Large-scale multiway data analysis in machine learning, neuroscience, and system identification, where interpretability and prior information play a crucial role.

Uncertainty quantification from this method can guide stopping criteria, inform downstream analyses, and enable risk-aware decision making.

6. Comparison with Standard ALS

In the limit of noninformative priors and in the absence of noise, the Bayesian ALS update equations revert to those of standard ALS. However, the Bayesian approach distinguishes itself by:

  • Quantifying uncertainty associated with each decomposition component.
  • Allowing modelers to encode domain expertise through prior distributions.
  • Facilitating principled noise handling and recursive updating.
  • Offering estimates of the overall uncertainty in the reconstructed tensor, rather than just point estimates.

These features address several well-known limitations of classical ALS, especially for noisy, under-determined, or high-dimensional tensor approximation problems.

7. Summary

The Bayesian ALS algorithm generalizes alternating least squares tensor decomposition to a fully probabilistic framework. Through sequential block-wise Bayesian updates of factor distributions and uncertainty propagation via the unscented transform in tensor train format, the approach provides point estimates, uncertainty quantification, and accommodation of both noise and prior domain knowledge. Its recursive update structure and compatibility with TT formats render it practical for modern large-scale, noisy, and time-evolving multiway data problems, marking a significant conceptual and practical advance over traditional ALS-based low-rank approximation methods (Menzen et al., 2020).
