Einsum-Based Multiplicative Update
- The paper introduces a universal framework that fits nonnegative tensor factorizations via custom einsum-based multiplicative updates using majorization–minimization theory.
- It details an algorithmic procedure that efficiently computes per-factor updates and guarantees convergence by strictly decreasing the loss in each iteration.
- The approach demonstrates practical benefits such as GPU acceleration, handling missing data, and reduced computational cost on large-scale multiway datasets.
An einsum-based multiplicative update is a general framework for fitting nonnegative tensor factorizations by casting the factorization as a sequence of tensor contractions parameterized by Einstein summation (einsum) notation. This approach, as implemented in NNEinFact, enables the application of multiplicative update methods to essentially any nonnegative tensor factorization model that can be expressed as a tensor contraction, with a broad variety of loss functions including the β-divergence. The algorithm uses Python-style einsum strings to specify custom factorization models, applies a universal update formula rooted in majorization–minimization theory, and offers practical routines for scaling, handling missing data, and GPU acceleration (Hood et al., 2 Feb 2026).
1. General Framework: Tensor Factorization as Einsum
Nonnegative tensor factorization aims to approximate a nonnegative $N$-th-order tensor $Y$ with a parameterized factorization constructed from $L$ nonnegative factor tensors $\Theta^{(1)}, \dots, \Theta^{(L)}$. The approximation is given by an $L$-way tensor contraction:

$$\hat{Y} = \operatorname{einsum}\big(\text{model\_str},\ \Theta^{(1)}, \dots, \Theta^{(L)}\big),$$

where each factor $\Theta^{(l)}$ is a tensor whose modes correspond to subsets of the observed and latent indices. The contraction pattern is specified by an einsum string, for example,
```python
model_str = "i r1, j r1, a r2, r1 r2 -> i j a"
```
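As a minimal sketch of what such a model string describes, the pattern above can be evaluated with NumPy's `einsum` once its multi-character subscripts (`r1`, `r2`) are mapped to single letters, which NumPy requires. All dimensions and factor names below are illustrative, not from the paper:

```python
import numpy as np

# Hypothetical dimensions for the pattern "i r1, j r1, a r2, r1 r2 -> i j a"
I, J, A, R1, R2 = 6, 5, 4, 3, 2
rng = np.random.default_rng(0)

# One nonnegative factor per comma-separated operand in the einsum string
U = rng.random((I, R1))   # "i r1"
V = rng.random((J, R1))   # "j r1"
W = rng.random((A, R2))   # "a r2"
C = rng.random((R1, R2))  # "r1 r2" (core coupling the two latent indices)

# Single-letter rewrite of "i r1, j r1, a r2, r1 r2 -> i j a" for np.einsum
model_str = "ip,jp,aq,pq->ija"
Yhat = np.einsum(model_str, U, V, W, C)
print(Yhat.shape)  # (6, 5, 4)
```

Because every factor is nonnegative, the resulting fit `Yhat` is nonnegative as well, which is what makes the multiplicative update scheme below applicable.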
2. Universal Multiplicative Update Derivation
The general goal is to minimize a differentiable loss $\mathcal{L}(\Theta^{(1)}, \dots, \Theta^{(L)}) = D\big(Y, \hat{Y}\big)$, where each factor satisfies $\Theta^{(l)} \ge 0$ element-wise. The core update is a multiplicative step for each factor $\Theta^{(l)}$:

$$\Theta^{(l)} \leftarrow \Theta^{(l)} \odot g^{-1}\!\big(A^{(l)} / B^{(l)}\big),$$

where the numerator $A^{(l)}$ and denominator $B^{(l)}$ are defined via sum contractions over all relevant indices involving element-wise functions $a(Y, \hat{Y})$ and $b(Y, \hat{Y})$, and $g$ is a scalar map dictated by the loss choice. Specifically,
- $A^{(l)} = \operatorname{einsum}\big(a(Y, \hat{Y}),\ \text{other factors}\big)$
- $B^{(l)} = \operatorname{einsum}\big(b(Y, \hat{Y}),\ \text{other factors}\big)$
Alternatively, in the positive/negative gradient view:

$$\Theta^{(l)} \leftarrow \Theta^{(l)} \odot \frac{\big[\nabla_{\Theta^{(l)}} \mathcal{L}\big]^{-}}{\big[\nabla_{\Theta^{(l)}} \mathcal{L}\big]^{+}},$$

with contractions corresponding to appropriate einsum patterns for the positive and negative components (Hood et al., 2 Feb 2026).
Common loss functions and their corresponding $a(y, \hat{y})$, $b(y, \hat{y})$, $g$ mappings are:

| Loss type | $a(y, \hat{y})$ | $b(y, \hat{y})$ | $g(t)$ |
|---|---|---|---|
| Frobenius | $y$ | $\hat{y}$ | $t$ |
| KL divergence | $y / \hat{y}$ | $1$ | $t$ |
| $\beta$-divergence | $y\,\hat{y}^{\beta-2}$ | $\hat{y}^{\beta-1}$ | (regime-dependent) |
For each loss, the corresponding update embodies the majorization–minimization step derived under mild convexity and decomposability conditions (Hood et al., 2 Feb 2026).
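The element-wise maps $a$ and $b$ for these losses can be expressed as a single $\beta$-parameterized family (these are the standard multiplicative-update forms for the $\beta$-divergence; the helper name `ab_maps` is illustrative, not an NNEinFact API):

```python
import numpy as np

def ab_maps(beta):
    """Element-wise numerator/denominator maps a(y, yhat), b(y, yhat) for the
    beta-divergence family: beta=2 recovers Frobenius, beta=1 recovers KL."""
    def a(y, yhat):
        return y * yhat ** (beta - 2.0)
    def b(y, yhat):
        return yhat ** (beta - 1.0)
    return a, b

y = np.array([1.0, 2.0, 3.0])
yhat = np.array([1.5, 1.0, 2.0])

a2, b2 = ab_maps(2.0)        # Frobenius: a = y, b = yhat
assert np.allclose(a2(y, yhat), y)
assert np.allclose(b2(y, yhat), yhat)

a1, b1 = ab_maps(1.0)        # KL: a = y / yhat, b = 1
assert np.allclose(a1(y, yhat), y / yhat)
assert np.allclose(b1(y, yhat), np.ones_like(yhat))
```

Specializing $\beta$ reproduces each table row, which is why a single universal update routine can serve the whole family.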
3. Computational Procedure and Pseudocode
The practical algorithm cycles through the following steps until convergence:
- Recompute Fit: Compute $\hat{Y}$ via einsum contraction.
- Per-Factor Update: For each factor $\Theta^{(l)}$:
  - Compute the numerator $A^{(l)}$: einsum contraction of the other factors with $a(Y, \hat{Y})$ substituted at position $l$
  - Compute the denominator $B^{(l)}$: einsum contraction with $b(Y, \hat{Y})$
  - Update: $\Theta^{(l)} \leftarrow \Theta^{(l)} \odot g^{-1}(A^{(l)} / B^{(l)})$
- Convergence Check: Early stopping via held-out divergence (on a 5–10% validation split) or relative loss decrease
Python-style pseudocode:
```python
einstr = [swap(model_str, l) for l in range(L)]
while not converged:
    Yhat = einsum(model_str, Θ[0], …, Θ[L−1])
    for l in range(L):
        A = einsum(einstr[l], Θ[0], …, Θ[l−1], a(Y, Yhat), Θ[l+1], …, Θ[L−1])
        B = einsum(einstr[l], Θ[0], …, Θ[l−1], b(Y, Yhat), Θ[l+1], …, Θ[L−1])
        Θ[l] *= g_inv(A / B)
```
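A runnable instance of this loop for a 3-way CP model under the Frobenius loss is sketched below. The sizes, rank, and the implementation of `swap` are assumptions (one plausible reading of the pseudocode: `swap` produces the einsum pattern that contracts the residual tensor, which carries the output subscripts, with every factor except the $l$-th, yielding a result shaped like factor $l$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-way CP model "ir,jr,kr->ijk" with made-up sizes and rank.
I, J, K, R = 8, 7, 6, 3
model_str = "ir,jr,kr->ijk"
Y = np.einsum(model_str, rng.random((I, R)), rng.random((J, R)),
              rng.random((K, R))) + 0.01 * rng.random((I, J, K))

def swap(model_str, l):
    # Swap the l-th operand's subscripts with the output subscripts, so the
    # contraction of the residual with the other factors has factor l's shape.
    ins, out = model_str.split("->")
    ops = ins.split(",")
    target = ops[l]
    ops[l] = out
    return ",".join(ops) + "->" + target

# Frobenius loss: a(Y, Yhat) = Y, b(Y, Yhat) = Yhat, g = identity.
Theta = [rng.random((I, R)), rng.random((J, R)), rng.random((K, R))]
eps = 1e-12  # guards against division by zero
losses = []
for _ in range(50):
    losses.append(0.5 * np.sum((Y - np.einsum(model_str, *Theta)) ** 2))
    for l in range(len(Theta)):
        operands = list(Theta)
        operands[l] = Y                              # numerator: a(Y, Yhat) = Y
        A = np.einsum(swap(model_str, l), *operands)
        operands[l] = np.einsum(model_str, *Theta)   # denominator: current Yhat
        B = np.einsum(swap(model_str, l), *operands)
        Theta[l] *= A / (B + eps)

# MM theory says the loss sequence is non-increasing.
assert all(x >= y - 1e-8 for x, y in zip(losses, losses[1:]))
```

Note that the fit is refreshed before each factor's denominator contraction, which is what the MM monotonicity argument requires; the factors stay nonnegative automatically because the update is purely multiplicative.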
4. Theoretical Guarantees and Convergence
The update scheme follows a majorization–minimization (MM) framework. At each block update, a tight surrogate is constructed, with equality at the current iterate. Minimizing with respect to yields the stated multiplicative update. Theorem 2.3 asserts that, assuming decomposability of the loss and model structure, each update strictly decreases the loss and the sequence converges to a stationary point (local minimum) (Hood et al., 2 Feb 2026). This holds for a wide class of models and losses supported by the einsum formulation.
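The surrogate conditions behind this guarantee can be written schematically (standard MM reasoning; the surrogate notation $\ell(\cdot \mid \cdot)$ is illustrative, not lifted from the paper):

```latex
% Surrogate conditions for block l at the current iterate \Theta_t^{(l)}:
\ell\big(\Theta^{(l)} \,\big|\, \Theta_t^{(l)}\big) \;\ge\; \mathcal{L}\big(\Theta^{(l)}\big)
  \quad \text{for all } \Theta^{(l)} \ge 0,
\qquad
\ell\big(\Theta_t^{(l)} \,\big|\, \Theta_t^{(l)}\big) \;=\; \mathcal{L}\big(\Theta_t^{(l)}\big).

% Minimizing the surrogate therefore cannot increase the loss:
\mathcal{L}\big(\Theta_{t+1}^{(l)}\big)
  \;\le\; \ell\big(\Theta_{t+1}^{(l)} \,\big|\, \Theta_t^{(l)}\big)
  \;\le\; \ell\big(\Theta_t^{(l)} \,\big|\, \Theta_t^{(l)}\big)
  \;=\; \mathcal{L}\big(\Theta_t^{(l)}\big).
```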
5. Computational Complexity and Implementation Notes
The per-iteration cost per factor consists of two einsum contractions (whose cost depends on the chosen contraction path and the tensor sizes) plus an element-wise cost proportional to the size of $\Theta^{(l)}$ to apply the update. For sparse $Y$, computation can be restricted to nonzero entries, reducing the cost to scale with $\operatorname{nnz}(Y)$. Only the factor matrices/tensors and the current $\hat{Y}$ are stored in memory. Missing data can be incorporated by maintaining a binary mask $M$ and replacing $a(Y, \hat{Y})$ and $b(Y, \hat{Y})$ with their masked versions $M \odot a(Y, \hat{Y})$ and $M \odot b(Y, \hat{Y})$ in all contractions. GPU-accelerated einsum (via PyTorch or NumPy backends) yields multifold speedups relative to explicit loops (Hood et al., 2 Feb 2026).
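The masking idea can be sketched for a rank-2 nonnegative matrix factorization under the Frobenius loss ($a = Y$, $b = \hat{Y}$, each multiplied by the mask so missing entries contribute nothing to either contraction); all sizes here are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.random((4, 5))
M = (rng.random(Y.shape) > 0.3).astype(float)  # 1 = observed, 0 = missing

# Rank-2 nonnegative factorization "ir,jr->ij" fitted only on observed entries.
R = 2
U, V = rng.random((4, R)), rng.random((5, R))
err0 = np.sum(M * (Y - U @ V.T) ** 2)  # initial masked fit error

for _ in range(100):
    Yhat = U @ V.T
    U *= (M * Y) @ V / ((M * Yhat) @ V + 1e-12)
    Yhat = U @ V.T                     # refresh the fit before updating V
    V *= (M * Y).T @ U / ((M * Yhat).T @ U + 1e-12)

err = np.sum(M * (Y - U @ V.T) ** 2)   # masked error after fitting
assert err < err0
```

Because the mask multiplies both the numerator and denominator maps, unobserved entries exert no pull on the factors, and completed values can be read off from `U @ V.T` at the missing positions.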
6. Empirical Performance and Applications
Empirical evaluation considers large, real-world datasets, such as:
- Uber (5-way)
- ICEWS (4-way)
- WITS (4-way)
Optimization comparisons demonstrate that NNEinFact outperforms Adam (tested over six learning rates) across losses, converging up to 90× faster in wall-clock time and achieving less than half the held-out divergence of gradient-based methods. Model-level comparisons show that custom einsum models fitted by this approach yield 10–37% lower held-out divergence than standard CP, Tucker, tensor train, or low-rank Tucker models. Qualitative analyses (e.g., on the Uber dataset) recover interpretable spatiotemporal classes using significantly fewer parameters than CP models (Hood et al., 2 Feb 2026).
7. Applicability and Significance
The einsum-based multiplicative update framework provides a universal, efficient, and customizable method for nonnegative tensor factorization across a diversity of losses and model structures. It enables researchers to flexibly define bespoke tensor contractions and losses, eliminates the need to design new update rules for each model, and empirically achieves superior convergence and predictive performance relative to standard approaches. This framework supports missing data, scales to tensors with hundreds of millions of entries, and is amenable to GPU acceleration, facilitating practical deployment across scientific domains with complex multiway data (Hood et al., 2 Feb 2026).