
Einsum-Based Multiplicative Update

Updated 4 February 2026
  • The paper introduces a universal framework that fits nonnegative tensor factorizations via custom einsum-based multiplicative updates using majorization–minimization theory.
  • It details an algorithmic procedure that efficiently computes per-factor updates and guarantees convergence by strictly decreasing the loss in each iteration.
  • The approach demonstrates practical benefits such as GPU acceleration, handling missing data, and reduced computational cost on large-scale multiway datasets.

An einsum-based multiplicative update is a general framework for fitting nonnegative tensor factorizations by casting the factorization as a sequence of tensor contractions parameterized by Einstein summation (einsum) notation. This approach, as implemented in NNEinFact, enables the application of multiplicative update methods to essentially any nonnegative tensor factorization model that can be expressed as a tensor contraction, under a broad variety of loss functions including the $(\alpha,\beta)$-divergence. The algorithm uses Python-style einsum strings to specify custom factorization models, applies a universal update formula rooted in majorization–minimization theory, and offers practical routines for scaling, handling missing data, and GPU acceleration (Hood et al., 2 Feb 2026).

1. General Framework: Tensor Factorization as Einsum

Nonnegative tensor factorization aims to approximate a nonnegative $M$-order tensor $Y \in \mathbb{R}_+^{I_1 \times \cdots \times I_M}$ with a parameterized factorization $\hat Y$ constructed from $L$ factor tensors. The approximation is given by an $L$-way tensor contraction:

$$\hat y_{i} = \sum_{r_1=1}^{R_1} \cdots \sum_{r_K=1}^{R_K} \prod_{\ell=1}^{L} \Theta^{(\ell)}_{i_\ell, r_\ell}$$

where each factor $\Theta^{(\ell)}$ is a tensor whose modes correspond to subsets of the observed and latent indices. The contraction pattern is specified by an einsum string; for example,

```
model_str = "i r1, j r1, a r2, r1 r2 -> i j a"
```

specifies the contraction that yields the desired shape and structure of $\hat Y$. Custom models, including CP, Tucker, tensor train, or arbitrary user-defined forms, can be defined by varying the einsum string (Hood et al., 2 Feb 2026).
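As a concrete illustration, a contraction of this shape can be evaluated directly with NumPy's einsum. Note that `np.einsum` requires single-letter subscripts, so the multi-character indices `r1` and `r2` of the model string are mapped to `r` and `s` here; the dimensions are arbitrary toy values, not taken from the paper.

```python
import numpy as np

# Toy shapes for the model "i r1, j r1, a r2, r1 r2 -> i j a"
I, J, A, R1, R2 = 4, 5, 3, 2, 2
rng = np.random.default_rng(0)
T1 = rng.random((I, R1))      # factor over observed index i and latent r1
T2 = rng.random((J, R1))      # factor over observed index j and latent r1
T3 = rng.random((A, R2))      # factor over observed index a and latent r2
core = rng.random((R1, R2))   # factor coupling the latent indices r1, r2

# np.einsum needs single-letter subscripts: r1 -> r, r2 -> s
Yhat = np.einsum("ir,jr,as,rs->ija", T1, T2, T3, core)
print(Yhat.shape)  # (4, 5, 3)
```

Each entry of `Yhat` is exactly the sum over $r_1, r_2$ of the product of the four factor entries, matching the contraction formula above.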

2. Universal Multiplicative Update Derivation

The general goal is to minimize a differentiable loss

$$L(Y,\hat Y) = \sum_{\text{indices } i} \ell(y_i, \hat y_i)$$

where each factor satisfies $\Theta^{(\ell)} \geq 0$. The core update is a multiplicative step for each factor $\ell$:

$$\Theta^{(\ell)} \leftarrow \Theta^{(\ell)} \odot g^{-1}\left(\frac{\text{Numerator}}{\text{Denominator}}\right)$$

where the numerator and denominator are defined via sum contractions over all relevant indices involving element-wise functions $a(\cdot,\cdot)$ and $b(\cdot,\cdot)$, and $g(\cdot)$ is a scalar map dictated by the loss choice. Specifically,

  • $\text{Numerator} = \sum_{\text{sum-indices}} a(Y,\hat Y) \times$ (other factors)
  • $\text{Denominator} = \sum_{\text{sum-indices}} b(Y,\hat Y) \times$ (other factors)

Alternatively, in the positive/negative gradient view:

$$\Theta^{(\ell)}_{\text{new}} = \Theta^{(\ell)}_{\text{old}} \odot \frac{[\nabla^-_\ell L]}{[\nabla^+_\ell L]}$$

with contractions corresponding to appropriate einsum patterns for the positive and negative components (Hood et al., 2 Feb 2026).
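For the Frobenius loss $\tfrac{1}{2}\|Y - \hat Y\|^2$ on a matrix model, this split is easy to verify by hand: the gradient with respect to one factor is $(\hat Y - Y)H$, so $\nabla^+ = \hat Y H$ and $\nabla^- = Y H$. The following sketch (variable names are illustrative, not from NNEinFact) checks the split against a finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.random((4, 3))
W = rng.random((4, 2))
H = rng.random((3, 2))
Yhat = W @ H.T

grad_plus = Yhat @ H    # [∇⁺ L] for the Frobenius loss ½||Y - Ŷ||²
grad_minus = Y @ H      # [∇⁻ L]
grad = grad_plus - grad_minus

# Finite-difference check of a single entry of the gradient w.r.t. W
eps = 1e-6
Wp = W.copy()
Wp[0, 0] += eps
fd = (0.5 * np.sum((Y - Wp @ H.T) ** 2)
      - 0.5 * np.sum((Y - W @ H.T) ** 2)) / eps
print(abs(fd - grad[0, 0]) < 1e-4)  # True
```

Both $\nabla^+$ and $\nabla^-$ are themselves einsum contractions of $\hat Y$ (resp. $Y$) against the remaining factors, which is what makes the ratio form implementable by the same machinery as the model itself.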

Common loss functions and their corresponding $a$, $b$, $g$ mappings are:

| Loss type | $a(y,\hat y)$ | $b(y,\hat y)$ | $g(\lambda)$ |
| --- | --- | --- | --- |
| Frobenius | $y$ | $\hat y$ | $\lambda$ |
| KL divergence | $y/\hat y$ | $1$ | $\lambda$ |
| $(\alpha,\beta)$-divergence | $y^{\alpha}\hat y^{\beta-1}$ | $\hat y^{\alpha+\beta-1}$ | $\lambda^{|1-\beta|}$ (regime-dependent) |

For each loss, the corresponding update embodies the majorization–minimization step derived under mild convexity and decomposability conditions (Hood et al., 2 Feb 2026).
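To make the table concrete, here is a minimal sketch of the KL row applied to the simplest einsum model, matrix NMF (`"ir,jr->ij"`). With $a = y/\hat y$, $b = 1$, and $g$ the identity, the update reduces to the classic multiplicative NMF rule. Helper names such as `a_kl` and `kl_div` are illustrative, not NNEinFact's API.

```python
import numpy as np

def a_kl(Y, Yhat):        # a(y, ŷ) = y / ŷ for KL
    return Y / Yhat

def b_kl(Y, Yhat):        # b(y, ŷ) = 1 for KL
    return np.ones_like(Y)

g_inv = lambda lam: lam   # g(λ) = λ, so g⁻¹ is the identity

def kl_div(Y, Yhat):
    return np.sum(Y * np.log(Y / Yhat) - Y + Yhat)

rng = np.random.default_rng(1)
Y = rng.random((6, 5)) + 0.1
W = rng.random((6, 3)) + 0.1
H = rng.random((5, 3)) + 0.1

kl0 = kl_div(Y, np.einsum("ir,jr->ij", W, H))
for _ in range(50):
    Yhat = np.einsum("ir,jr->ij", W, H)
    W *= g_inv(np.einsum("ij,jr->ir", a_kl(Y, Yhat), H)
               / np.einsum("ij,jr->ir", b_kl(Y, Yhat), H))
    Yhat = np.einsum("ir,jr->ij", W, H)
    H *= g_inv(np.einsum("ij,ir->jr", a_kl(Y, Yhat), W)
               / np.einsum("ij,ir->jr", b_kl(Y, Yhat), W))
kl1 = kl_div(Y, np.einsum("ir,jr->ij", W, H))
print(kl1 < kl0)  # True: the divergence decreases
```

Because the updates are multiplicative and all quantities start nonnegative, the factors stay nonnegative without any projection step.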

3. Computational Procedure and Pseudocode

The practical algorithm cycles through the following steps until convergence:

  1. Recompute Fit: Compute $\hat Y$ via the model's einsum contraction.
  2. Per-Factor Update: For each factor $\ell$:
    • Compute the numerator $A$: the einsum contraction of all factors with $a(Y,\hat Y)$ substituted at position $\ell$
    • Compute the denominator $B$: the same contraction with $b(Y,\hat Y)$ substituted instead
    • Update: $\Theta^{(\ell)} \leftarrow \Theta^{(\ell)} \odot g^{-1}(A/B)$
  3. Convergence Check: Early stopping via divergence on a held-out set (5–10% of entries) or a relative loss decrease below $10^{-6}$

Python-style pseudocode:

```
einstr = [swap(model_str, l) for l in range(L)]
while not converged:
    Yhat = einsum(model_str, Θ[0], ..., Θ[L-1])
    for l in range(L):
        A = einsum(einstr[l], Θ[0], ..., Θ[l-1], a(Y, Yhat), Θ[l+1], ..., Θ[L-1])
        B = einsum(einstr[l], Θ[0], ..., Θ[l-1], b(Y, Yhat), Θ[l+1], ..., Θ[L-1])
        Θ[l] *= g_inv(A / B)
```

This routine allows rapid prototyping of new factorization models by simply specifying a new einsum string and loss function (Hood et al., 2 Feb 2026).
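The pseudocode above can be turned into a small working NumPy prototype. The `swap` helper here is a guess at what such a routine must do, namely exchange factor $\ell$'s subscripts with the output subscripts, since $a(Y,\hat Y)$ has the shape of the model output; it assumes single-letter einsum indices and is not NNEinFact's actual implementation.

```python
import numpy as np

def swap(model_str, l):
    """Einsum string for updating factor l: factor l's subscripts move to
    the output side, and a(Y, Yhat) -- shaped like the model output --
    takes factor l's slot among the operands. Single-letter indices only."""
    inputs, out = model_str.replace(" ", "").split("->")
    terms = inputs.split(",")
    target = terms[l]
    terms[l] = out
    return ",".join(terms) + "->" + target

def mu_fit(model_str, Y, factors, a, b, g_inv, iters=100):
    """Generic einsum multiplicative-update loop (a sketch)."""
    L = len(factors)
    einstr = [swap(model_str, l) for l in range(L)]
    for _ in range(iters):
        for l in range(L):
            Yhat = np.einsum(model_str, *factors)
            operands = factors[:l] + [a(Y, Yhat)] + factors[l + 1:]
            A = np.einsum(einstr[l], *operands)
            operands = factors[:l] + [b(Y, Yhat)] + factors[l + 1:]
            B = np.einsum(einstr[l], *operands)
            factors[l] *= g_inv(A / B)
    return factors
```

For the Frobenius loss, pass `a=lambda Y, Yh: Y`, `b=lambda Y, Yh: Yh`, and `g_inv=lambda x: x`; the same driver then fits matrix NMF, CP, or custom models by changing only `model_str` and the factor list.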

4. Theoretical Guarantees and Convergence

The update scheme follows a majorization–minimization (MM) framework. At each block update, a tight surrogate $Q(\Theta^{(\ell)} \mid \Theta^{(\ell)}_{\text{old}}) \geq L$ is constructed, with equality at the current iterate. Minimizing $Q$ with respect to $\Theta^{(\ell)}$ yields the stated multiplicative update. Theorem 2.3 asserts that, assuming decomposability of the loss and model structure, each update strictly decreases the loss $L$ and the iterate sequence converges to a stationary point (Hood et al., 2 Feb 2026). This holds for a wide class of models and losses supported by the einsum formulation.

5. Computational Complexity and Implementation Notes

The per-iteration complexity per factor $\ell$ consists of two einsum computations of cost $O(\prod_{\ell' \neq \ell} I_{\ell'} R_{\ell'})$, plus an $O(I_\ell R_\ell)$ cost to apply the update. For sparse $Y$, computation can be restricted to nonzero entries, reducing the cost to $O(\operatorname{nnz}(Y) \sum_\ell R_\ell)$. Only the $L$ factor matrices/tensors and the current $\hat Y$ are stored in memory. Missing data can be incorporated by maintaining a mask $M \in \{0,1\}^{I_1 \times \cdots \times I_M}$ and replacing $Y \to M \odot Y$ and $\hat Y \to M \odot \hat Y$ in all contractions. GPU-accelerated einsum (via PyTorch or NumPy backends) yields multifold speedups relative to explicit loops (Hood et al., 2 Feb 2026).
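The masking recipe can be sketched for the Frobenius-loss matrix model, again with illustrative names; a small epsilon (my addition, not from the paper) guards denominators in case a row or column is entirely unobserved.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, R = 8, 6, 2
Y = rng.random((I, J)) + 0.1
M = (rng.random((I, J)) < 0.8).astype(float)   # 1 = observed, ~20% missing
W = rng.random((I, R)) + 0.1
H = rng.random((J, R)) + 0.1
eps = 1e-12                                    # guards empty rows/columns

err0 = np.sum(M * (Y - np.einsum("ir,jr->ij", W, H)) ** 2)
for _ in range(200):
    # Frobenius updates with Y -> M⊙Y and Ŷ -> M⊙Ŷ in every contraction
    Yhat = np.einsum("ir,jr->ij", W, H)
    W *= np.einsum("ij,jr->ir", M * Y, H) / (np.einsum("ij,jr->ir", M * Yhat, H) + eps)
    Yhat = np.einsum("ir,jr->ij", W, H)
    H *= np.einsum("ij,ir->jr", M * Y, W) / (np.einsum("ij,ir->jr", M * Yhat, W) + eps)
err1 = np.sum(M * (Y - np.einsum("ir,jr->ij", W, H)) ** 2)
print(err1 < err0)  # True: loss on observed entries decreases
```

Since unobserved entries are zeroed in both $Y$ and $\hat Y$, they contribute nothing to either contraction, so the update minimizes the loss only over observed entries.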

6. Empirical Performance and Applications

Empirical evaluation considers large, real-world datasets, such as:

  • Uber (5-way, $27 \times 7 \times 24 \times 100 \times 100$)
  • ICEWS (4-way, $249^2 \times 20 \times 228$)
  • WITS (4-way, $196^2 \times 96 \times 29$)

Optimization comparisons demonstrate that NNEinFact outperforms Adam (tested over six learning rates) across $(\alpha,\beta)$ losses, converging up to 90× faster in wall time and achieving less than half the held-out divergence of gradient-based methods. Model-level comparisons show that custom einsum models fitted by this approach yield 10–37% lower held-out divergence than standard CP, Tucker, tensor train, or low-rank Tucker models. Qualitative analyses (e.g., the Uber dataset) recover interpretable spatiotemporal classes using significantly fewer parameters relative to CP models (Hood et al., 2 Feb 2026).

7. Applicability and Significance

The einsum-based multiplicative update framework provides a universal, efficient, and customizable method for nonnegative tensor factorization across a diversity of losses and model structures. It enables researchers to flexibly define bespoke tensor contractions and losses, eliminates the need to design new update rules for each model, and empirically achieves superior convergence and predictive performance relative to standard approaches. This framework supports missing data, scales to tensors with hundreds of millions of entries, and is amenable to GPU acceleration, facilitating practical deployment across scientific domains with complex multiway data (Hood et al., 2 Feb 2026).
