Poisson Canonical Polyadic Tensor Model
- The PCP tensor model is a statistical framework that represents count data as independent Poisson random variables structured via nonnegative CP decompositions.
- It utilizes EM and majorization-minimization algorithms, along with hybrid stochastic–deterministic methods, to efficiently estimate factor matrices.
- Practical strategies include sparse tensor storage and efficient computations, ensuring scalability and reliable parameter inference for high-dimensional data.
The Poisson Canonical Polyadic (PCP) tensor model is a statistical framework for the multilinear decomposition of count-valued tensors. It posits that the observed entries of a multiway array are independent Poisson random variables whose mean parameters admit a nonnegative Canonical Polyadic (CP) structure. This model has been foundational in extending nonnegative matrix factorization (NMF) and low-rank tensor approximations to sparse count data commonly encountered in genomics, signal processing, text analysis, chemometrics, and other domains. Theoretical and algorithmic advancements underpinning PCP encompass generative modeling, Expectation-Maximization (EM) interpretations, majorization-minimization algorithms, Fisher information analysis, and hybrid stochastic–deterministic optimization.
1. Generative Model and Likelihood Formulation
Let $\mathcal{X} = (x_{i_1 \cdots i_d}) \in \mathbb{Z}_{\ge 0}^{n_1 \times \cdots \times n_d}$ denote an observed $d$-way count tensor. The PCP model assumes

$$x_{i_1 \cdots i_d} \sim \operatorname{Poisson}(m_{i_1 \cdots i_d}) \quad \text{independently},$$

with Poisson means given as

$$m_{i_1 \cdots i_d} = \sum_{r=1}^{R} \lambda_r \prod_{k=1}^{d} a^{(k)}_{i_k r},$$

where $A^{(k)} = \big(a^{(k)}_{ir}\big) \in \mathbb{R}_{\ge 0}^{n_k \times R}$, $R$ is the CP rank, and all factor matrices are constrained to be nonnegative. In Kruskal notation,

$$\mathcal{M} = [\![\, \lambda;\, A^{(1)}, \ldots, A^{(d)} \,]\!].$$

The log-likelihood (up to additive constants independent of the parameters) is

$$\ell\big(\lambda, A^{(1)}, \ldots, A^{(d)}\big) = \sum_{i_1, \ldots, i_d} \Big( x_{i_1 \cdots i_d} \log m_{i_1 \cdots i_d} - m_{i_1 \cdots i_d} \Big).$$

Maximum likelihood estimation (MLE) for the factor matrices thus amounts to minimizing the generalized Kullback–Leibler (KL) divergence between $\mathcal{X}$ and $\mathcal{M}$.
This formulation replaces the classical least-squares (Gaussian) objective, which is ill-suited for sparse count data where the Poisson assumption correctly models the discrete, nonnegative, and frequently zero-inflated nature of the data (Chi et al., 2011, Llosa-Vite et al., 7 Nov 2025, Myers et al., 2022).
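To make the formulation concrete, the following minimal sketch simulates a small dense PCP tensor and evaluates the constant-free Poisson log-likelihood; the shapes, rank, and weights are illustrative choices, not values from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
shape, R = (10, 8, 6), 3

# Nonnegative factors with simplex-normalized columns, plus component weights.
factors = [rng.random((n, R)) for n in shape]
factors = [A / A.sum(axis=0) for A in factors]
lam = np.array([50.0, 30.0, 20.0])

# CP model tensor: m_{ijk} = sum_r lam_r * a_{ir} * b_{jr} * c_{kr}.
M = np.einsum('r,ir,jr,kr->ijk', lam, *factors)

# Observed counts: independent Poisson draws with means M.
X = rng.poisson(M)

# Log-likelihood up to constants: sum_i (x_i log m_i - m_i);
# zero counts contribute only the -m_i term.
ll = np.where(X > 0, X * np.log(M), 0.0).sum() - M.sum()
print(f"log-likelihood (up to constants): {ll:.2f}")
```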
2. Latent Variable Formulation and the EM Connection
A core insight is that the PCP model arises by marginalizing over an unobserved $(d+1)$-way latent tensor $\mathcal{Z} = (z_{i_1 \cdots i_d r})$ defined as

$$z_{i_1 \cdots i_d r} \sim \operatorname{Poisson}\Big( \lambda_r \prod_{k=1}^{d} a^{(k)}_{i_k r} \Big) \quad \text{independently},$$

with

$$x_{i_1 \cdots i_d} = \sum_{r=1}^{R} z_{i_1 \cdots i_d r}.$$

The complete-data log-likelihood,

$$\ell_c = \sum_{i_1, \ldots, i_d} \sum_{r=1}^{R} \Big( z_{i_1 \cdots i_d r} \log \Big( \lambda_r \prod_{k=1}^{d} a^{(k)}_{i_k r} \Big) - \lambda_r \prod_{k=1}^{d} a^{(k)}_{i_k r} \Big),$$

can be maximized in closed form with respect to each factor matrix if $\mathcal{Z}$ were observed (Llosa-Vite et al., 7 Nov 2025). In practice, algorithms proceed in the EM framework: the E-step computes

$$\hat{z}_{i_1 \cdots i_d r} = \mathbb{E}\big[ z_{i_1 \cdots i_d r} \mid x_{i_1 \cdots i_d} \big] = x_{i_1 \cdots i_d} \, \frac{\lambda_r \prod_{k} a^{(k)}_{i_k r}}{m_{i_1 \cdots i_d}},$$

and the M-step maximizes the expected complete-data log-likelihood with respect to each $A^{(k)}$ (and $\lambda$), yielding normalized updates. This correspondence clarifies that canonical multiplicative updates for Poisson NMF and PCP, including Lee & Seung for $d = 2$ and CP-APR for arbitrary $d$, are EM or Generalized EM algorithms (Llosa-Vite et al., 7 Nov 2025, Chi et al., 2011).
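The E- and M-steps above admit a compact dense implementation for a 3-way tensor; this is a minimal sketch of one EM sweep, with all names illustrative rather than drawn from the cited papers.

```python
import numpy as np

def em_step(X, lam, factors):
    """One EM sweep for a dense 3-way PCP model with simplex-normalized factors."""
    # Per-entry, per-component intensities lam_r * a_{ir} b_{jr} c_{kr}: (I, J, K, R).
    comp = np.einsum('r,ir,jr,kr->ijkr', lam, *factors)
    M = comp.sum(axis=-1)                          # Poisson means m_i
    # E-step: expected latent allocations z-hat = x * comp / m.
    Z = X[..., None] * comp / M[..., None]
    # M-step: lambda_r is the total allocation to component r; each factor
    # column is the corresponding normalized mode marginal of Z.
    lam_new = Z.sum(axis=(0, 1, 2))
    factors_new = [Z.sum(axis=tuple(m for m in range(3) if m != k)) / lam_new
                   for k in range(3)]
    return lam_new, factors_new
```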
3. Algorithms: Alternating Poisson Regression and Hybrid Optimization
3.1 Majorization-Minimization (CP-APR)
The CP-APR algorithm (Chi et al., 2011) employs nonlinear block Gauss–Seidel iteration: for each mode $k$, fix all other factor matrices and solve the (strictly convex) subproblem

$$\min_{B \ge 0} \; f(B) = \mathbf{1}^\top (B \Pi^{(k)}) \mathbf{1} - \mathbf{1}^\top \big( X_{(k)} \circledast \log(B \Pi^{(k)}) \big) \mathbf{1},$$

where $X_{(k)}$ is the mode-$k$ unfolding of $\mathcal{X}$ and $\Pi^{(k)} = \big( A^{(d)} \odot \cdots \odot A^{(k+1)} \odot A^{(k-1)} \odot \cdots \odot A^{(1)} \big)^\top$ is the (transposed) Khatri–Rao product of all other factors. The majorization-minimization (MM) update for $B$ (the mode-$k$ factor with the weights $\lambda$ absorbed) is

$$B \leftarrow B \circledast \Big[ \big( X_{(k)} \oslash (B \Pi^{(k)}) \big) \, \Pi^{(k)\top} \Big],$$

with elementwise division $\oslash$, entrywise multiplication $\circledast$, and normalization to enforce simplex constraints. These iterations decrease the KL objective monotonically, with convergence to KKT points guaranteed under full row-rank on the support (Chi et al., 2011).
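A minimal sketch of the inner MM iterations for one mode, assuming dense unfoldings and a C-order unfolding convention (so the Khatri–Rao below matches numpy's reshape); this is illustrative, not the reference implementation of CP-APR.

```python
import numpy as np

def khatri_rao(A, B):
    """Columnwise Khatri-Rao product: (J x R) and (K x R) -> (J*K x R)."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_apr_mode(Xk, B, Pi, n_inner=10, eps=1e-10):
    """Inner MM updates for one mode: B <- B * ((Xk / (B @ Pi)) @ Pi.T).

    Xk: mode-k unfolding of the data, B: mode-k factor with the weights
    lambda absorbed, Pi: transposed Khatri-Rao product of the other factors.
    """
    for _ in range(n_inner):
        Phi = (Xk / np.maximum(B @ Pi, eps)) @ Pi.T   # elementwise divide
        B = B * Phi                                   # entrywise multiply
    return B

# For mode 0 of a 3-way tensor X with factors A0, A1, A2 and weights lam:
#   Pi = khatri_rao(A1, A2).T          # (R, J*K), matches C-order unfolding
#   Xk = X.reshape(X.shape[0], -1)     # mode-0 unfolding
#   A0 = cp_apr_mode(Xk, A0 * lam, Pi)
```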
3.2 Hybrid Stochastic–Deterministic Methods
Hybrid GCP–CPAPR (HybridGC) alternates between stochastic GCP-Adam updates and deterministic CPAPR steps (Myers et al., 2022). The stochastic phase ("heating") uses minibatch gradients and adaptive stepsizes to escape poor local minima, while the deterministic phase ("cooling") accelerates convergence to the MLE. Restarted CPAPR with SVDrop periodically computes singular values of the mode-$k$ unfoldings; a sudden drop signals potential rank-deficient local minima, triggering random restarts. Empirical results indicate that HybridGC increases the probability of converging to the global MLE, with SVDrop further eliminating degenerate solutions (Myers et al., 2022). A schematic of the alternation appears below.
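The alternation itself reduces to a simple control loop; the sketch below shows only that schedule, with the two solver callbacks as hypothetical stand-ins for GCP-Adam epochs and CP-APR sweeps, which are not implemented here.

```python
def hybrid_gc(model, stochastic_epoch, deterministic_sweep,
              n_rounds=5, heat_epochs=3, cool_sweeps=20):
    """Alternate stochastic 'heating' and deterministic 'cooling' phases.

    stochastic_epoch / deterministic_sweep are placeholders for one
    minibatch GCP-Adam epoch and one full CP-APR MM sweep, respectively.
    """
    for _ in range(n_rounds):
        for _ in range(heat_epochs):        # heating: escape poor local minima
            model = stochastic_epoch(model)
        for _ in range(cool_sweeps):        # cooling: fast convergence near MLE
            model = deterministic_sweep(model)
    return model
```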
3.3 Algorithmic Safeguards
To prevent convergence to non-KKT boundary points due to irrecoverable zero entries, a "scooching" procedure increments small inadmissible zeros into the positive orthant: whenever an entry $b_{ir}$ is at (or numerically near) zero while its multiplicative factor satisfies $\phi_{ir} > 1$, so that increasing $b_{ir}$ would decrease the objective, a small constant $\kappa > 0$ is added to $b_{ir}$ (Chi et al., 2011). This guarantees that updates remain interior, crucial for both theoretical convergence and practical stability.
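In the MM notation above, the safeguard can be sketched as follows, with the tolerance and increment as illustrative values.

```python
import numpy as np

def scooch(B, Phi, kappa=1e-10, tol=1e-10):
    """Nudge inadmissible zeros of the factor B into the positive orthant.

    An entry b_ir ~ 0 whose multiplicative factor phi_ir exceeds 1 violates
    the KKT conditions (increasing b_ir would lower the objective), yet a
    multiplicative update cannot move it off zero, so we add kappa.
    """
    mask = (B < tol) & (Phi > 1.0)
    return np.where(mask, B + kappa, B)
```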
4. Computational Strategies for Sparse Tensors
Efficiency for large, sparse count arrays is achieved by never forming the full dense model tensor or Khatri–Rao products explicitly. Instead:
- Store $\mathcal{X}$ in coordinate (COO) format with its nonzero indices and values.
- For each nonzero, compute its associated partial product vector across all modes except the current one.
- Accumulate required sums only for observed nonzeros.
- Per-iteration work and working storage scale linearly in the number of nonzeros of $\mathcal{X}$ (Chi et al., 2011). This approach enables scalable decomposition of very sparse high-dimensional data, as demonstrated in empirical studies on tensors whose nonzeros make up only a minuscule fraction of their total entries (Myers et al., 2022). A sketch of the nonzero-only evaluation follows this list.
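A minimal sketch of these nonzero-only computations, assuming COO storage as an (nnz x d) integer index array plus a value array; the function names are illustrative.

```python
import numpy as np

def model_at_nonzeros(indices, lam, factors):
    """Evaluate CP model values m_i only at the observed nonzero indices."""
    nnz = indices.shape[0]
    vals = np.tile(lam, (nnz, 1))           # (nnz, R): start from weights
    for k, A in enumerate(factors):         # gather only the rows needed
        vals *= A[indices[:, k], :]
    return vals.sum(axis=1)                 # sum over the R components

def model_total(lam, factors):
    """Sum of all model entries, computed without forming the dense tensor:
    sum(M) = sum_r lam_r * prod_k (column-r sum of factor k)."""
    colsums = np.prod([A.sum(axis=0) for A in factors], axis=0)
    return float(lam @ colsums)
```

Together these give the full KL objective, since the Poisson log-likelihood needs $\log m_i$ only where $x_i > 0$ and the remaining term is the total sum of the model tensor.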
5. Parameter Inference: Fisher Information and Identifiability
Recent developments leverage the latent-variable formulation to compute both observed and expected Fisher information matrices for the PCP model (Llosa-Vite et al., 7 Nov 2025). Each block of the Fisher information reflects the parameter covariances across different modes and CP components, and the expected information is expressed through generalized contracted products over the auxiliary (latent) dimensions. The expected information is shown to be rank-deficient, with the deficiency accounted for by the CP scale/rotation indeterminacies. Identifiability and well-posedness depend on the balance between the data size (the number of tensor entries) and the effective parameter count (the total number of factor parameters net of these indeterminacies); if the model is under-determined, the MLE is non-unique (Llosa-Vite et al., 7 Nov 2025).
For the rank-1 PCP, the log-likelihood and Fisher information simplify substantially, allowing closed-form MLEs:

$$\hat{\lambda} = \sum_{i_1, \ldots, i_d} x_{i_1 \cdots i_d}, \qquad \hat{a}^{(k)} = \frac{1}{\hat{\lambda}} \, X_{(k)} \mathbf{1},$$

where $X_{(k)}$ is the mode-$k$ unfolding of $\mathcal{X}$ and $\mathbf{1}$ the all-ones vector, so each estimated factor is the normalized mode-$k$ marginal. The variances and covariances of the parameter estimates are given by blocks of the inverse Fisher information matrix, facilitating exact parametric inference in the simplest case (Llosa-Vite et al., 7 Nov 2025).
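A minimal sketch of this closed form, assuming the marginal-sum expression reconstructed above: the total count estimates $\hat{\lambda}$ and each normalized mode marginal estimates $\hat{a}^{(k)}$.

```python
import numpy as np

def rank1_pcp_mle(X):
    """Closed-form rank-1 PCP MLE: total count plus mode marginal frequencies."""
    s = X.sum()                                     # lambda-hat
    d = X.ndim
    factors = [X.sum(axis=tuple(m for m in range(d) if m != k)) / s
               for k in range(d)]                   # a-hat^(k), simplex-valued
    return s, factors
```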
6. Empirical Performance and Practical Applications
Empirical studies have established the efficiency and robustness of CP-APR and hybrid stochastic-deterministic schemes for large-scale, sparse count data (Myers et al., 2022). On synthetic and real-world tensors, HybridGC converges to within a small relative loss error of the (empirical) MLE in 96.7% of trials, outperforming pure CP-APR and stochastic GCP-Adam alone. SVDrop restarts nearly eliminate convergence to degenerate solutions, raising success rates to over 99.95% in tested benchmarks. Performance metrics include the relative loss error, the probability of near-MLE convergence, and the Factor Match Score (FMS) for basis recovery.
Practical implications include:
- Monotonic objective decrease and convergence guarantees under mild genericity conditions.
- Algorithmic scalability with space and time linear in the number of nonzeros.
- Model selection guidance from Fisher information, flagging under-determined or non-identifiable settings.
7. Theoretical Significance and Ongoing Research Directions
The PCP model provides a rigorous statistical foundation for multiway low-rank decompositions of count data, generalizing NMF principles to higher dimensions and aligning model assumptions with the intrinsic sampling noise of discrete data. Current work explores:
- Non-iterative MLEs via latent-variable closure in special cases.
- Precise characterizations of identifiability and inherent ambiguities through Fisher information and its rank.
- Algorithmic extensions incorporating regularization, missing data, and streaming updates.
- Improved heuristics for rank selection based on statistical information content.
A plausible implication is that the intersection of latent-variable methods, MM/EM algorithms, and Fisher information theory offers a roadmap for provably optimal estimation and uncertainty quantification in high-dimensional count tensor decompositions; this is especially pertinent as data complexity and sparsity increase (Llosa-Vite et al., 7 Nov 2025, Chi et al., 2011, Myers et al., 2022).