
Nonnegative Matrix Factorization (NMF)

Updated 5 April 2026
  • Nonnegative Matrix Factorization is a technique that decomposes nonnegative data matrices into lower-rank, additive factors, revealing semantically meaningful components.
  • It utilizes loss functions such as the Frobenius norm and KL divergence, with predictability criteria based on the principle of the common cause (PCC) enabling robust, noise-resistant rank estimation.
  • NMF supports applications in clustering, denoising, and dimensionality reduction by producing stable, sparse, and interpretable parts-based representations.

Nonnegative Matrix Factorization (NMF) refers to a class of algorithms that, given a data matrix with nonnegative entries, seek to approximate it as the product of two or more lower-rank nonnegative matrices. NMF is fundamentally nonconvex and NP-hard, yet it is widely employed in unsupervised learning, parts-based data representation, clustering, denoising, and dimensionality reduction. NMF’s interpretability arises from its nonnegativity constraints, which facilitate the decomposition of data into additive, often sparse and semantically meaningful components. Recent research has linked NMF to foundational concepts in probability and causality, such as the principle of the common cause, leading to advances in rank selection, stability analysis, clustering, and denoising (Khalafyan et al., 3 Sep 2025).

1. Formal Framework and Loss Functions

Given a nonnegative matrix $P \in \mathbb{R}_{+}^{N \times M}$, standard NMF seeks matrices $B \in \mathbb{R}_{+}^{N \times R}$ and $W \in \mathbb{R}_{+}^{R \times M}$, with small inner dimension $R \ll \min(N, M)$, such that

$P \approx \hat{P} = B W$

or, elementwise,

$P_{\pi i} \approx \hat{P}_{\pi i} = \sum_{b=1}^{R} B_{\pi b} W_{b i}$

Standard objectives for fitting this decomposition include:

  • Frobenius norm minimization:

$\{B, W\} = \arg\min_{B, W \geq 0} \| P - BW \|_F^2 = \arg\min_{B, W \geq 0} \sum_{\pi, i} \left[ P_{\pi i} - (BW)_{\pi i} \right]^2$

  • Kullback–Leibler (KL) divergence:

$\mathrm{KL}(P \,\Vert\, \hat{P}) = \sum_{\pi, i} \left[ P_{\pi i} \ln \frac{P_{\pi i}}{\hat{P}_{\pi i}} - P_{\pi i} + \hat{P}_{\pi i} \right]$

The KL objective, at its local minima, guarantees marginal conservation constraints: $\sum_\pi \hat{P}_{\pi i} = \sum_\pi P_{\pi i}$ and $\sum_i \hat{P}_{\pi i} = \sum_i P_{\pi i}$. No additional regularization terms were used in (Khalafyan et al., 3 Sep 2025); alternative formulations may include explicit sparsity or smoothness penalties.
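As a rough illustration (using scikit-learn's `NMF`, not the authors' code), both objectives can be fit on synthetic data; the matrix sizes, rank, and iteration budget below are arbitrary choices. The final check probes the KL marginal-conservation property numerically:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Synthetic strictly positive data matrix (60 "pixels" x 40 "images").
P = rng.random((60, 40)) + 0.1

R = 5  # inner rank, an arbitrary choice for this toy example

# Frobenius-norm objective (scikit-learn's default loss).
fro = NMF(n_components=R, init="nndsvda", max_iter=500, random_state=0)
B_f, W_f = fro.fit_transform(P), fro.components_

# KL-divergence objective requires the multiplicative-update solver.
kl = NMF(n_components=R, init="nndsvda", solver="mu",
         beta_loss="kullback-leibler", max_iter=500, random_state=0)
B_k, W_k = kl.fit_transform(P), kl.components_
P_hat = B_k @ W_k

# At (near-)stationary points of the KL objective, column sums of the
# reconstruction approximately match those of the data.
col_gap = np.abs(P_hat.sum(axis=0) - P.sum(axis=0)).max() / P.sum(axis=0).max()
print(f"max relative column-sum mismatch (KL fit): {col_gap:.4f}")
```

With more iterations the KL fit approaches a stationary point and the column-sum mismatch shrinks toward zero, consistent with the conservation constraints above.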

2. Probabilistic Formulation: The Principle of the Common Cause

Interpreting $P_{\pi i}$ (with normalization $\sum_{\pi, i} P_{\pi i} = 1$) as a joint probability $P(\pi, i)$, the NMF decomposition takes the form

$P(\pi, i) = \sum_{b=1}^{R} P(b)\, P(\pi \mid b)\, P(i \mid b)$

This expresses the “independent mixture model,” wherein $b$ indexes a latent “common cause” that statistically “screens off” the dependency between $\pi$ and $i$: conditioned on $b$, the two variables are independent, $P(\pi, i \mid b) = P(\pi \mid b)\, P(i \mid b)$. Reichenbach’s principle of the common cause corresponds precisely to the existence of an exact NMF at the nonnegative rank $\mathrm{rank}_+(P)$.
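A minimal numerical sketch of this mixture model, with synthetic conditionals (not taken from the paper); building the joint from a prior and two conditionals yields an exact nonnegative factorization of rank at most $R$:

```python
import numpy as np

rng = np.random.default_rng(1)
R, N, M = 3, 8, 6  # latent causes, "pixel" states, "image" states

# A toy common-cause model: prior p(b) and conditionals p(pi|b), p(i|b).
p_b = rng.dirichlet(np.ones(R))
p_pi_given_b = rng.dirichlet(np.ones(N), size=R).T  # shape (N, R), columns sum to 1
p_i_given_b = rng.dirichlet(np.ones(M), size=R).T   # shape (M, R), columns sum to 1

# Joint P(pi, i) = sum_b p(b) p(pi|b) p(i|b): an exact NMF of the joint
# probability matrix, with b screening off pi from i.
P = p_pi_given_b @ np.diag(p_b) @ p_i_given_b.T

print(P.sum())  # a valid joint distribution sums to 1
```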

3. Predictability and Effective Rank Estimation

Standard model selection procedures (e.g., BIC, RRSSQ) are typically used to determine the effective inner rank of NMF, but they are susceptible to noise and often lack clear optima in practice. By contrast, the PCC-inspired approach defines a set of “predictability inequalities,” one for each pixel–image pair $(\pi, i)$. The minimal rank for which at most a prescribed small fraction of these inequalities is violated is adopted as the estimate $R^*$ of the effective NMF rank. Empirically, this criterion yields a sharp transition and is robust to weak noise. Example results for grayscale image matrices:

| Dataset  | Estimated rank $R^*$ |
|----------|----------------------|
| Swimmer  | 14                   |
| Olivetti | 26                   |
| UTKFace  | 30                   |
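The rank scan can be sketched schematically. The violation test below is a simplified relative-error stand-in for the paper's predictability inequalities (whose exact form is not reproduced here), and the data, tolerance, and rank range are arbitrary:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(4)
# Planted rank-6 nonnegative data; the scan should level off near R = 6.
P = rng.random((80, 6)) @ rng.random((6, 60))

def violation_fraction(P, R, tol=0.05):
    """Fraction of entries whose relative reconstruction error exceeds tol.

    A simplified stand-in for the paper's PCC predictability inequalities
    (their exact form is not reproduced here).
    """
    m = NMF(n_components=R, init="nndsvda", max_iter=400, random_state=0)
    P_hat = m.fit_transform(P) @ m.components_
    rel_err = np.abs(P - P_hat) / P
    return float((rel_err > tol).mean())

eps = 0.05  # prescribed tolerated fraction of violations (arbitrary)
fracs = {R: violation_fraction(P, R) for R in range(2, 10)}
R_est = next((R for R in sorted(fracs) if fracs[R] <= eps), None)
print(fracs, R_est)
```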

In contrast, information-theoretic criteria fail to converge to stable ranks under weak noise (Khalafyan et al., 3 Sep 2025).

4. Stability, Nonidentifiability, and "Sweet-Spot" of Solutions

NMF presents a fundamental nonidentifiability: multiple distinct decompositions may yield similar or identical loss. A solution is practically useful only if its components (the basis images, i.e., the columns of $B$) are reproducible under noise and random initializations.

Stability is quantified by:

  • Splitting the dataset, applying NMF with the same target rank and random seed, and matching the resulting basis images using the Hungarian algorithm on the cosine distance

$d_{\cos}(x, y) = 1 - \frac{x^{\top} y}{\|x\| \, \|y\|}$

  • Observing that for $R$ near the effective rank $R^*$, the set of basis images clusters tightly (small average matched distance), even under 25% pixel noise. For $R$ substantially below or above $R^*$, reproducibility degrades significantly.

This “sweet-spot” near $R \approx R^*$ delineates the regime of locally unique, interpretable, and stable solutions (Khalafyan et al., 3 Sep 2025).
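The stability check can be sketched with scikit-learn and SciPy's Hungarian solver; the planted data, rank, and seeds below are hypothetical choices, not the paper's setup:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.decomposition import NMF

def cosine_distance_matrix(A, B):
    """Pairwise cosine distances between columns of A and columns of B."""
    An = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)
    Bn = B / (np.linalg.norm(B, axis=0, keepdims=True) + 1e-12)
    return 1.0 - An.T @ Bn

rng = np.random.default_rng(0)
# Planted rank-5 nonnegative data plus weak noise (sizes are arbitrary).
P = rng.random((100, 5)) @ rng.random((5, 80)) + 0.01 * rng.random((100, 80))

def fit_basis(seed):
    m = NMF(n_components=5, init="random", max_iter=400, random_state=seed)
    return m.fit_transform(P)  # columns play the role of basis "images"

B1, B2 = fit_basis(0), fit_basis(1)
D = cosine_distance_matrix(B1, B2)
rows, cols = linear_sum_assignment(D)  # Hungarian matching of components
avg_matched = float(D[rows, cols].mean())
print(f"average matched cosine distance: {avg_matched:.3f}")
```

A small average matched distance across seeds (and across data splits) indicates the reproducible, sweet-spot regime.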

5. Clustering Interpretation via Approximate Mixture Model

In the “soft” PCC implemented by empirical NMF, the relative approximation error

$\varepsilon_{\pi i} = \frac{| P_{\pi i} - \hat{P}_{\pi i} |}{P_{\pi i}}$

is strongly anticorrelated with both the magnitude of the joint probability $P_{\pi i}$ and the degree of positive correlation between $\pi$ and $i$ in the data. Thus, NMF more faithfully explains large and positively correlated events.

This property motivates a clustering strategy: for each basis index $b$, form the cluster

$C_b = \{\, i : b = \arg\max_{b'} W_{b' i} \,\}$

or select the top-$k$ images ranked by the weight $W_{b i}$. On face datasets, these clusters select visually and semantically coherent sets (e.g., images belonging to the same individual).
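A minimal sketch of the argmax clustering rule and the top-$k$ selection, assuming synthetic data and an arbitrary rank:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(2)
# Synthetic data: rows are "pixels", columns are "images" (sizes arbitrary).
P = rng.random((50, 120))

R = 4
model = NMF(n_components=R, init="nndsvd", max_iter=300, random_state=0)
B = model.fit_transform(P)  # basis images, shape (50, R)
W = model.components_       # mixture weights, shape (R, 120)

# Hard clustering: assign each image i to the basis with the largest weight.
labels = W.argmax(axis=0)

# Alternatively, the top-k most representative images for a given basis b.
k, b = 5, 0
top_k = np.argsort(W[b])[::-1][:k]
print(labels[:10], top_k)
```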

6. Denoising via NMF and Quantitative Performance

To denoise, one fits NMF to the noisy data $\tilde{P}$ and reconstructs the low-rank approximation $\hat{P} = BW$. An image $i$ is considered denoised if the cosine distance of its reconstruction to the clean version is smaller than that of the noisy input: $d_{\cos}(\hat{P}_{\cdot i}, P_{\cdot i}) < d_{\cos}(\tilde{P}_{\cdot i}, P_{\cdot i})$. With 25% flip noise (Swimmer dataset), almost all images are improved across a wide range of inner ranks $R$. NMF outperforms truncated PCA (60–70% denoising success vs. lower for PCA). On binarized data, PCA may have a slight edge, but both methods perform far better than chance.
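A toy version of this denoising experiment, with planted binary data standing in for Swimmer-like images (the sizes, rank, and noise level are illustrative choices):

```python
import numpy as np
from sklearn.decomposition import NMF

def cos_dist(x, y):
    return 1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)

rng = np.random.default_rng(3)
# Planted binary low-rank data as a stand-in for Swimmer-like images.
clean = ((rng.random((64, 8)) @ rng.random((8, 200))) > 2.0).astype(float)
# 25% flip noise: each pixel flips independently with probability 0.25.
flips = rng.random(clean.shape) < 0.25
noisy = np.where(flips, 1.0 - clean, clean)

# Fit NMF to the noisy data and reconstruct a low-rank approximation.
model = NMF(n_components=8, init="nndsvda", max_iter=400, random_state=0)
denoised = model.fit_transform(noisy) @ model.components_

# An image counts as denoised if its reconstruction is closer (in cosine
# distance) to the clean version than the noisy input was.
improved = sum(
    cos_dist(denoised[:, i], clean[:, i]) < cos_dist(noisy[:, i], clean[:, i])
    for i in range(clean.shape[1])
)
print(f"denoised {improved} of {clean.shape[1]} images")
```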

7. Experimental Synthesis and Principal Findings

All results are supported by quantitative benchmarking on three canonical datasets (Swimmer, Olivetti faces, UTKFace) and a suite of metrics:

  • Fraction of satisfied PCC predictability inequalities (the rank-selection criterion)
  • Mean internal cosine distance of basis images
  • BIC1–3 and RRSSQ loss curves
  • Clustering consistency
  • Denoising percentage and reconstruction accuracy

Principal conclusions:

  • PCC-based predictability provides a robust, noise-resistant estimator of effective NMF rank.
  • NMF bases at $R \approx R^*$ are highly consistent across noise/seed variation, resolving practical nonidentifiability in this regime.
  • NMF naturally implements a “soft” mixture model (PCC), prioritizing large and likely correlations.
  • NMF-derived clusters map to meaningful semantic groupings in images.
  • NMF enables robust denoising, often outperforming PCA, over a broad range of ranks.

This mathematically connects classical NMF with foundational probabilistic-causal modeling concepts, advancing the methodology for principled rank selection, part stability, clustering, and denoising (Khalafyan et al., 3 Sep 2025).
