Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data

Published 31 Mar 2026 in cs.LG, eess.SP, math.OC, and stat.ML | (2603.29715v1)

Abstract: Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, $X$, by the product of two nonnegative factors, $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt and pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when $r=1$, in contrast to the standard NMF that uses least squares. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data is affected by false zeros, too sparse solutions might degrade the model. Our third contribution is a new, more general, L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), where the sparsity of the factorization is controlled by adding a penalization parameter to the entries of $WH$ associated with zeros in the data. The fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted as sparse CD (sCD), where each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient in handling large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to show the effectiveness of our new proposed model (wL1-NMF) and algorithm (sCD).


Authors (4)

Summary

The paper establishes the NP-hardness of L1-NMF and introduces a scalable coordinate descent algorithm to efficiently handle sparse, noisy datasets.
The paper proposes weighted L1-NMF (wL1-NMF) to counteract over-sparsification by adjusting the impact of false zeros in data.
The paper validates its approach with extensive experiments on synthetic data, MNIST, matrix completion, and topic modeling, demonstrating improved reconstruction accuracy and interpretability.

Introduction

The paper "Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data" (2603.29715) analyzes the theoretical and algorithmic properties of nonnegative matrix factorization (NMF) with a component-wise L1 norm error measure (L1-NMF), targeting the challenges posed by sparse and outlier-corrupted datasets. The motivation is that classical NMF loss functions, such as the Frobenius norm (L2) and the Kullback-Leibler (KL) divergence, are poorly suited to heavy-tailed noise, sparse outlier structures (e.g., salt-and-pepper noise), or systematic zero inflation.

This work establishes several key results: the computational hardness of solving L1-NMF, an in-depth study of its natural sparsity-promoting effects, a generalization (wL1-NMF) designed to counteract excessive sparsity in the presence of false zeros, and a highly scalable coordinate descent algorithm (sCD) exploiting data sparsity. These contributions are validated through extensive experiments, including synthetic data, MNIST digit images, matrix completion with false zeros, and topic modeling for text data.

Theoretical Analysis of L1-NMF

The authors prove that L1-NMF is NP-hard even for rank-one approximations, in stark contrast to rank-one L2-NMF, which is solvable in polynomial time via the SVD: by the Eckart-Young-Mirsky theorem the best rank-one approximation is given by the leading singular pair, and by the Perron-Frobenius theorem that pair can be chosen nonnegative when the input is nonnegative. The hardness persists even on binary input matrices, which indicates that the combinatorial difficulty comes from the nonsmooth L1 error and its interplay with the nonnegativity constraints rather than from the magnitude of the data values.
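To make the contrast with the least-squares case concrete, the following is a minimal sketch of rank-one L2-NMF via the SVD. It is an illustration, not code from the paper: the function name `rank_one_l2_nmf` is hypothetical, and the sketch assumes the leading singular value is simple, so that the leading singular vectors are unique up to a joint sign and taking absolute values recovers the nonnegative pair.

```python
import numpy as np

def rank_one_l2_nmf(X):
    """Rank-one Frobenius-norm NMF of a nonnegative matrix X (illustrative sketch).

    Assumes the leading singular value of X is simple.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # The SVD fixes the leading pair only up to a joint sign flip. For
    # nonnegative X, Perron-Frobenius says the pair can be taken nonnegative,
    # so absolute values select that representative.
    w = np.abs(U[:, 0]) * s[0]
    h = np.abs(Vt[0, :])
    return w, h

# Usage on a small strictly positive matrix.
rng = np.random.default_rng(0)
X = rng.random((6, 4))
w, h = rank_one_l2_nmf(X)
residual = np.linalg.norm(X - np.outer(w, h))
```

By Eckart-Young-Mirsky, `residual` equals the square root of the sum of the squared trailing singular values. No comparably simple recipe exists for the L1 error, which is the point of the hardness result.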

Further, the study delineates the sparsity-inducing property of L1-NMF. A probabilistic analysis, leveraging Bernoulli random models on data and factor vectors, demonstrates that the probability of a zero being optimal in the scalar coordinate regression subproblem under L1 loss increases with input sparsity. As a result, the L1-NMF factors mirror the sparsity of the input matrix, which can both enhance interpretability and, under some sampling regimes (e.g., zero-inflated count data or false-zero scenarios), undermine approximation quality by over-sparsification.
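The mechanism behind this sparsity can be seen in a simplified, deterministic version of the scalar subproblem. The notation below ($a$ for a data column, $b$ for the fixed factor, $x$ for the coordinate being updated) is chosen for this illustration and is not taken from the paper, whose analysis is probabilistic.

```latex
% Scalar L1 regression subproblem with nonnegative data a and factor b:
\min_{x \ge 0} \; f(x) = \sum_{j} \lvert a_j - x\, b_j \rvert ,
\qquad a_j \ge 0, \; b_j \ge 0 .

% Right derivative at x = 0: entries with a_j > 0 contribute -b_j,
% entries with a_j = 0 contribute +b_j.
f'_{+}(0) = \sum_{j \,:\, a_j = 0} b_j \; - \sum_{j \,:\, a_j > 0} b_j .

% f is convex, so x = 0 is optimal over x >= 0 if and only if f'_+(0) >= 0:
x^\star = 0
\iff
\sum_{j \,:\, a_j = 0} b_j \;\ge\; \sum_{j \,:\, a_j > 0} b_j .
```

In words, the coordinate is set exactly to zero as soon as the zeros of $a$ carry at least half of the total weight of $b$, which becomes more likely as the data gets sparser.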

Weighted L1-NMF Model (wL1-NMF)

To address the limitations imposed by excessive sparsity, particularly when the data contains false zeros, the authors introduce the weighted L1-NMF (wL1-NMF) model. Here, a penalization parameter $\lambda\in[0,1]$ downweights the contribution to the objective from matrix entries corresponding to zeros in the observed data:

$\ell(X,W,H,\lambda) = \sum_{(i,j)\in\kappa^+} |X_{ij} - (WH)_{ij}| + \lambda \sum_{(i,j)\in\kappa^0} |(WH)_{ij}|$

where $\kappa^+ = \{(i,j): X_{ij} > 0\}$ and $\kappa^0 = \{(i,j): X_{ij} = 0\}$.

Statistically, this formulation corresponds to maximum likelihood estimation under a heteroscedastic Laplace noise model, with a distinct scale parameter for zeros versus positive entries. When $\lambda=1$, the model reduces to standard L1-NMF; when $\lambda=0$, it treats zeros as missing data in a matrix completion setting.
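As an illustration of the objective, the sketch below evaluates $\ell(X,W,H,\lambda)$ for a sparse $X$ without forming the dense product $WH$. It is not the authors' implementation: the function name `wl1_objective` is hypothetical, and the sketch assumes $X$ is nonnegative (so nonzero means positive) and that $W$ and $H$ are nonnegative NumPy arrays. Under that assumption $|(WH)_{ij}| = (WH)_{ij}$, so the sum over $\kappa^0$ equals the sum of all entries of $WH$ minus the sum over $\kappa^+$.

```python
import numpy as np
from scipy import sparse

def wl1_objective(X, W, H, lam):
    """Evaluate the wL1-NMF loss for sparse nonnegative X (illustrative sketch)."""
    Xc = sparse.csr_matrix(X)
    Xc.eliminate_zeros()          # keep only the entries in kappa^+
    Xc = Xc.tocoo()
    i, j, x = Xc.row, Xc.col, Xc.data
    # (WH)_ij evaluated only at the positions where X_ij > 0.
    wh_pos = np.einsum("kr,rk->k", W[i, :], H[:, j])
    fit = np.abs(x - wh_pos).sum()
    # With W, H >= 0, the sum of all entries of WH is (1^T W)(H 1).
    total = W.sum(axis=0) @ H.sum(axis=1)
    zero_part = total - wh_pos.sum()   # sum of (WH)_ij over kappa^0
    return fit + lam * zero_part

# Usage on a small random instance.
rng = np.random.default_rng(1)
Xd = rng.random((8, 7)) * (rng.random((8, 7)) < 0.3)
W, H = rng.random((8, 3)), rng.random((3, 7))
loss = wl1_objective(sparse.csr_matrix(Xd), W, H, lam=0.4)
```

Setting `lam=1.0` gives the standard L1-NMF error, and `lam=0.0` ignores the zero entries entirely, matching the two limiting cases described above.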

Algorithm: Sparse Coordinate Descent (sCD)

Solving the L1-NMF or wL1-NMF problem efficiently is difficult because the objective is nonsmooth and modern datasets are both large and sparse. The sCD algorithm adapts classical coordinate descent to exploit both nonnegativity and data sparsity, achieving a per-iteration complexity of $\mathcal{O}(r\cdot \text{nnz}(X) \log \text{nnz}(X))$, where $\text{nnz}(X)$ is the number of nonzeros in $X$. This is in contrast to prior coordinate descent algorithms, whose complexity scales with $mn$, the full ambient dimension. At each step, sCD applies an efficient (constrained) weighted median algorithm to the one-dimensional L1 regression subproblem for the coordinate being updated.
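The weighted median step can be sketched as follows. This is a dense illustration under stated assumptions, not the sCD routine itself: it sorts all breakpoints rather than exploiting the zeros of the data, and the names `weighted_median` and `l1_scalar_update` are hypothetical. It solves $\min_{x\ge 0} \sum_j c_j\,|a_j - x\,b_j|$ with $c_j = 1$ where $a_j > 0$ and $c_j = \lambda$ where $a_j = 0$, by rewriting each term as $c_j b_j\,|a_j/b_j - x|$ and clipping the unconstrained minimizer at zero, which is valid because the one-dimensional objective is convex.

```python
import numpy as np

def weighted_median(t, c):
    """Return a minimizer of sum_k c[k] * |x - t[k]| for positive weights c."""
    order = np.argsort(t)
    t, c = t[order], c[order]
    cum = np.cumsum(c)
    # First index whose cumulative weight reaches half of the total weight.
    k = np.searchsorted(cum, 0.5 * cum[-1])
    return t[k]

def l1_scalar_update(a, b, lam=1.0):
    """argmin over x >= 0 of sum_j c_j * |a_j - x * b_j| (dense sketch)."""
    c = np.where(a > 0, 1.0, lam) * b   # weight c_j * b_j of each breakpoint
    keep = c > 0                        # zero-weight terms do not depend on x
    if not keep.any():
        return 0.0
    x = weighted_median(a[keep] / b[keep], c[keep])
    return float(max(0.0, x))           # project onto the constraint x >= 0

# Usage: one positive entry among three zeros.
a = np.array([0.0, 0.0, 3.0, 0.0])
b = np.ones(4)
x_l1 = l1_scalar_update(a, b, lam=1.0)    # zeros outweigh the positive entry
x_wl1 = l1_scalar_update(a, b, lam=0.2)   # downweighted zeros no longer dominate
```

In this toy case the plain L1 update returns zero, while the downweighted update recovers the positive value, which is the over-sparsification effect that wL1-NMF is designed to counteract.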

Experimental Evaluation

Experiments are conducted on synthetic matrices, noisy MNIST images, synthetic matrix completion tasks with false zeros, and the TDT2 topic modeling dataset.

Synthetic Data and Computational Advantage: Comparisons across varying sparsity levels empirically confirm the speedup predicted by the complexity analysis: the acceleration of sCD over classical CD grows with the sparsity of the data.

Noisy Image Recovery (MNIST): L1-NMF outperforms FroNMF, KL-NMF, and L21-NMF as noise becomes heavier, particularly in terms of recovery error and the ability to reconstruct clean foreground/background boundaries.

Figure 1: Examples of the low-rank representations of digits using different NMF models on the MNIST dataset.

Algorithmic Comparisons: Relative error and speed comparisons among sCD, Nesterov-smoothing-based BCD (NS), projected subgradient (SUB), and vanilla CD show that sCD matches or exceeds the reconstruction accuracy of the other methods while being several times faster, owing to its reduced computational cost. The advantage is largest at higher matrix sparsity.

Matrix Completion with False Zeros: Experiments that vary the penalization parameter $\lambda$ in wL1-NMF show that $\lambda=0$ is optimal for purely missing data, but even a modest $\lambda>0$ is necessary to mitigate the negative impact of false zeros. The method is robust to the selection of $\lambda$ in a practical range.

Topic Modeling (TDT2): wL1-NMF identifies sparser and more interpretable topic-word associations than either L1-NMF or the Frobenius-norm model (FroNMF). For very sparse document-term matrices, the correct setting of $\lambda$ is critical; L1-NMF ($\lambda=1$) produces degenerate topics with inadequate word coverage.

Implications and Future Directions

The findings demonstrate that L1-NMF is an intrinsically hard optimization problem but one that has unique merits for sparse, noisy, and outlier-heavy data due to its direct modeling of Laplacian (heavy-tailed) corruption and its regularization toward interpretably sparse components. However, in applications where zeros are ambiguous or may represent missing/false data, the introduction of weighted penalties (wL1-NMF) is both statistically justified and empirically necessary. The sCD framework advances the practicality of L1-NMF for large-scale and high-sparsity regimes ubiquitous in modern data analytics.

Beyond the particular models and domains tested (e.g., imaging and text), the sCD framework and the wL1-NMF objective could be extended to structured missingness, custom nonnegativity constraints, and other robust low-rank models, with potential applications in bioinformatics, collaborative filtering, and graph mining.

Conclusion

This work provides a rigorous theoretical and algorithmic foundation for robust NMF in the L1 norm, characterizes the sparsity-driven biases it induces, and introduces both a tunable model and a scalable, sparsity-exploiting solution method. These contributions position L1-NMF as a powerful tool for interpretable, resilient low-rank modeling for sparse and noisy data, notably in domains such as vision and text. The development of wL1-NMF and sCD closes an important scalability and flexibility gap, with broad implications for robust and explainable AI.
