Algorithms for Non-Negative Matrix Factorization on Noisy Data With Negative Values (2311.04855v4)
Abstract: Non-negative matrix factorization (NMF) is a dimensionality reduction technique that has shown promise for analyzing noisy data, especially astronomical data. For these datasets, the observed data may contain negative values due to noise even when the true underlying physical signal is strictly positive. Prior NMF work has not treated negative data in a statistically consistent manner, which becomes problematic for low signal-to-noise data with many negative values. In this paper we present two algorithms, Shift-NMF and Nearly-NMF, that can handle both the noisiness of the input data and any negativity that the noise introduces. Both algorithms use the negative data directly rather than clipping it, and correctly recover non-negative signals without the positive offset that clipping negative data introduces. We demonstrate this numerically on both simple and more realistic examples, and prove that both algorithms have monotonically decreasing update rules.
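The positive offset caused by clipping can be illustrated with a small numerical sketch. This is not the paper's Shift-NMF or Nearly-NMF. It only simulates a strictly positive constant signal with zero-mean Gaussian noise and compares the sample mean before and after clipping negative values to zero. The signal level, noise level, sample count, and variable names are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical setup: a strictly positive signal observed at low signal-to-noise.
rng = np.random.default_rng(0)
true_signal = 0.5     # assumed true (positive) value
noise_sigma = 2.0     # assumed noise level, much larger than the signal
n_samples = 100_000

# Zero-mean Gaussian noise pushes many observations below zero.
observed = true_signal + rng.normal(0.0, noise_sigma, size=n_samples)

# Clipping negative values to zero, as a naive pre-processing step for NMF.
clipped = np.clip(observed, 0.0, None)

frac_negative = np.mean(observed < 0)
mean_observed = observed.mean()
mean_clipped = clipped.mean()

print(f"fraction of negative observations: {frac_negative:.3f}")
print(f"mean of unclipped data: {mean_observed:.3f}")
print(f"mean of clipped data:   {mean_clipped:.3f}")
print(f"offset from clipping:   {mean_clipped - true_signal:.3f}")
```

In this sketch the unclipped mean stays close to the assumed true value, while the clipped mean sits well above it, because clipping discards only the negative noise fluctuations and keeps the positive ones.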