Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel (2401.02520v1)
Abstract: The problem of structured matrix estimation has been studied mostly under strong assumptions on the dependence structure of the noise. This paper considers a general framework of noisy low-rank-plus-sparse matrix recovery, in which the noise matrix may come from any joint distribution with arbitrary dependence across entries. We propose an incoherence-constrained least-squares estimator and prove its tightness, both in the sense of a deterministic lower bound and of matching minimax risks under various noise distributions. To attain this, we establish a novel result asserting that the difference between two arbitrary low-rank incoherent matrices must spread its energy across its entries; in other words, it cannot be too sparse. This result sheds light on the structure of incoherent low-rank matrices and may be of independent interest. We then showcase applications of our framework to several important statistical machine learning problems. For estimating a structured Markov transition kernel, the proposed method achieves minimax optimality, and the result extends to estimating the conditional mean operator, a crucial component in reinforcement learning. Applications to multitask regression and structured covariance estimation are also presented. We propose an alternating minimization algorithm to approximately solve the potentially hard optimization problem. Numerical results corroborate the effectiveness of our method, which typically converges in a few steps.
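The abstract describes an alternating minimization scheme for the low-rank-plus-sparse recovery problem. The sketch below is a minimal illustration of that style of algorithm, not the paper's exact procedure: it alternates a truncated-SVD (low-rank) step with an entrywise hard-thresholding (sparse) step, and it omits the incoherence constraint that the paper's estimator enforces. The function name, thresholding rule, and stopping criterion are illustrative assumptions.

```python
import numpy as np

def alternating_minimization(Y, rank, threshold, n_iters=50):
    """Sketch of low-rank-plus-sparse recovery, Y ~ L + S + noise.

    Alternates (i) the best rank-`rank` approximation of Y - S with
    (ii) entrywise hard-thresholding of Y - L. The paper's estimator
    additionally constrains L to be incoherent; that projection is
    omitted here for brevity.
    """
    Y = np.asarray(Y, dtype=float)
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(n_iters):
        # Low-rank step: truncated SVD of the residual Y - S.
        U, sigma, Vt = np.linalg.svd(Y - S, full_matrices=False)
        L = U[:, :rank] @ np.diag(sigma[:rank]) @ Vt[:rank, :]
        # Sparse step: keep only large entries of the residual Y - L.
        R = Y - L
        S = np.where(np.abs(R) > threshold, R, 0.0)
    return L, S

# Synthetic demo: rank-2 signal plus a few large sparse corruptions.
rng = np.random.default_rng(0)
L_true = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 40))
S_true = np.where(rng.random((50, 40)) < 0.02, 5.0, 0.0)
Y = L_true + S_true + 0.1 * rng.normal(size=(50, 40))
L_hat, S_hat = alternating_minimization(Y, rank=2, threshold=1.0)
```

Each iteration solves one block of the objective exactly while holding the other fixed, which is why, as the abstract notes for the paper's algorithm, this style of scheme typically converges in a few steps in practice.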