SALSA: Sequential Approximate Leverage-Score Algorithm with Application in Analyzing Big Time Series Data (2401.00122v1)

Published 30 Dec 2023 in stat.ML and cs.LG

Abstract: We develop SALSA, a new efficient sequential approximate leverage-score algorithm, using methods from randomized numerical linear algebra (RandNLA) for large matrices. We demonstrate that, with high probability, SALSA's approximations are within a factor of $(1 + O(\varepsilon))$ of the true leverage scores. In addition, we show that SALSA's theoretical computational complexity and numerical accuracy surpass those of existing approximation methods. These theoretical results are then used to develop an efficient algorithm, named LSARMA, for fitting an appropriate ARMA model to large-scale time series data. With high probability, the proposed algorithm is guaranteed to find the maximum likelihood estimates of the parameters of the true underlying ARMA model, and its worst-case running time significantly improves upon those of state-of-the-art alternatives in big-data regimes. Empirical results on large-scale data strongly support these theoretical findings and underscore the efficacy of our new approach.
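
To make the abstract's notion of leverage-score approximation concrete, the Python sketch below implements the generic sketch-and-solve estimator from the RandNLA literature: sketch the tall matrix, take the R factor of the sketch, and read approximate leverage scores off the squared row norms of A R^{-1}. This is a minimal illustration of the standard recipe that SALSA builds on, not the sequential SALSA algorithm or the LSARMA procedure from the paper; the function name `approx_leverage_scores` and all parameter defaults are hypothetical.

```python
import numpy as np


def approx_leverage_scores(A, sketch_rows=None, jl_dim=None, seed=None):
    """Approximate the row leverage scores of a tall matrix A (n x d, n >> d).

    Minimal sketch of the generic RandNLA estimator (not SALSA itself):
    sketch A, take the R factor of the sketch, and use the squared row
    norms of A @ inv(R) as approximate leverage scores, optionally after
    a Johnson-Lindenstrauss projection to reduce the cost further.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    r = sketch_rows or 20 * d                 # sketch size: a small multiple of d

    # 1) Compress A to r rows with a Gaussian sketch (a subspace embedding w.h.p.).
    S = rng.standard_normal((r, n)) / np.sqrt(r)
    _, R = np.linalg.qr(S @ A)                # d x d surrogate for the R factor of A

    # 2) "Whiten" A by R^{-1}; the squared row norms of A R^{-1}
    #    approximate the true leverage scores.
    AR_inv = np.linalg.solve(R.T, A.T).T      # n x d, avoids forming inv(R) explicitly

    # 3) Optional JL projection: estimate the row norms in jl_dim < d dimensions.
    if jl_dim is not None:
        G = rng.standard_normal((d, jl_dim)) / np.sqrt(jl_dim)
        AR_inv = AR_inv @ G

    return np.sum(AR_inv ** 2, axis=1)


if __name__ == "__main__":
    # Sanity check against the exact leverage scores (squared row norms of U
    # from the thin SVD), which is still feasible at this toy size.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20_000, 10))
    approx = approx_leverage_scores(A, seed=1)
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    exact = np.sum(U ** 2, axis=1)
    print("max relative error:", np.max(np.abs(approx - exact) / exact))
```

With a Gaussian sketch of $r = O(d/\varepsilon^2)$ rows, estimators of this form recover the leverage scores to within a $(1 \pm O(\varepsilon))$ factor with high probability, which is the flavor of guarantee the abstract states for SALSA; the paper's contribution is a sequential variant with improved complexity and accuracy, applied to ARMA model fitting.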
