Decentralized Online Regularized Learning Over Random Time-Varying Graphs (2206.03861v5)
Abstract: We study a decentralized online regularized linear regression algorithm over random time-varying graphs. At each time step, every node runs an online estimation algorithm consisting of an innovation term that processes its own new measurement, a consensus term that takes a weighted sum of its own estimate and those of its neighbors, subject to additive and multiplicative communication noises, and a regularization term that prevents over-fitting. The regression matrices and graphs are not required to satisfy special statistical assumptions such as mutual independence, spatio-temporal independence or stationarity. We develop a nonnegative supermartingale inequality for the estimation error, and prove that the estimates of all nodes converge to the unknown true parameter vector almost surely if the algorithm gains, graphs and regression matrices jointly satisfy the sample-path spatio-temporal persistence-of-excitation condition. In particular, this condition holds for appropriately chosen algorithm gains if the graphs are uniformly conditionally jointly connected and conditionally balanced, and the regression models of all nodes are uniformly conditionally spatio-temporally jointly observable; under these conditions the algorithm converges both in mean square and almost surely. In addition, we prove that the regret is upper bounded by $O(T^{1-\tau}\ln T)$, where $\tau\in(0.5,1)$ is a constant depending on the algorithm gains.
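The three-term node update described in the abstract can be sketched in a minimal simulation. Everything specific below is an illustrative assumption, not the paper's construction: the fixed ring graph (standing in for the random time-varying graphs), the gain schedules `a_t` and `b_t`, the noise levels, and the regularization weight; only the consensus + innovation + regularization structure mirrors the abstract, and the multiplicative communication noise is omitted for brevity.

```python
import numpy as np

def run_decentralized_estimation(T=2000, n_nodes=4, dim=3, tau=0.75, seed=0):
    """Sketch of the three-term update: each node combines a consensus term
    (noisy neighbor estimates), an innovation term (its own new measurement),
    and a regularization term. Gains, graph and noises are assumptions."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=dim)                 # unknown true parameter
    x = np.zeros((n_nodes, dim))                 # per-node estimates
    # fixed ring graph as a stand-in for the random time-varying graphs
    W = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        W[i, (i - 1) % n_nodes] = W[i, (i + 1) % n_nodes] = 0.5
    for t in range(1, T + 1):
        a_t = 1.0 / (t + 5) ** 0.6               # consensus gain (assumed form)
        b_t = 1.0 / (t + 5) ** tau               # innovation gain, tau in (0.5, 1)
        lam = 1e-3 * b_t                         # regularization weight (assumed)
        x_new = np.empty_like(x)
        for i in range(n_nodes):
            h = rng.normal(size=dim)             # node i's regression vector
            y = h @ theta + 0.1 * rng.normal()   # noisy scalar measurement
            # neighbor estimates corrupted by additive communication noise
            nb = sum(W[i, j] * (x[j] + 0.05 * rng.normal(size=dim))
                     for j in range(n_nodes) if W[i, j] > 0)
            consensus = a_t * (nb - x[i])        # rows of W sum to one
            innovation = b_t * h * (y - h @ x[i])
            regularization = -lam * x[i]
            x_new[i] = x[i] + consensus + innovation + regularization
        x = x_new
    return x, theta
```

With these decaying gains ($\sum_t b_t = \infty$, $\sum_t b_t^2 < \infty$), every node's estimate drifts toward the true parameter despite the communication noise, echoing the almost-sure convergence the paper establishes under its persistence-of-excitation condition.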
- T. Evgeniou, M. Pontil, and T. Poggio, “Regularization networks and support vector machines,” Advances in Computational Mathematics, vol. 13, no. 1, pp. 1-50, Apr. 2000.
- F. Girosi, “An equivalence between sparse approximation and support vector machines,” Neural Computation, vol. 10, no. 6, pp. 1455-1480, Aug. 1998.
- T. Poggio and S. Smale, “The Mathematics of learning: dealing with data,” Notices of the American Mathematical Society, vol. 50, no. 5, pp. 537-544, May 2003.
- H. Xue and Z. Ren, “Sketch discriminatively regularized online gradient descent classification,” Applied Intelligence, vol. 50, pp. 1367-1378, Jan. 2020.
- N. Zhou, D. J. Trudnowski, J. W. Pierre, and W. A. Mittelstadt, “Electromechanical mode online estimation using regularized robust RLS methods,” IEEE Trans. Power Systems, vol. 23, no. 4, pp. 1670-1680, Nov. 2008.
- Y. Sun, B. Wohlberg, and U. S. Kamilov, “An online plug-and-play algorithm for regularized image reconstruction,” IEEE Trans. Computational Imaging, vol. 5, no. 3, pp. 395-408, Sep. 2019.
- T. Matsushita, “Algorithm for atomic resolution holography using modified $L_1$-regularized linear regression and steepest descent method,” Physica Status Solidi (b), vol. 255, no. 11, Art. no. 1800091, Jul. 2018.
- K. H. Ng, S. Tatinati, and A. W. Khong, “Grade prediction from multi-valued click-stream traces via Bayesian-regularized deep neural networks,” IEEE Trans. Signal Processing, vol. 69, pp. 1477-1491, Feb. 2021.
- E. D. Vito, L. Rosasco, A. Caponnetto, U. D. Giovannini, F. Odone, and P. Bartlett, “Learning from examples as an inverse problem,” J. Machine Learning Research, vol. 6, no. 5, pp. 883-904, May 2005.
- N. Cesa-bianchi, P. M. Long, and M. K. Warmuth, “Worst-case quadratic loss bounds for prediction using linear functions and gradient descent,” IEEE Trans. Neural Networks, vol. 7, no. 3, pp. 604-619, May 1996.
- J. Kivinen and M. K. Warmuth, “Exponentiated gradient versus gradient descent for linear predictors,” Information and Computation, vol. 132, no. 1, pp. 1-63, Jan. 1997.
- V. Vovk, “Competitive on-line statistics,” International Statistical Review, vol. 69, no. 2, pp. 213-248, Aug. 2001.
- P. Gaillard, S. Gerchinovitz, M. Huard, and G. Stoltz, “Uniform regret bounds over $\mathbb{R}^d$ for the sequential linear regression problem with the square loss,” in Proc. 30th Int. Conf. Algorithmic Learning Theory, Chicago, USA, Mar. 2019, pp. 404-432.
- S. Gerchinovitz, “Sparsity regret bounds for individual sequences in online linear regression,” J. Machine Learning Research, vol. 14, no. 1, pp. 729-769, Mar. 2013.
- W. Jamil and A. Bouchachia, “Competitive regularised regression,” Neurocomputing, vol. 390, pp. 374-383, May 2020.
- C. Thrampoulidis, S. Oymak, and B. Hassibi, “Regularized linear regression: a precise analysis of the estimation error,” in Proc. 28th Conf. Learning Theory, Jun. 2015, pp. 1683-1709.
- H. Zou, “The adaptive lasso and its oracle properties,” J. Amer. Stat. Assoc., vol. 101, no. 476, pp. 1418-1429, Dec. 2006.
- J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties,” J. Amer. Stat. Assoc., vol. 96, no. 456, pp. 1348-1360, Dec. 2001.
- S. Shalev-Shwartz, “Online learning and online convex optimization,” Foundations and Trends in Machine Learning, vol. 4, no. 2, pp. 107-194, Mar. 2012.
- E. Hazan, “Introduction to online convex optimization,” Foundations and Trends in Optimization, vol. 2, no. 3-4, pp. 157-325, Aug. 2016.
- K. Scaman, F. Bach, S. Bubeck, L. Massoulie, and Y. T. Lee, “Optimal algorithms for non-smooth distributed optimization in networks,” in Proc. 32nd Conf. Neural Information Processing Systems, Montréal, Canada, Dec. 2018, pp. 2745-2754.
- J. B. Predd, S. R. Kulkarni, and H. V. Poor, “A collaborative training algorithm for distributed learning,” IEEE Trans. Information Theory, vol. 55, no. 4, pp. 1856-1871, Mar. 2009.
- Y. Liu, J. Liu, and T. Basar, “Differentially private gossip gradient descent,” in Proc. 57th IEEE Conf. Decision and Control, Miami, USA, Dec. 2018, pp. 2777-2782.
- F. Yan, S. Sundaram, S. Vishwanathan, and Y. Qi, “Distributed autonomous online learning: Regrets and intrinsic privacy-preserving properties,” IEEE Trans. Knowledge and Data Engineering, vol. 25, no. 11, pp. 2483-2493, Nov. 2013.
- S. Kar and J. M. F. Moura, “Convergence rate analysis of distributed gossip (linear parameter) estimation: fundamental limits and tradeoffs,” IEEE J. Sel. Topics Signal Processing, vol. 5, no. 4, pp. 674-690, Aug. 2011.
- S. Kar, J. M. F. Moura, and K. Ramanan, “Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication,” IEEE Trans. Information Theory, vol. 58, no. 6, pp. 3575-3605, Jun. 2012.
- S. Kar and J. M. F. Moura, “Consensus+innovations distributed inference over networks: Cooperation and sensing in networked systems,” IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 99-109, May 2013.
- S. Kar, J. M. F. Moura, and H. V. Poor, “Distributed linear parameter estimation: asymptotically efficient adaptive strategies,” SIAM J. Control and Optimization, vol. 51, no. 3, pp. 2200-2229, May 2013.
- A. K. Sahu, S. Kar, J. M. F. Moura, and H. V. Poor, “Distributed constrained recursive nonlinear least-squares estimation: algorithms and asymptotics,” IEEE Trans. Signal and Information Processing over Networks, vol. 2, no. 4, pp. 426-441, Dec. 2016.
- S. Xie and L. Guo, “Analysis of normalized least mean squares-based consensus adaptive filters under a general information condition,” SIAM J. Control and Optimization, vol. 56, no. 5, pp. 3404-3431, Sep. 2018.
- S. Xie and L. Guo, “Analysis of distributed adaptive filters based on diffusion strategies over sensor networks,” IEEE Trans. Automatic Control, vol. 63, no. 11, pp. 3643-3658, Nov. 2018.
- A. K. Sahu, D. Jakovetic, and S. Kar, “$\mathcal{CIRF\varepsilon}$: A distributed random fields estimator,” IEEE Trans. Signal Processing, vol. 66, no. 18, pp. 4980-4995, Sep. 2018.
- Y. Chen, S. Kar, and J. M. F. Moura, “Resilient distributed estimation: Sensor attacks,” IEEE Trans. Automatic Control, vol. 64, no. 9, pp. 3772-3779, Sep. 2019.
- D. Yuan, A. Proutiere, and G. Shi, “Distributed online linear regressions,” IEEE Trans. Information Theory, vol. 67, no. 1, pp. 616-639, Oct. 2020.
- X. Zhang, T. Li, and Y. Gu, “Consensus+innovations distributed estimation with random network graphs, observation matrices and noises,” in Proc. 59th IEEE Conf. Decision and Control, Jeju Island, South Korea, Dec. 2020, pp. 4318-4323.
- J. Wang, T. Li, and X. Zhang, “Decentralized cooperative online estimation with random observation matrices, communication graphs and time delays,” IEEE Trans. Information Theory, vol. 67, no. 6, pp. 4035-4059, Jun. 2021.
- T. Li and J. Wang, “Distributed averaging with random network graphs and noises,” IEEE Trans. Information Theory, vol. 64, no. 11, pp. 7063-7080, Nov. 2018.
- Z. Zhang, Y. Zhang, D. Guo, S. Zhao, and X. L. Zhu, “Communication-efficient federated continual learning for distributed learning system with non-IID data,” Science China Information Sciences, vol. 66, no. 2, Art. no. 122102, Dec. 2022.
- L. Guo, “Estimating time-varying parameters by Kalman filter based algorithm: Stability and convergence,” IEEE Trans. Automatic Control, vol. 35, no. 2, pp. 141-147, Feb. 1990.
- J. F. Zhang, L. Guo, and H. F. Chen, “$L_p$-stability of estimation errors of Kalman filter for tracking time-varying parameters,” Int. J. Adaptive Control and Signal Processing, vol. 5, no. 3, pp. 155-174, May 1991.
- A. S. Bedi, A. Koppel, and K. Rajawat, “Asynchronous saddle point algorithm for stochastic optimization in heterogeneous networks,” IEEE Trans. Signal Processing, vol. 67, no. 7, pp. 1742-1757, Apr. 2019.
- R. Dixit, A. S. Bedi, and K. Rajawat, “Online learning over dynamic graphs via distributed proximal gradient algorithm,” IEEE Trans. Automatic Control, vol. 66, no. 11, pp. 5065-5079, Nov. 2021.
- Q. Zhang and J. F. Zhang, “Distributed parameter estimation over unreliable networks with Markovian switching topologies,” IEEE Trans. Automatic Control, vol. 57, no. 10, pp. 2545-2560, Oct. 2012.
- J. Zhang, X. He, and D. Zhou, “Distributed filtering over wireless sensor networks with parameter and topology uncertainties,” Int. J. Control, vol. 93, no. 4, pp. 910-921, Apr. 2020.
- S. Djaidja and Q. Wu, “An overview of distributed consensus of multi-agent systems with measurement/communication noises,” in Proc. 34th Chin. Control Conf., Hangzhou, China, Jul. 2015, pp. 7285-7290.
- D. V. Dimarogonas and K. H. Johansson, “Stability analysis for multi-agent systems using the incidence matrix: Quantized communication and formation control,” Automatica, vol. 46, no. 4, pp. 695-700, Apr. 2010.
- J. Wang and N. Elia, “Mitigation of complex behavior over networked systems: Analysis of spatially invariant structures,” Automatica, vol. 49, no. 6, pp. 1626-1638, Jun. 2013.
- F. S. Cattivelli and A. H. Sayed, “Diffusion LMS strategies for distributed estimation,” IEEE Trans. Signal Processing, vol. 58, no. 3, pp. 1035-1048, Mar. 2009.
- M. J. Piggott and V. Solo, “Diffusion LMS with correlated regressors II: Performance,” IEEE Trans. Signal Processing, vol. 65, no. 15, pp. 3934-3947, Aug. 2017.
- N. E. Leonard and A. Olshevsky, “Cooperative learning in multiagent systems from intermittent measurements,” SIAM J. Control and Optimization, vol. 53, no. 1, pp. 1-29, Jan. 2015.
- H. Robbins and D. Siegmund, “A convergence theorem for non negative almost supermartingales and some applications,” in Selected Papers, T. L. Lai, and D. Siegmund, Eds., New York, USA: Springer-Verlag, 1985.