Non-Asymptotic Performance of Social Machine Learning Under Limited Data (2306.09397v2)
Abstract: This paper studies the probability of error associated with the social machine learning framework, which involves an independent training phase followed by a cooperative decision-making phase over a graph. The framework addresses the problem of classifying a stream of unlabeled data in a distributed manner. We examine the classification task when only a limited number of observations is available during the decision-making phase, which calls for a non-asymptotic performance analysis. We establish a condition for consistent training and derive an upper bound on the probability of classification error. The results clarify how this bound depends on the statistical properties of the data and on the combination policy used over the graph, and they establish that the probability of error decays exponentially in the number of unlabeled samples.
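To make the two-phase structure concrete, below is a minimal sketch of the cooperative decision-making phase, assuming a binary classification task, a ring graph with Metropolis combination weights, and synthetic Gaussian scores standing in for the log-likelihood-ratio statistics produced by each agent's independently trained classifier. The names `A` and `c`, and the one-shot combine-then-decide rule, are illustrative assumptions, not the paper's exact recursion.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5   # number of agents in the graph
N = 10  # number of unlabeled samples observed per agent

# Doubly stochastic combination matrix A (Metropolis weights on a ring):
# row k holds the weights agent k assigns to itself and its two neighbors.
A = np.zeros((K, K))
for k in range(K):
    A[k, k] = 1 / 3
    A[k, (k - 1) % K] = 1 / 3
    A[k, (k + 1) % K] = 1 / 3

# Stand-in for each agent's trained discriminant: one log-likelihood-ratio
# statistic per unlabeled sample. A positive mean under the true class plays
# the role of consistent training (the statistic favors hypothesis 0).
c = rng.normal(loc=0.5, scale=1.0, size=(K, N))

# Each agent averages its statistics over the N unlabeled samples, then the
# agents combine the averages over the graph (a DeGroot-style consensus step).
local_avg = c.mean(axis=1)   # per-agent sample average, shape (K,)
combined = A @ local_avg     # neighborhood combination, shape (K,)

# Decision rule: each agent chooses hypothesis 0 when its combined statistic
# is positive, and hypothesis 1 otherwise.
decisions = np.where(combined > 0, 0, 1)
print(decisions)
```

Averaging the N statistics before combining also hints at why the error probability decays exponentially in N: when training is consistent, the aggregated statistic concentrates around a positive mean, so a sign error becomes exponentially unlikely as more unlabeled samples arrive.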