Privacy-Preserving Community Detection for Locally Distributed Multiple Networks (2306.15709v2)
Abstract: Modern multi-layer networks are commonly stored and analyzed in a local and distributed fashion because of privacy and ownership concerns and communication costs. The literature on model-based statistical methods for community detection on such data is still limited. This paper proposes a new method for consensus community detection and estimation in a multi-layer stochastic block model using locally stored and computed network data with privacy protection. A novel algorithm named privacy-preserving Distributed Spectral Clustering (ppDSC) is developed. To preserve edge privacy, we adopt the randomized response (RR) mechanism to perturb the network edges, which satisfies the strong notion of differential privacy. The ppDSC algorithm operates on the squared RR-perturbed adjacency matrices to prevent possible cancellation of communities among different layers. To remove the bias incurred by RR and by squaring the network matrices, we develop a two-step bias-adjustment procedure. We then perform eigendecomposition on the debiased matrices, aggregate the local eigenvectors using an orthogonal Procrustes transformation, and apply k-means clustering. We provide a theoretical analysis of the statistical errors of ppDSC in terms of eigenvector estimation. Our bounds also explain the blessings and curses of network heterogeneity.
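The pipeline described in the abstract can be sketched end to end in a few NumPy functions. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`rr_perturb`, `debias_rr`, `debiased_square`, `top_eigvecs`, `procrustes_aggregate`, `kmeans`, `ppdsc`) are hypothetical. RR is assumed to use the common symmetric parameterization, which keeps each edge indicator with probability e^eps / (1 + e^eps). The second bias-adjustment step is simplified (an assumption) to zeroing the diagonal of the squared debiased matrix. This works because, once step one is applied, the off-diagonal entries of that square are unbiased for those of P^2 up to terms involving the diagonal of P, while its diagonal is not. The first layer is used as the Procrustes reference, and a hand-rolled Lloyd's k-means stands in for any particular library. All layers run in one process here. Each loop pass plays the role of one local machine, which would send only its n x K eigenvector matrix to a central server.

```python
import numpy as np


def rr_perturb(A, eps, rng):
    """Randomized response on each node pair (i < j): keep the edge indicator
    with probability e^eps / (1 + e^eps), otherwise flip it (assumed scheme)."""
    n = A.shape[0]
    p_keep = np.exp(eps) / (1.0 + np.exp(eps))
    iu = np.triu_indices(n, k=1)
    upper = A[iu].astype(float)
    flip = rng.random(upper.size) > p_keep
    upper[flip] = 1.0 - upper[flip]
    A_rr = np.zeros((n, n))
    A_rr[iu] = upper
    return A_rr + A_rr.T  # symmetric, zero diagonal


def debias_rr(A_rr, eps):
    """Bias step 1: off the diagonal E[A_rr] = (2p - 1) A + (1 - p), so this
    affine map of A_rr is unbiased for A entrywise."""
    e = np.exp(eps)
    A_tilde = ((e + 1.0) * A_rr - 1.0) / (e - 1.0)
    np.fill_diagonal(A_tilde, 0.0)
    return A_tilde


def debiased_square(A_tilde):
    """Bias step 2 (simplified assumption): square, then drop the biased diagonal."""
    S = A_tilde @ A_tilde
    np.fill_diagonal(S, 0.0)
    return S


def top_eigvecs(S, K):
    """Eigenvectors of symmetric S for its K largest-magnitude eigenvalues."""
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(np.abs(vals))[::-1][:K]
    return vecs[:, order]


def procrustes_aggregate(V_list):
    """Rotate every local basis onto the first one, average, re-orthonormalize."""
    V_ref = V_list[0]
    aligned = []
    for V in V_list:
        # The orthogonal O minimizing ||V O - V_ref||_F is U @ Wt,
        # where V^T V_ref = U diag(s) Wt (orthogonal Procrustes solution).
        U, _, Wt = np.linalg.svd(V.T @ V_ref)
        aligned.append(V @ (U @ Wt))
    Q, _ = np.linalg.qr(np.mean(aligned, axis=0))
    return Q


def kmeans(X, K, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, K):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(K)])
    return labels


def ppdsc(A_list, K, eps, seed=0):
    """Sketch of the ppDSC pipeline; each loop pass plays one local machine."""
    rng = np.random.default_rng(seed)
    V_list = []
    for A in A_list:
        A_tilde = debias_rr(rr_perturb(A, eps, rng), eps)
        V_list.append(top_eigvecs(debiased_square(A_tilde), K))
    # Only the n x K eigenvector matrices would leave each local site.
    return kmeans(procrustes_aggregate(V_list), K)
```

Squaring each layer before taking eigenvectors means that an assortative layer and a disassortative layer with the same memberships share the same leading eigenspace. Summing their adjacency matrices could instead partly cancel the community signal. The Procrustes step then only has to undo the arbitrary rotation and sign of each local eigenvector basis before averaging.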