Detection of Correlated Random Vectors (2401.13429v3)
Abstract: In this paper, we investigate the problem of deciding whether two standard normal random vectors $\mathsf{X}\in\mathbb{R}^{n}$ and $\mathsf{Y}\in\mathbb{R}^{n}$ are correlated or not. This is formulated as a hypothesis testing problem: under the null hypothesis, the two vectors are statistically independent, while under the alternative, $\mathsf{X}$ and a uniformly random permutation of $\mathsf{Y}$ are correlated with correlation $\rho$. We analyze the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$ and $\rho$. To derive our information-theoretic lower bounds, we develop a novel technique for evaluating the second moment of the likelihood ratio using an orthogonal polynomial expansion, which, among other things, reveals a surprising connection to integer partition functions. We also study a multi-dimensional generalization of the above setting, where rather than two vectors we observe two databases/matrices, and where we furthermore allow for partial correlations between the two.
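The two hypotheses can be illustrated with a small sampler. This is a minimal sketch, not the paper's construction: the function name `sample_pair` is hypothetical, and it assumes that under the alternative the coordinate pairs are i.i.d. bivariate normal with correlation $\rho$ before the hidden permutation is applied.

```python
import math
import random


def sample_pair(n, rho, correlated, rng):
    """Draw (X, Y) under the null (correlated=False) or the alternative.

    Hypothetical helper for illustration; assumes i.i.d. coordinate pairs.
    """
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if not correlated:
        # Null: Y is a standard normal vector independent of X.
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        return x, y
    # Alternative: build a vector aligned with X, each entry having
    # unit variance and correlation rho with the matching entry of X.
    scale = math.sqrt(1.0 - rho * rho)
    y_aligned = [rho * xi + scale * rng.gauss(0.0, 1.0) for xi in x]
    # Hide the alignment behind a uniformly random permutation.
    perm = list(range(n))
    rng.shuffle(perm)
    y = [y_aligned[perm[i]] for i in range(n)]
    return x, y


rng = random.Random(0)
x0, y0 = sample_pair(5, 0.8, correlated=False, rng=rng)
x1, y1 = sample_pair(5, 0.8, correlated=True, rng=rng)
```

In both cases each of the two vectors is marginally standard normal, so a test cannot rely on either vector alone; any evidence for the alternative has to come from the joint structure of the two vectors, which the unknown permutation obscures.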