Limiting distribution of the sample canonical correlation coefficients of high-dimensional random vectors (2103.08014v3)
Abstract: Consider two high-dimensional random vectors $\widetilde{\mathbf x}\in\mathbb R^p$ and $\widetilde{\mathbf y}\in\mathbb R^q$ with finite rank correlations. More precisely, suppose that $\widetilde{\mathbf x}=\mathbf x+A\mathbf z$ and $\widetilde{\mathbf y}=\mathbf y+B\mathbf z$, for independent random vectors $\mathbf x\in\mathbb R^p$, $\mathbf y\in\mathbb R^q$ and $\mathbf z\in\mathbb R^r$ with iid entries of mean 0 and variance 1, and two deterministic matrices $A\in\mathbb R^{p\times r}$ and $B\in\mathbb R^{q\times r}$. With $n$ iid observations of $(\widetilde{\mathbf x},\widetilde{\mathbf y})$, we study the sample canonical correlations between them. In this paper, we focus on the high-dimensional setting with a rank-$r$ correlation. Let $t_1\ge\cdots\ge t_r$ be the squares of the population canonical correlation coefficients (CCC) between $\widetilde{\mathbf x}$ and $\widetilde{\mathbf y}$, and $\widetilde\lambda_1\ge\cdots\ge\widetilde\lambda_r$ be the squares of the largest $r$ sample CCC. Under certain moment assumptions on the entries of $\mathbf x$, $\mathbf y$ and $\mathbf z$, we show that there exists a threshold $t_c\in(0, 1)$ such that if $t_i>t_c$, then $\sqrt{n}(\widetilde\lambda_i-\theta_i)$ converges in law to a centered normal distribution, where $\theta_i>\lambda_+$ is a fixed outlier location determined by $t_i$. Our results extend the ones in [4] for Gaussian vectors. Moreover, we find that the variance of the limiting distribution of $\sqrt{n}(\widetilde\lambda_i-\theta_i)$ also depends on the fourth cumulants of the entries of $\mathbf x$, $\mathbf y$ and $\mathbf z$, a phenomenon that cannot be observed in the Gaussian case.
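As a rough illustration of the model in the abstract, the following sketch simulates $\widetilde{\mathbf x}=\mathbf x+A\mathbf z$ and $\widetilde{\mathbf y}=\mathbf y+B\mathbf z$ and compares the squared sample CCC with the squared population CCC. It is not taken from the paper. The function names (`population_ccc_squared`, `sample_ccc_squared`, `simulate`) are hypothetical, the entries are drawn as standard Gaussians purely for convenience (the paper allows more general entries), and the population values rely on the covariances implied by the model: $I_p+AA^\top$, $I_q+BB^\top$ and cross-covariance $AB^\top$.

```python
import numpy as np


def population_ccc_squared(A, B):
    """Squared population CCC for x~ = x + A z, y~ = y + B z.

    Uses Cov(x~) = I + A A^T, Cov(y~) = I + B B^T, Cov(x~, y~) = A B^T,
    which follow from independent entries with mean 0 and variance 1.
    """
    p, q = A.shape[0], B.shape[0]
    Sxx = np.eye(p) + A @ A.T
    Syy = np.eye(q) + B @ B.T
    Sxy = A @ B.T
    # Eigenvalues of Sxx^{-1} Sxy Syy^{-1} Syx are the squared CCC.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    eig = np.linalg.eigvals(M).real
    return np.sort(eig)[::-1]


def sample_ccc_squared(X, Y):
    """Squared sample CCC from data X (p x n) and Y (q x n), n >= max(p, q).

    No centering is applied, since the model has mean-zero entries.
    The sample CCC are the singular values of Qx^T Qy, where Qx and Qy
    are orthonormal bases of the column spaces of X^T and Y^T.
    """
    Qx, _ = np.linalg.qr(X.T)
    Qy, _ = np.linalg.qr(Y.T)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return s ** 2  # already sorted in decreasing order


def simulate(A, B, n, rng):
    """Draw n observations of (x~, y~); Gaussian entries are an assumption."""
    p, r = A.shape
    q = B.shape[0]
    X = rng.standard_normal((p, n))
    Y = rng.standard_normal((q, n))
    Z = rng.standard_normal((r, n))
    return X + A @ Z, Y + B @ Z
```

With a rank-one signal such as `A[0, 0] = B[0, 0] = 2` and all other entries zero, the single nonzero population value is $t_1 = 0.64$. Taking $p$ and $q$ comparable to $n$ instead of fixed is one way to observe informally how the largest sample value $\widetilde\lambda_1$ moves away from $t_1$, which is the regime the abstract addresses.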