
Nonasymptotic one- and two-sample tests in high dimension with unknown covariance structure (2109.01730v2)

Published 1 Sep 2021 in cs.LG, cs.AI, math.ST, stat.ML, and stat.TH

Abstract: Let $\mathbf{X} = (X_i)_{1\leq i \leq n}$ be an i.i.d. sample of square-integrable variables in $\mathbb{R}^d$, with common expectation $\mu$ and covariance matrix $\Sigma$, both unknown. We consider the problem of testing whether $\mu$ is $\eta$-close to zero, i.e. $\|\mu\| \leq \eta$ against $\|\mu\| \geq (\eta + \delta)$; we also tackle the more general two-sample mean closeness (also known as "relevant difference") testing problem. The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distance $\delta$ such that we can control both the Type I and Type II errors at a given level. The main technical tools are concentration inequalities, first for a suitable estimator of $\|\mu\|^2$ used as a test statistic, and second for estimating the operator and Frobenius norms of $\Sigma$ that enter the quantiles of said test statistic. These properties are obtained for Gaussian and bounded distributions. Particular attention is given to the dependence on the pseudo-dimension $d_*$ of the distribution, defined as $d_* := \|\Sigma\|_2^2/\|\Sigma\|_\infty^2$. In particular, for $\eta=0$, the minimum separation distance is $\Theta(d_*^{1/4}\sqrt{\|\Sigma\|_\infty/n})$, in contrast with the minimax estimation distance for $\mu$, which is $\Theta(d_e^{1/2}\sqrt{\|\Sigma\|_\infty/n})$ (where $d_e := \|\Sigma\|_1/\|\Sigma\|_\infty$). This generalizes a phenomenon spelled out in particular by Baraud (2002).
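To make the quantities in the abstract concrete, here is a minimal numerical sketch. It assumes that $\|\Sigma\|_1$, $\|\Sigma\|_2$ and $\|\Sigma\|_\infty$ denote the trace, Frobenius and operator norms of $\Sigma$ (the abstract names the latter two explicitly; the trace-norm reading of $\|\Sigma\|_1$ is an assumption). The function names are hypothetical, the rates are computed only up to the unspecified constants hidden in $\Theta(\cdot)$, and the U-statistic shown is one standard unbiased estimator of $\|\mu\|^2$, not necessarily the exact statistic used in the paper.

```python
import numpy as np

def schatten_dimensions(sigma):
    """Return (d_star, d_e, op_norm) for a symmetric PSD matrix sigma.

    Assumes ||.||_1, ||.||_2, ||.||_inf are the trace, Frobenius and
    operator norms (an assumption about the abstract's notation).
    """
    eig = np.clip(np.linalg.eigvalsh(sigma), 0.0, None)  # PSD eigenvalues
    op_norm = eig.max()                   # operator norm
    d_star = np.sum(eig**2) / op_norm**2  # Frobenius^2 / operator^2
    d_e = eig.sum() / op_norm             # trace / operator
    return d_star, d_e, op_norm

def rates(sigma, n):
    """Testing and estimation rates for eta = 0, with constants dropped."""
    d_star, d_e, op_norm = schatten_dimensions(sigma)
    base = np.sqrt(op_norm / n)
    return d_star**0.25 * base, d_e**0.5 * base

def norm_sq_ustat(x):
    """Unbiased U-statistic for ||mu||^2: the average of <X_i, X_j> over i != j.

    Illustrative choice only; x has shape (n, d) with n >= 2.
    """
    n = x.shape[0]
    s = x.sum(axis=0)
    # sum_{i != j} <X_i, X_j> = ||sum_i X_i||^2 - sum_i ||X_i||^2
    return (s @ s - np.sum(x * x)) / (n * (n - 1))

if __name__ == "__main__":
    testing, estimation = rates(np.eye(16), n=100)
    print(testing, estimation)
```

For an isotropic covariance both dimensions equal $d$, so the gap between testing and estimation is the familiar $d^{1/4}$ versus $d^{1/2}$ scaling; for a fast-decaying spectrum both dimensions can be far smaller than $d$.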

