Simple and Optimal Sublinear Algorithms for Mean Estimation
Abstract: We study the sublinear multivariate mean estimation problem in $d$-dimensional Euclidean space. Specifically, we aim to find the mean $\mu$ of a ground point set $A$, which minimizes the sum of squared Euclidean distances of the points in $A$ to $\mu$. We first show that a multiplicative $(1+\varepsilon)$ approximation to $\mu$ can be found with probability $1-\delta$ using $O(\varepsilon^{-1}\log \delta^{-1})$ many independent uniform random samples, and provide a matching lower bound. Furthermore, we give two sublinear time algorithms with optimal sample complexity for extracting a suitable approximate mean: 1. A gradient descent approach running in time $O((\varepsilon^{-1}+\log\log \delta^{-1})\cdot \log \delta^{-1} \cdot d)$. It optimizes the geometric median objective while being significantly faster for our specific setting than all other known algorithms for this problem. 2. An order statistics and clustering approach running in time $O\left((\varepsilon^{-1}+\log^{\gamma}\delta^{-1})\cdot \log \delta^{-1} \cdot d\right)$ for any constant $\gamma>0$. Throughout our analysis, we also generalize the familiar median-of-means estimator to the multivariate case, showing that the geometric median-of-means estimator achieves an optimal sample complexity for estimating $\mu$, which may be of independent interest.
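To make the geometric median-of-means idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the usual median-of-means layout: draw uniform samples with replacement, split them into roughly $\log \delta^{-1}$ groups of roughly $\varepsilon^{-1}$ samples each, average each group, and return the geometric median of the group means. The function names, the parameters `num_samples` and `num_groups`, and the use of Weiszfeld's iteration (in place of the gradient descent approach described above) are illustrative assumptions; no constants are taken from the paper.

```python
import math
import random


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    d = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(d)]


def geometric_median(points, iters=100, tol=1e-9):
    """Approximate the geometric median with Weiszfeld's iteration.

    Assumption: this is a stand-in solver chosen for brevity; the paper
    describes a gradient descent approach instead.
    """
    y = mean(points)  # start from the centroid
    for _ in range(iters):
        # Weight each point by the inverse of its distance to the current
        # iterate; the clamp avoids division by zero when y hits a point.
        weights = [1.0 / max(math.dist(p, y), 1e-12) for p in points]
        total = sum(weights)
        new_y = [
            sum(w * p[i] for w, p in zip(weights, points)) / total
            for i in range(len(y))
        ]
        if math.dist(new_y, y) < tol:
            return new_y
        y = new_y
    return y


def geometric_median_of_means(points, num_samples, num_groups, rng):
    """Estimate the mean of `points` from `num_samples` uniform samples.

    Hypothetical parameter choice: num_groups ~ log(1/delta) and
    num_samples // num_groups ~ 1/epsilon.
    """
    # Uniform sampling with replacement from the ground set.
    sample = [rng.choice(points) for _ in range(num_samples)]
    size = num_samples // num_groups
    group_means = [
        mean(sample[g * size:(g + 1) * size]) for g in range(num_groups)
    ]
    return geometric_median(group_means)
```

For example, `geometric_median_of_means(points, 200, 10, random.Random(0))` reads only 200 sampled points regardless of how large `points` is, which is the sense in which the estimator is sublinear in the size of $A$.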