Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians (1312.1054v3)

Published 4 Dec 2013 in cs.DS, cs.LG, math.PR, math.ST, and stat.TH

Abstract: We provide an algorithm for properly learning mixtures of two single-dimensional Gaussians without any separability assumptions. Given $\tilde{O}(1/\varepsilon^2)$ samples from an unknown mixture, our algorithm outputs a mixture that is $\varepsilon$-close in total variation distance, in time $\tilde{O}(1/\varepsilon^5)$. Our sample complexity is optimal up to logarithmic factors, and significantly improves upon both Kalai et al., whose algorithm has a prohibitive dependence on $1/\varepsilon$, and Feldman et al., whose algorithm requires bounds on the mixture parameters and depends pseudo-polynomially in these parameters. One of our main contributions is an improved and generalized algorithm for selecting a good candidate distribution from among competing hypotheses. Namely, given a collection of $N$ hypotheses containing at least one candidate that is $\varepsilon$-close to an unknown distribution, our algorithm outputs a candidate which is $O(\varepsilon)$-close to the distribution. The algorithm requires $O(\log N/\varepsilon^2)$ samples from the unknown distribution and $O(N \log N/\varepsilon^2)$ time, which improves previous such results (such as the Scheffé estimator) from a quadratic dependence of the running time on $N$ to quasilinear. Given the wide use of such results for the purpose of hypothesis selection, our improved algorithm implies immediate improvements to any such use.
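
The hypothesis-selection result above improves on the classical Scheffé tournament, which compares every pair of candidates and therefore incurs a quadratic dependence on $N$. The sketch below is a minimal illustration of that classical quadratic baseline, not the paper's quasilinear procedure; it assumes each hypothesis is supplied as a vectorized one-dimensional density callable, and the helper names and grid-based mass estimates are illustrative assumptions rather than anything specified in the paper.

```python
import numpy as np

def scheffe_winner(p, q, samples):
    """Classical Scheffé test between two candidate densities p and q.

    p, q: vectorized callables returning density values on 1-D arrays
          (an interface assumed for this sketch).
    samples: 1-D NumPy array of draws from the unknown distribution.
    Returns whichever hypothesis assigns mass to the Scheffé set
    A = {x : p(x) > q(x)} closer to the empirical mass of A.
    """
    # Empirical probability of the Scheffé set under the unknown distribution.
    in_A = p(samples) > q(samples)
    empirical_mass = in_A.mean()

    # Approximate p(A) and q(A) by Riemann sums on a fine grid; adequate for
    # one-dimensional mixtures of Gaussians, the setting of the paper.
    grid = np.linspace(samples.min() - 10.0, samples.max() + 10.0, 20_000)
    dx = grid[1] - grid[0]
    mask = p(grid) > q(grid)
    p_mass = np.sum(p(grid)[mask]) * dx
    q_mass = np.sum(q(grid)[mask]) * dx

    return p if abs(p_mass - empirical_mass) <= abs(q_mass - empirical_mass) else q

def scheffe_tournament(hypotheses, samples):
    """Quadratic-time Scheffé tournament: run every pairwise contest and
    return the hypothesis with the most wins. This O(N^2) loop is the
    dependence on N that the paper's selection algorithm reduces to
    quasilinear."""
    wins = [0] * len(hypotheses)
    for i in range(len(hypotheses)):
        for j in range(i + 1, len(hypotheses)):
            winner = scheffe_winner(hypotheses[i], hypotheses[j], samples)
            wins[i if winner is hypotheses[i] else j] += 1
    return hypotheses[int(np.argmax(wins))]
```

As a usage example, each hypothesis could be a callable such as `lambda x: 0.5 * norm.pdf(x, 0, 1) + 0.5 * norm.pdf(x, 3, 1)` built from `scipy.stats.norm`, with `samples` drawn from the true mixture.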

Citations (90)
