Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
184 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Learning $k$-Modal Distributions via Testing (1107.2700v3)

Published 13 Jul 2011 in cs.DS, cs.LG, math.ST, and stat.TH

Abstract: A $k$-modal probability distribution over the discrete domain ${1,...,n}$ is one whose histogram has at most $k$ "peaks" and "valleys." Such distributions are natural generalizations of monotone ($k=0$) and unimodal ($k=1$) probability distributions, which have been intensively studied in probability theory and statistics. In this paper we consider the problem of \emph{learning} (i.e., performing density estimation of) an unknown $k$-modal distribution with respect to the $L_1$ distance. The learning algorithm is given access to independent samples drawn from an unknown $k$-modal distribution $p$, and it must output a hypothesis distribution $\widehat{p}$ such that with high probability the total variation distance between $p$ and $\widehat{p}$ is at most $\epsilon.$ Our main goal is to obtain \emph{computationally efficient} algorithms for this problem that use (close to) an information-theoretically optimal number of samples. We give an efficient algorithm for this problem that runs in time $\mathrm{poly}(k,\log(n),1/\epsilon)$. For $k \leq \tilde{O}(\log n)$, the number of samples used by our algorithm is very close (within an $\tilde{O}(\log(1/\epsilon))$ factor) to being information-theoretically optimal. Prior to this work computationally efficient algorithms were known only for the cases $k=0,1$ \cite{Birge:87b,Birge:97}. A novel feature of our approach is that our learning algorithm crucially uses a new algorithm for \emph{property testing of probability distributions} as a key subroutine. The learning algorithm uses the property tester to efficiently decompose the $k$-modal distribution into $k$ (near-)monotone distributions, which are easier to learn.

Citations (2)

Summary

We haven't generated a summary for this paper yet.