A fully data-driven method for estimating density level sets (1411.7687v1)
Abstract: Density level sets can be estimated using plug-in methods, excess mass algorithms, or a hybrid of the two. Plug-in algorithms replace the unknown density with a nonparametric estimator, usually a kernel estimator, so bandwidth selection is a fundamental problem from an applied perspective. However, if some a priori information about the geometry of the level set is available, excess mass algorithms can be useful. Hybrid methods, such as the granulometric smoothing algorithm, assume a mild geometric restriction on the level set and require a pilot nonparametric estimator of the density. In this work, a new hybrid algorithm is proposed under the assumption that the level set is r-convex. The main problem in practice is that r is an unknown geometric characteristic of the set, so a stochastic algorithm is proposed for selecting its optimal value. The resulting data-driven reconstruction of the level set achieves the same convergence rates as the granulometric smoothing method. However, these rates do not depend on any penalty term because the shape index r, although a priori unknown, is estimated in a data-driven way from the sample points. The practical performance of the proposed estimator is illustrated through a real data example.
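
To make the plug-in idea concrete, the following is a minimal one-dimensional sketch: the unknown density is replaced by a Gaussian kernel estimate, and the level set is approximated by the grid points where that estimate reaches the threshold. It is not the hybrid r-convex algorithm proposed in the paper. The function names (`gaussian_kde`, `plugin_level_set`), the simulated sample, the threshold 0.1, and the normal-reference bandwidth rule are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from `sample`."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def plugin_level_set(sample, threshold, bandwidth, grid):
    """Grid points where the estimated density is at least `threshold`."""
    f_hat = gaussian_kde(sample, bandwidth)
    return [x for x in grid if f_hat(x) >= threshold]


# Illustrative data (assumption): 500 draws from a standard normal.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]

# Assumption: normal-reference rule of thumb for the bandwidth,
# h = 1.06 * sd * n^(-1/5). The result depends on this choice.
bandwidth = 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)

# Evaluation grid on [-4, 4] with step 0.05.
grid = [-4.0 + 0.05 * i for i in range(161)]

level_set = plugin_level_set(sample, 0.1, bandwidth, grid)
print(f"estimated level set spans roughly [{min(level_set):.2f}, {max(level_set):.2f}]")
```

Changing `bandwidth` in this sketch moves the boundary of the estimated set, which is the sensitivity to bandwidth selection that the abstract identifies as the main practical difficulty of plug-in methods.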