Rho-estimators revisited: General theory and applications (1605.05051v5)
Abstract: Following Baraud, Birg\'e and Sart (2017), we pursue our attempt to design a robust universal estimator of the joint distribution of $n$ independent (but not necessarily i.i.d.) observations for a Hellinger-type loss. Given such observations with an unknown joint distribution $\mathbf{P}$ and a dominated model $\mathscr{Q}$ for $\mathbf{P}$, we build an estimator $\widehat{\mathbf{P}}$ based on $\mathscr{Q}$ and measure its risk by a Hellinger-type distance. When $\mathbf{P}$ does belong to the model, this risk is bounded by a quantity which depends on the local complexity of the model in a vicinity of $\mathbf{P}$. In most situations this bound corresponds to the minimax risk over the model (up to a possible logarithmic factor). When $\mathbf{P}$ does not belong to the model, the risk involves an additional bias term proportional to the distance between $\mathbf{P}$ and $\mathscr{Q}$, whatever the true distribution $\mathbf{P}$ may be. From this point of view, this new version of $\rho$-estimators improves upon the previous one described in Baraud, Birg\'e and Sart (2017), which required that $\mathbf{P}$ be absolutely continuous with respect to some known reference measure. Further improvements have also been made as compared to the former construction. In particular, the new construction provides a very general treatment of the regression framework with random design as well as a computationally tractable procedure for aggregating estimators. We also give some conditions for the Maximum Likelihood Estimator to be a $\rho$-estimator. Finally, we consider the situation where the statistician has many different models at hand, and we build a penalized version of the $\rho$-estimator for model selection and adaptation purposes. In the regression setting, this penalized estimator allows one to estimate not only the regression function but also the distribution of the errors.
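For the reader's convenience, here is a minimal sketch of the Hellinger-type loss alluded to above, written with the standard definitions for independent observations; the notation below is ours and may differ from the paper's:
\[
h^2(P,Q) \;=\; \frac{1}{2}\int \left(\sqrt{\frac{dP}{d\mu}}-\sqrt{\frac{dQ}{d\mu}}\right)^{2} d\mu,
\qquad
\mathbf{h}^2(\mathbf{P},\mathbf{Q}) \;=\; \sum_{i=1}^{n} h^{2}(P_i,Q_i),
\]
where $\mu$ is any measure dominating both $P$ and $Q$ (the value does not depend on the choice of $\mu$), and $\mathbf{P}=\bigotimes_{i=1}^{n}P_i$, $\mathbf{Q}=\bigotimes_{i=1}^{n}Q_i$ are the joint distributions of the $n$ independent observations. With these conventions, the risk bound described in the abstract takes, schematically, the form
\[
\mathbb{E}\!\left[\mathbf{h}^{2}\big(\mathbf{P},\widehat{\mathbf{P}}\big)\right]
\;\le\; C\left[\,\inf_{\mathbf{Q}\in\mathscr{Q}}\mathbf{h}^{2}(\mathbf{P},\mathbf{Q}) \;+\; D(\mathscr{Q})\,\right],
\]
where $C$ is a constant and $D(\mathscr{Q})$ stands for the local complexity term; this display is only an illustrative template of the robustness statement, and the precise result is given in the paper.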