Robust and efficient estimation of high dimensional scatter and location (1504.03389v3)
Abstract: We deal with the equivariant estimation of scatter and location for p-dimensional data, giving emphasis to scatter. It is important that the estimators possess both a high efficiency for normal data and a high resistance to outliers, that is, a low bias under contamination. The most frequently employed estimators are not quite satisfactory in this respect. The Minimum Volume Ellipsoid (MVE) and Minimum Covariance Determinant (MCD) estimators are known to have a very low efficiency. S-estimators (Davies 1987) with a monotonic weight function like the bisquare behave satisfactorily for "small" p, say p not larger than 10. Rocke (1996) showed that their efficiency tends to one with increasing p. Unfortunately, this advantage is paid for with a serious loss of robustness for large p. We consider three families of estimators with controllable efficiencies: non-monotonic S-estimators (Rocke 1996), MM-estimators (Tatsuoka and Tyler 2000) and tau-estimators (Lopuhaä 1991), whose performance for large p has not been explored to date. Two types of starting estimators are employed: the MVE computed through subsampling, and a semi-deterministic procedure proposed by Peña and Prieto (2007) for outlier detection. A simulation study shows that the Rocke and MM estimators, when started from the Peña-Prieto estimator and adequately tuned, can simultaneously attain high efficiency and high robustness.
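
As an illustration of what a "monotonic weight function like the bisquare" means, here is a minimal sketch in Python, assuming the common textbook form of the bisquare scaled so that the rho function rises from 0 to 1. The function names and the tuning constant `c` are hypothetical choices for this example, not notation taken from the paper, and the paper may parameterize the functions differently (for instance in terms of squared distances).

```python
import numpy as np

def bisquare_rho(d, c=1.0):
    """Bisquare rho: rises from 0 at d = 0 to 1 at |d| >= c (c is a hypothetical tuning constant)."""
    u = np.clip(np.abs(np.asarray(d, dtype=float)) / c, 0.0, 1.0)
    return 1.0 - (1.0 - u**2) ** 3

def bisquare_weight(d, c=1.0):
    """Bisquare weight, proportional to rho'(d)/d: equals 1 at d = 0 and 0 for |d| >= c."""
    u = np.clip(np.abs(np.asarray(d, dtype=float)) / c, 0.0, 1.0)
    return (1.0 - u**2) ** 2

# The weight never increases with distance, which is the sense of "monotonic" here.
distances = np.linspace(0.0, 2.0, 9)
weights = bisquare_weight(distances, c=1.5)
print(np.all(np.diff(weights) <= 0))
```

Under this form, points close to the center receive weights near one and points beyond `c` receive weight zero. A non-monotonic weight function, by contrast, would also downweight points whose distance is unusually small.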