Fast kernel half-space depth for data with non-convex supports (2312.14136v1)
Abstract: Data depth is a statistical function that generalizes order and quantiles to the multivariate setting and beyond, with applications spanning descriptive and visual statistics, anomaly detection, testing, etc. The celebrated halfspace depth exploits data geometry via an optimization program to deliver invariance, robustness, and non-parametricity. Nevertheless, it implicitly assumes convex data supports and requires exponential computational cost. To handle multimodal distributions, we extend the halfspace depth in a Reproducing Kernel Hilbert Space (RKHS). We show that the obtained depth is intuitive and establish its consistency with provable concentration bounds that allow for homogeneity testing. The proposed depth can be computed using manifold gradient, making it faster than the halfspace depth by several orders of magnitude. The performance of our depth is demonstrated through numerical simulations as well as applications such as anomaly detection on real data and homogeneity testing.
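To make the convex-support limitation concrete, here is a minimal sketch of the classical (Tukey) halfspace depth, not of the kernel depth proposed in the paper. It assumes the standard empirical definition, the smallest fraction of sample points in any closed halfspace whose boundary passes through x, and approximates the infimum over directions by random sampling, so it only gives an upper bound on the exact depth. The function name `approx_halfspace_depth`, its parameters, and the ring-shaped toy data are illustrative assumptions.

```python
import numpy as np


def approx_halfspace_depth(x, data, n_directions=1000, seed=0):
    """Random-direction upper bound on the empirical Tukey halfspace depth of x.

    Hypothetical helper for illustration only; it is not the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    dim = data.shape[1]

    # Draw directions uniformly on the unit sphere.
    directions = rng.standard_normal((n_directions, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Signed projections of the centred sample on each direction.
    projections = (data - x) @ directions.T  # shape (n_points, n_directions)

    # Fraction of points in the closed halfspace {y : <u, y - x> >= 0},
    # minimised over the sampled directions u.
    fractions = (projections >= 0).mean(axis=0)
    return float(fractions.min())


# Toy data with a non-convex support: a thin ring around the origin.
rng = np.random.default_rng(42)
angles = rng.uniform(0.0, 2.0 * np.pi, size=500)
radii = rng.uniform(0.9, 1.1, size=500)
ring = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

depth_center = approx_halfspace_depth([0.0, 0.0], ring)  # inside the hole
depth_far = approx_halfspace_depth([10.0, 10.0], ring)   # far outside the ring

print(depth_center, depth_far)
```

In this toy setting the centre of the ring receives a high depth even though no observation lies near it, because halfspaces cannot separate the hole from the surrounding mass. This is the behaviour that motivates moving to an RKHS.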