Signature Isolation Forest (2403.04405v3)
Abstract: Functional Isolation Forest (FIF) is a recent state-of-the-art Anomaly Detection (AD) algorithm designed for functional data. It relies on a tree partition procedure in which an abnormality score is computed by projecting each curve observation onto a drawn dictionary through a linear inner product. Both the linear inner product and the dictionary are a priori choices that strongly influence the algorithm's performance and might lead to unreliable results, particularly with complex datasets. This work addresses these challenges by introducing Signature Isolation Forest, a novel class of AD algorithms leveraging the signature transform from rough path theory. Our objective is to remove the constraints imposed by FIF by proposing two algorithms that specifically target the linearity of the FIF inner product and the choice of the dictionary. We provide several numerical experiments, including a benchmark on real-world applications, showing the relevance of our methods.
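To make the signature transform mentioned above more concrete, the following is a minimal sketch of how the first two signature levels of a sampled curve can be computed, assuming the curve is interpolated linearly between observations. The function name `signature_depth2` is hypothetical and this is not the authors' implementation; it only illustrates the kind of nonlinear path feature that the signature provides.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path.

    Hypothetical helper for illustration only (not from the paper).
    path: array-like of shape (n_points, d), the sampled curve.
    Returns (level1, level2) with shapes (d,) and (d, d).
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)  # one row per linear segment

    # Level 1: total displacement of the path.
    level1 = increments.sum(axis=0)

    # Displacement accumulated strictly before each segment starts.
    before = np.cumsum(increments, axis=0) - increments

    # Level 2: iterated integrals, assembled segment by segment
    # (Chen's identity applied to linear pieces).
    level2 = before.T @ increments + 0.5 * (increments.T @ increments)
    return level1, level2


# Usage: an L-shaped path in the plane (right, then up).
level1, level2 = signature_depth2([[0, 0], [1, 0], [1, 1]])
levy_area = 0.5 * (level2[0, 1] - level2[1, 0])
```

For this L-shaped path, `level1` is `[1, 1]` and `level2` is `[[0.5, 1.0], [0.0, 0.5]]`. The antisymmetric part of level 2 (the Lévy area, here 0.5) depends on the order in which the coordinates move, which is information that a single linear projection of the curve does not capture.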