Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors (2402.16388v3)
Abstract: Uncertainty quantification has become an increasingly important requirement for anomaly detection systems. In this context, effectively controlling Type I error rates ($\alpha$) without compromising the statistical power ($1-\beta$) of these systems can build trust and reduce costs related to false discoveries. Conformal anomaly detection is emerging as a promising approach that provides such statistical guarantees through model calibration. However, its dependence on calibration data poses practical limitations, especially in low-data regimes. In this work, we formally define and evaluate leave-one-out-, bootstrap-, and cross-conformal methods for anomaly detection, building on methods from the field of conformal prediction. Looking beyond classical inductive conformal anomaly detection, we demonstrate that the derived methods for calculating resampling-conformal $p$-values strike a practical compromise between statistical efficiency (full-conformal) and computational efficiency (split-conformal), as they make more efficient use of the available data. We validate the derived methods and quantify their improvements for a range of one-class classifiers and datasets.
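To make the contrast between split-conformal and resampling-conformal $p$-values concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy one-class detector (distance from the training mean) and follows the generic cross-conformal recipe from conformal prediction, in which calibration scores are pooled across folds; the paper's exact definitions may differ. All function names (`fit_centroid`, `score`, `split_conformal_p`, `cross_conformal_p`) are hypothetical.

```python
import random
from statistics import fmean

def fit_centroid(train):
    """Toy one-class 'detector' (an assumption): remember the training mean."""
    return fmean(train)

def score(model, x):
    """Nonconformity score: distance from the training mean (larger = more anomalous)."""
    return abs(x - model)

def split_conformal_p(train, calib, x):
    """Split (inductive) conformal p-value: fit on `train`, calibrate on `calib`."""
    model = fit_centroid(train)
    s = score(model, x)
    # Count calibration scores at least as extreme as the test score.
    n_extreme = sum(score(model, c) >= s for c in calib)
    return (1 + n_extreme) / (len(calib) + 1)

def cross_conformal_p(data, x, k=5):
    """Cross-conformal p-value: every point serves once as calibration data."""
    folds = [data[i::k] for i in range(k)]
    n_extreme = 0
    for i, fold in enumerate(folds):
        # Fit on all folds except fold i, then calibrate on fold i.
        rest = [v for j, f in enumerate(folds) if j != i for v in f]
        model = fit_centroid(rest)
        s = score(model, x)
        n_extreme += sum(score(model, c) >= s for c in fold)
    return (1 + n_extreme) / (len(data) + 1)

# Synthetic "normal" data: 200 draws from a standard Gaussian.
rng = random.Random(0)
normal = [rng.gauss(0.0, 1.0) for _ in range(200)]

p_inlier = cross_conformal_p(normal, 0.1)    # a typical point
p_outlier = cross_conformal_p(normal, 6.0)   # a point far from the bulk
p_split = split_conformal_p(normal[:100], normal[100:], 6.0)
```

With the same 200 observations, the split variant calibrates on only 100 points, so its smallest attainable $p$-value is 1/101, whereas the cross-conformal variant calibrates on all 200 and can reach 1/201. This finer resolution is one way to read the abstract's claim about using the available data more efficiently. Note that cross-conformal $p$-values are generally only approximately valid, unlike split-conformal ones.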