
Anomaly component analysis (2312.16139v1)

Published 26 Dec 2023 in stat.ME, cs.LG, and stat.ML

Abstract: At the crossroads of machine learning and data analysis, anomaly detection aims at identifying observations that exhibit abnormal behaviour. Whether they stem from measurement errors, disease development, severe weather, production quality defects, failed equipment, financial fraud, or crisis events, their timely identification and isolation constitute an important task in almost every area of industry and science. While a substantial body of literature is devoted to the detection of anomalies, little attention is paid to their explanation. This is mostly due to the intrinsically unsupervised nature of the task and the non-robustness of exploratory methods such as principal component analysis (PCA). We introduce a new statistical tool dedicated to the exploratory analysis of abnormal observations using data depth as a score. Anomaly component analysis (ACA, for short) is a method that searches for a low-dimensional data representation that best visualises and explains anomalies. This low-dimensional representation not only distinguishes groups of anomalies better than state-of-the-art methods, but also provides an explanation for anomalies that is linear in the variables and thus easily interpretable. In a comparative simulation and real-data study, ACA also proves advantageous for anomaly analysis with respect to existing methods.
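The abstract does not spell out the ACA algorithm, but its core ingredient, data depth used as an anomaly score, can be illustrated with a minimal sketch. The snippet below approximates projection depth by sampling random directions and returns, alongside the depth value, the direction in which a query point is most outlying; that direction is a linear-in-variables explanation in the spirit described above. This is an assumption-laden illustration, not the authors' ACA procedure: the function name, parameters, and data are hypothetical.

```python
import numpy as np

def projection_depth(X, z, n_dirs=1000, rng=None):
    """Monte Carlo approximation of the projection depth of point z
    with respect to sample X, using random directions on the unit sphere.
    (Sketch only; not the paper's ACA algorithm.)"""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    U = rng.standard_normal((n_dirs, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # unit directions
    proj_X = X @ U.T            # shape (n, n_dirs): sample projections
    proj_z = U @ z              # shape (n_dirs,): projections of z
    med = np.median(proj_X, axis=0)
    mad = np.median(np.abs(proj_X - med), axis=0) + 1e-12
    out = np.abs(proj_z - med) / mad    # robust outlyingness per direction
    k = int(np.argmax(out))             # direction where z sticks out most
    return 1.0 / (1.0 + out[k]), U[k]   # depth in (0, 1], explaining direction

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))                 # "normal" sample
z = np.array([4.0, 0.0, 0.0, 0.0, 0.0])           # anomalous in variable 1
depth, direction = projection_depth(X, z, rng=1)
print(f"depth ~ {depth:.3f}")                     # low depth -> anomalous
print("explaining direction:", np.round(direction, 2))
```

Low depth flags a point as anomalous, and inspecting the largest coordinates of the returned direction indicates which variables drive the abnormality, which is the kind of linear, interpretable explanation the abstract refers to.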

