
Dimensionality-Aware Outlier Detection: Theoretical and Experimental Analysis (2401.05453v2)

Published 10 Jan 2024 in cs.LG and cs.AI

Abstract: We present a nonparametric method for outlier detection that takes full account of local variations in intrinsic dimensionality within the dataset. Using the theory of Local Intrinsic Dimensionality (LID), our 'dimensionality-aware' outlier detection method, DAO, is derived as an estimator of an asymptotic local expected density ratio involving the query point and a close neighbor drawn at random. The dimensionality-aware behavior of DAO is due to its use of local estimation of LID values in a theoretically-justified way. Through comprehensive experimentation on more than 800 synthetic and real datasets, we show that DAO significantly outperforms three popular and important benchmark outlier detection methods: Local Outlier Factor (LOF), Simplified LOF, and kNN.
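
The abstract names two ingredients that are easy to illustrate: the kNN baseline score and local estimation of LID. The sketch below is not DAO itself, since the abstract does not give DAO's formula. It assumes the common convention that the kNN outlier score is the distance to the k-th nearest neighbor, and it uses the standard maximum-likelihood (Hill-type) LID estimator. The function names, the brute-force distance computation, and the toy data are all illustrative assumptions, and the points are assumed to be distinct.

```python
import numpy as np

def knn_distances(X, k):
    """Sorted distances from each point to its k nearest neighbors (self excluded).

    Brute-force O(n^2) computation, intended only for small illustrative datasets.
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)          # a point is not its own neighbor
    return np.sort(D, axis=1)[:, :k]

def knn_outlier_score(X, k):
    """kNN baseline (assumed convention): distance to the k-th nearest neighbor."""
    return knn_distances(X, k)[:, -1]

def lid_mle(X, k):
    """Maximum-likelihood (Hill-type) LID estimate at each point.

    LID ~= -1 / mean_i( ln(r_i / r_k) ), where r_1 <= ... <= r_k are the
    neighbor distances. Assumes distinct points, so every r_i > 0.
    """
    r = knn_distances(X, k)
    log_ratio = np.log(r / r[:, -1:])
    return -1.0 / log_ratio.mean(axis=1)

# Toy data (hypothetical): 200 points on a 2-D plane embedded in 5-D,
# plus one point placed far away from the plane.
rng = np.random.default_rng(0)
inliers = np.zeros((200, 5))
inliers[:, :2] = rng.uniform(0.0, 1.0, size=(200, 2))
X = np.vstack([inliers, np.full((1, 5), 5.0)])

k = 20
scores = knn_outlier_score(X, k)
lid = lid_mle(X, k)

print("index with highest kNN score:", int(np.argmax(scores)))
print("median LID estimate over inliers:", float(np.median(lid[:200])))
```

On this toy data the distant point receives the largest kNN score, and the LID estimates for the planar points sit near the plane's intrinsic dimension of 2 rather than the ambient dimension of 5. That gap between ambient and local intrinsic dimensionality is the kind of local variation the abstract says a dimensionality-aware score should account for.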

