Unsupervised Anomaly Detection with Rejection (2305.13189v2)

Published 22 May 2023 in cs.LG

Abstract: Anomaly detection aims at detecting unexpected behaviours in the data. Because anomaly detection is usually an unsupervised task, traditional anomaly detectors learn a decision boundary by employing heuristics based on intuitions, which are hard to verify in practice. This introduces some uncertainty, especially close to the decision boundary, that may reduce the user's trust in the detector's predictions. A way to combat this is by allowing the detector to reject examples with high uncertainty (Learning to Reject). This requires employing a confidence metric that captures the distance to the decision boundary and setting a rejection threshold to reject low-confidence predictions. However, selecting a proper metric and setting the rejection threshold without labels are challenging tasks. In this paper, we solve these challenges by setting a constant rejection threshold on the stability metric computed by ExCeeD. Our insight relies on a theoretical analysis of such a metric. Moreover, setting a constant threshold results in strong guarantees: we estimate the test rejection rate, and derive a theoretical upper bound for both the rejection rate and the expected prediction cost. Experimentally, we show that our method outperforms some metric-based methods.
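The rejection mechanism described in the abstract (compute an example-wise stability score, then reject every prediction whose stability falls below a constant threshold) can be illustrated with a short sketch. It assumes an ExCeeD-style binomial formulation as we understand it: higher scores mean more anomalous, the contamination factor is known, and the detector flags the top fraction of training scores. The function names and the threshold `tau = 0.9` are hypothetical placeholders, not the constant or code from the paper, and the sketch reports only the empirical rejection rate on a given test set, not the paper's theoretical estimate or bounds.

```python
import math
from typing import List, Sequence


def exceed_stability(train_scores: Sequence[float], test_score: float,
                     contamination: float) -> float:
    """ExCeeD-style confidence that a test score keeps its predicted label.

    Assumptions (not taken from the paper's code): higher score = more
    anomalous, and the detector labels the top `contamination` fraction
    of training scores as anomalies.
    """
    n = len(train_scores)
    # Empirical CDF of the training scores evaluated at the test score.
    psi = sum(1 for s in train_scores if s <= test_score) / n
    # Laplace-smoothed estimate; always strictly inside (0, 1), so logs are safe.
    p = (1 + n * psi) / (2 + n)
    # The example counts as anomalous if more than n * (1 - contamination)
    # of n draws fall at or below its score: sum that binomial upper tail
    # in log space to avoid overflow for large n.
    k_min = math.floor(n * (1 - contamination)) + 1
    log_norm = math.lgamma(n + 1)
    p_anomaly = sum(
        math.exp(log_norm - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * math.log(p) + (n - i) * math.log(1 - p))
        for i in range(k_min, n + 1)
    )
    # Simplification: treat the more likely class as the prediction.
    return max(p_anomaly, 1.0 - p_anomaly)


def reject_with_constant_threshold(train_scores: Sequence[float],
                                   test_scores: Sequence[float],
                                   contamination: float,
                                   tau: float) -> List[bool]:
    """Return True for each test score whose stability is below tau (rejected)."""
    return [exceed_stability(train_scores, s, contamination) < tau
            for s in test_scores]


def rejection_rate(mask: Sequence[bool]) -> float:
    """Empirical fraction of rejected test examples."""
    return sum(mask) / len(mask)


if __name__ == "__main__":
    # Toy training scores 0..99 with 10% assumed contamination.
    train = [float(x) for x in range(100)]
    mask = reject_with_constant_threshold(train, [-5.0, 89.5, 1000.0],
                                          contamination=0.1, tau=0.9)
    print(mask, rejection_rate(mask))
```

In this toy run, clearly normal and clearly anomalous scores receive stability close to 1 and are kept, while a score near the decision boundary (around the 90th percentile of the training scores) gets lower stability and is rejected. Because the threshold is a fixed constant rather than a value tuned on labels, no labeled data is needed to decide what to reject.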
