OptIForest: Optimal Isolation Forest for Anomaly Detection (2306.12703v2)
Abstract: Anomaly detection plays an increasingly important role in various fields for critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and a category based on the isolation forest mechanism stands out for its simplicity, effectiveness, and efficiency; iForest, for example, is often employed as a state-of-the-art detector in real deployments. While most isolation forests use a binary tree structure, the LSHiForest framework has demonstrated that a multi-fork isolation tree structure can lead to better detection performance. However, no theoretical work has yet answered the fundamental and practically important question of what the optimal tree structure of an isolation forest is with respect to the branching factor. In this paper, we establish a theory of isolation efficiency to answer this question and determine the optimal branching factor for an isolation tree. Building on this theoretical underpinning, we design a practical optimal isolation forest, OptIForest, which incorporates clustering-based learning to hash so that more information can be learned from data for better isolation quality. The rationale of our approach is that OptIForest reduces bias and thereby achieves a better bias-variance trade-off. Extensive comparative and ablation experiments on a series of benchmark datasets demonstrate that our approach can efficiently and robustly achieve better detection performance in general than state-of-the-art methods, including deep learning based ones.
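Because the abstract builds on the isolation mechanism without spelling it out, a minimal sketch of the classic binary iForest may help: each tree recursively splits a random subsample on a randomly chosen attribute at a random cut point, and points isolated after only a few splits receive scores close to 1. This is an illustrative Python sketch, not OptIForest and not any library's API. The names `IsolationForestSketch`, `build_tree`, `path_length` and `c` are hypothetical, the defaults of 100 trees and a subsample size of 256 follow common iForest practice, and stopping when the chosen attribute happens to be constant is a simplification.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, approximates harmonic numbers


def c(n: int) -> float:
    """Expected path length of an unsuccessful search in a binary tree on n points.

    Used to normalise scores and to correct leaves that still hold n > 1 points.
    """
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # H(n - 1) ~ ln(n - 1) + gamma
    return 2.0 * harmonic - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int                       # training points that reached this node
    attr: Optional[int] = None      # split attribute; None marks a leaf
    split: float = 0.0              # random cut point on `attr`
    left: Optional["Node"] = None   # points with x[attr] < split
    right: Optional["Node"] = None  # points with x[attr] >= split


def build_tree(X: List[List[float]], depth: int, max_depth: int,
               rng: random.Random) -> Node:
    """Grow a binary isolation tree with random attributes and cut points."""
    n = len(X)
    if depth >= max_depth or n <= 1:
        return Node(size=n)
    attr = rng.randrange(len(X[0]))
    lo = min(row[attr] for row in X)
    hi = max(row[attr] for row in X)
    if lo == hi:  # simplification: stop if the chosen attribute is constant
        return Node(size=n)
    split = rng.uniform(lo, hi)
    left = [row for row in X if row[attr] < split]
    right = [row for row in X if row[attr] >= split]
    return Node(n, attr, split,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: List[float], node: Node, depth: int = 0) -> float:
    """Edges traversed until x reaches a leaf, plus c(size) for that leaf."""
    if node.attr is None:
        return depth + c(node.size)
    child = node.left if x[node.attr] < node.split else node.right
    return path_length(x, child, depth + 1)


class IsolationForestSketch:
    """Hypothetical minimal iForest: binary isolation trees on random subsamples."""

    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed
        self.trees: List[Node] = []
        self.psi = 0

    def fit(self, X: List[List[float]]) -> "IsolationForestSketch":
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(X))
        if self.psi < 2:
            raise ValueError("need at least two points")
        max_depth = math.ceil(math.log2(self.psi))  # iForest height limit
        self.trees = [build_tree(rng.sample(X, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: List[float]) -> float:
        """s(x) = 2^(-E[h(x)] / c(psi)); values near 1 suggest an anomaly."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c(self.psi))


# Usage: 500 Gaussian inliers plus one distant point.
data_rng = random.Random(42)
data = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(500)]
data.append([8.0, 8.0])
forest = IsolationForestSketch(n_trees=100, sample_size=256, seed=1).fit(data)
outlier_score = forest.score([8.0, 8.0])
inlier_score = forest.score([0.0, 0.0])
print(f"outlier: {outlier_score:.3f}, inlier: {inlier_score:.3f}")
```

The normalising term `c(n)` assumes binary splits. Multi-fork variants such as LSHiForest generalise the split to more than two children, and the branching factor of that split is the quantity the theory in the abstract optimizes; this sketch does not attempt to reproduce OptIForest's clustering-based learning to hash.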