Feature Distribution Shift Mitigation with Contrastive Pretraining for Intrusion Detection (2404.15382v1)
Abstract: In recent years, there has been growing interest in using Machine Learning (ML), and especially Deep Learning (DL), to solve Network Intrusion Detection (NID) problems. However, feature distribution shift remains a difficulty: as feature distributions change over time, a model's performance degrades. Model pretraining has emerged as a promising training paradigm that brings robustness against feature distribution shift and has proven successful in Computer Vision (CV) and Natural Language Processing (NLP). To verify whether this paradigm benefits the NID problem, we propose SwapCon, an ML model for NID that compresses shift-invariant feature information during the pretraining stage and refines it during the finetuning stage. We exemplify the evidence of feature distribution shift using the Kyoto2006+ dataset. We demonstrate how pretraining a model of the proper size can increase robustness against feature distribution shift by over 8%. Moreover, we show that an adequate numerical embedding strategy also enhances the performance of pretrained models. Further experiments show that the proposed SwapCon model outperforms eXtreme Gradient Boosting (XGBoost) and K-Nearest Neighbor (KNN) based models by a large margin.
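The abstract only outlines the pretrain-then-finetune pipeline. The sketch below is a minimal, hedged illustration of what such a pipeline could look like in PyTorch, assuming a swap-based feature corruption as the contrastive augmentation (suggested by the name SwapCon but not specified here) and a standard NT-Xent contrastive loss; the encoder architecture, hyperparameters, and random stand-in data are hypothetical and not the authors' implementation.

```python
# Illustrative sketch of contrastive pretraining followed by finetuning for
# tabular NID data. All names and settings here are assumptions for the sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F


def swap_augment(x: torch.Tensor, p: float = 0.3) -> torch.Tensor:
    """Create a corrupted view by swapping a random subset of feature values
    between rows of the batch (an assumed tabular augmentation)."""
    mask = torch.rand_like(x) < p                      # which entries to swap
    perm = torch.randperm(x.size(0), device=x.device)  # donor rows
    return torch.where(mask, x[perm], x)


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Standard NT-Xent contrastive loss between two views of the same batch."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))                  # ignore self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


class Encoder(nn.Module):
    """Small MLP encoder; pretrained weights are reused at finetuning time."""
    def __init__(self, in_dim: int, hidden: int = 128, out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# --- pretraining stage: unlabeled flows (random data stands in for Kyoto2006+) ---
x = torch.randn(256, 20)                       # 256 flows, 20 numeric features
enc = Encoder(in_dim=20)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
for _ in range(10):
    loss = nt_xent_loss(enc(swap_augment(x)), enc(swap_augment(x)))
    opt.zero_grad(); loss.backward(); opt.step()

# --- finetuning stage: attach a classification head and train on labels ---
y = torch.randint(0, 2, (256,))                # 0 = benign, 1 = attack
clf = nn.Sequential(enc, nn.Linear(64, 2))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
for _ in range(10):
    loss = F.cross_entropy(clf(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

In this reading, the contrastive stage encourages representations that are stable under feature perturbation (the "shift-invariant feature information"), and the finetuning stage refines those representations with labels; the actual augmentation, embedding strategy, and architecture used by SwapCon are described in the paper itself.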