
Feature Distribution Shift Mitigation with Contrastive Pretraining for Intrusion Detection

Published 23 Apr 2024 in cs.LG, cs.AI, and cs.NI (arXiv:2404.15382v1)

Abstract: In recent years, there has been growing interest in using Machine Learning (ML), and especially Deep Learning (DL), to solve Network Intrusion Detection (NID) problems. However, the feature distribution shift problem remains a difficulty: changes in the features' distributions over time negatively impact a model's performance. As one promising solution, model pretraining has emerged as a training paradigm that brings robustness against feature distribution shift and has proven successful in Computer Vision (CV) and NLP. To verify whether this paradigm is beneficial for the NID problem, we propose SwapCon, an ML model for NID that compresses shift-invariant feature information during the pretraining stage and refines it during the finetuning stage. We exemplify the evidence of feature distribution shift using the Kyoto 2006+ dataset. We demonstrate how pretraining a model of the proper size can increase robustness against feature distribution shifts by over 8%. Moreover, we show how an adequate numerical embedding strategy also enhances the performance of pretrained models. Further experiments show that the proposed SwapCon model also outperforms eXtreme Gradient Boosting (XGBoost) and K-Nearest Neighbor (KNN) based models by a large margin.
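The contrastive-pretraining idea in the abstract can be sketched in a few lines: build two "views" of each tabular row (here via a swap-style corruption, an assumption suggested by the SwapCon name — the paper's exact augmentation and encoder may differ), then train an encoder so that the two views of the same row agree under an InfoNCE-style loss. A minimal NumPy sketch of those two pieces:

```python
import numpy as np

def swap_augment(x, p=0.3, rng=None):
    """Corrupted view of a tabular batch: each feature value is replaced,
    with probability p, by the same feature's value from a random other row.
    (Swap-style corruption is an assumption; the paper may use a different
    augmentation.)"""
    rng = np.random.default_rng() if rng is None else rng
    n, d = x.shape
    mask = rng.random((n, d)) < p                # which cells to replace
    donors = rng.integers(0, n, size=(n, d))     # donor row for each cell
    return np.where(mask, x[donors, np.arange(d)], x)

def info_nce(z1, z2, tau=0.5):
    """InfoNCE / NT-Xent contrastive loss between two batches of embeddings.
    Matching rows (the diagonal) are positives; all other pairs negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                     # (n, n) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy on diagonal
```

During pretraining one would minimize `info_nce(f(x), f(swap_augment(x)))` for an encoder `f` on unlabeled traffic, then attach a small classification head and finetune on labeled intrusion data; since the loss ignores labels, the encoder is pushed to keep only information stable across corruptions, which is the intuition behind robustness to distribution shift.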

References (11)
  1. B. Mukherjee, L. Heberlein, and K. Levitt, “Network Intrusion Detection,” IEEE Network, vol. 8, no. 3, pp. 26–41, 1994.
  2. CyberEdge Group, "2022 Cyberthreat Defense Report". Accessed: September 13, 2023. [Online]. Available: https://cyber-edge.com/resources/2022-cyberthreat-defense-report/
  3. D. Hendrycks, K. Lee, and M. Mazeika, “Using Pre-training Can Improve Model Robustness and Uncertainty,” in International Conference on Machine Learning. PMLR, 2019, pp. 2712–2721.
  4. J. Song, H. Takakura, Y. Okabe, M. Eto, D. Inoue, and K. Nakao, “Statistical Analysis of Honeypot Data and Building of Kyoto 2006+ Dataset for NIDS Evaluation,” in Proceedings of the first workshop on building analysis datasets and gathering experience returns for security, 2011, pp. 29–36.
  5. S. Long, F. Cao, S. C. Han, and H. Yang, “Vision-and-language Pretrained Models: A Survey,” arXiv preprint arXiv:2204.07356, 2022.
  6. G. Somepalli, M. Goldblum, A. Schwarzschild, C. B. Bruss, and T. Goldstein, “Saint: Improved Neural Networks for Tabular Data Via Row Attention and Contrastive Pre-training,” arXiv preprint arXiv:2106.01342, 2021.
  7. S. S. Dhaliwal, A.-A. Nahid, and R. Abbas, “Effective Intrusion Detection System Using XGBoost,” Information, vol. 9, no. 7, p. 149, 2018.
  8. M. Drăgoi, E. Burceanu, E. Haller, A. Manolache, and F. Brad, “AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection,” arXiv preprint arXiv:2206.15476, 2022.
  9. P. W. Koh, S. Sagawa, H. Marklund, S. M. Xie, M. Zhang, A. Balsubramani, W. Hu, M. Yasunaga, R. L. Phillips, I. Gao et al., “Wilds: A Benchmark of In-the-wild Distribution Shifts,” in International Conference on Machine Learning. PMLR, 2021, pp. 5637–5664.
  10. H. Guo, B. Chen, R. Tang, W. Zhang, Z. Li, and X. He, “An Embedding Learning Framework for Numerical Features in CTR Prediction,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 2910–2918.
  11. Y. Gorishniy, I. Rubachev, and A. Babenko, “On Embeddings for Numerical Features in Tabular Deep Learning,” arXiv preprint arXiv:2203.05556, 2022.
