
Dealing with Imbalanced Classes in Bot-IoT Dataset (2403.18989v1)

Published 27 Mar 2024 in cs.CR and cs.AI

Abstract: With the rapid spread of Internet of Things (IoT) devices, a network intrusion detection system (NIDS) plays an important role in detecting and protecting against various types of attacks in the IoT network. To evaluate the robustness of a NIDS in the IoT network, prior work proposed a realistic botnet dataset for the IoT network (the Bot-IoT dataset) and applied it to machine learning-based anomaly detection. This dataset is imbalanced: the number of normal packets is much smaller than the number of attack packets. Such imbalance can make it difficult to identify the minority class correctly. In this thesis, to address the class imbalance problem in the Bot-IoT dataset, we propose a binary classification method that uses the synthetic minority over-sampling technique (SMOTE). The proposed classifier aims to detect attack packets while overcoming the class imbalance problem with the SMOTE algorithm. Through numerical results, we demonstrate the proposed classifier's fundamental characteristics and the impact of imbalanced data on its performance.
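As a rough illustration of the over-sampling idea named in the abstract, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, which is the core step of SMOTE. It is not the thesis's implementation: the function name `smote`, the parameters `k` and `seed`, and the toy two-feature data are all assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # For every minority sample, find its k nearest minority neighbours
    # by Euclidean distance.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a minority sample
        j = rng.choice(neighbours[i])      # pick one of its neighbours
        gap = rng.random()                 # position along the segment
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy data (assumed): "normal" is the minority class, as in Bot-IoT.
normal = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attack = [[5.0 + 0.1 * i, 5.0] for i in range(20)]

new_normal = smote(normal, n_synthetic=len(attack) - len(normal), k=3)
balanced_normal = normal + new_normal
```

After over-sampling, `balanced_normal` has as many rows as `attack`, so a binary classifier trained on the combined set sees both classes equally often. Over-sampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class ratio.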

Authors (5)
  1. Jesse Atuhurra (8 papers)
  2. Takanori Hara (6 papers)
  3. Yuanyu Zhang (13 papers)
  4. Masahiro Sasabe (3 papers)
  5. Shoji Kasahara (10 papers)
Citations (1)
