
NetBench: A Large-Scale and Comprehensive Network Traffic Benchmark Dataset for Foundation Models (2403.10319v2)

Published 15 Mar 2024 in cs.NI and cs.CR

Abstract: In computer networking, network traffic refers to the amount of data transmitted in the form of packets between internetworked computers or cyber-physical systems. Monitoring and analyzing network traffic is crucial for ensuring the performance, security, and reliability of a network. However, a significant challenge in network traffic analysis is processing diverse data packets, including both ciphertext and plaintext. While many methods have been adopted to analyze network traffic, they often rely on different datasets for performance evaluation. This inconsistency results in substantial manual data processing effort and unfair comparisons. Moreover, some data processing methods may cause data leakage due to improper separation of training and testing data. To address these issues, we introduce NetBench, a large-scale and comprehensive benchmark dataset for assessing machine learning models, especially foundation models, in both network traffic classification and generation tasks. NetBench is built upon seven publicly available datasets and encompasses a broad spectrum of 20 tasks, including 15 classification tasks and 5 generation tasks. Furthermore, we evaluate eight state-of-the-art (SOTA) classification models (including two foundation models) and two generative models using our benchmark. The results show that foundation models significantly outperform traditional deep learning methods in traffic classification. We believe NetBench will facilitate fair comparisons among various approaches and advance the development of foundation models for network traffic. Our benchmark is available at https://github.com/WM-JayLab/NetBench.
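
The abstract notes that improper separation of training and testing data can cause data leakage. One common way this happens (an assumption here, not a description of NetBench's own preprocessing) is splitting at the packet level, so packets from the same flow land in both sets. The sketch below illustrates a flow-level split instead; the function name `split_by_flow`, the `(flow_id, payload, label)` tuple layout, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_flow(packets, test_fraction=0.2, seed=0):
    """Split packets into train/test so that no flow appears in both.

    `packets` is a list of (flow_id, payload, label) tuples. This layout is
    an illustrative assumption, not the format used by NetBench.
    """
    # Group packets by the flow they belong to.
    flows = defaultdict(list)
    for pkt in packets:
        flows[pkt[0]].append(pkt)

    # Shuffle flow IDs (not individual packets) with a fixed seed.
    flow_ids = sorted(flows)
    random.Random(seed).shuffle(flow_ids)

    # Hold out whole flows for testing; keep at least one test flow.
    n_test = max(1, round(len(flow_ids) * test_fraction))
    test_ids = flow_ids[:n_test]
    train_ids = flow_ids[n_test:]

    train = [p for fid in train_ids for p in flows[fid]]
    test = [p for fid in test_ids for p in flows[fid]]
    return train, test


# Toy data: 10 flows with 5 packets each (hypothetical values).
toy_packets = [
    (f"flow-{f}", bytes([f, i]), f % 2) for f in range(10) for i in range(5)
]
train, test = split_by_flow(toy_packets, test_fraction=0.2, seed=0)

# No flow ID is shared between the two sets.
shared = {p[0] for p in train} & {p[0] for p in test}
print(len(train), len(test), len(shared))
```

Shuffling flow IDs rather than packets keeps every packet of a flow on one side of the split, which is the property a leakage check would look for.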

