Lens: A Foundation Model for Network Traffic (2402.03646v4)
Abstract: Network traffic refers to the amount of data sent and received over the internet or any other system that connects computers. Analyzing and understanding network traffic is vital for improving network security and management. However, traffic analysis is challenging because of the diverse nature of data packets, which often feature heterogeneous headers and encrypted payloads that lack semantics. To capture the latent semantics of traffic, a few studies have adopted pre-training techniques based on the Transformer encoder or decoder to learn representations from massive traffic data. However, these methods typically excel at either traffic understanding (classification) or traffic generation, but not both. To address this issue, we develop Lens, a foundation model for network traffic that leverages the T5 architecture to learn pre-trained representations from large-scale unlabeled data. By harnessing the encoder-decoder framework, which captures global information while preserving generative ability, our model can better learn representations from raw data. To further enhance pre-training effectiveness, we design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP). Evaluation results across various benchmark datasets demonstrate that Lens outperforms the baselines in most downstream tasks related to both traffic understanding and generation. Notably, it also requires much less labeled data for fine-tuning than current methods.
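Because Lens builds on T5, Masked Span Prediction can plausibly be pictured as T5-style span corruption. The following is a minimal sketch under that assumption: each masked span is replaced in the encoder input by a sentinel token (`<extra_id_k>`, the naming used in T5's vocabulary), and the decoder target lists each sentinel followed by the tokens it hides. The hex-byte tokens, the span positions, and the equal-weight `combined_loss` helper are hypothetical illustrations. The abstract does not specify the tokenization, the masking ratio, or how the MSP, POP, and HTP losses are weighted.

```python
from typing import List, Sequence, Tuple


def sentinel(i: int) -> str:
    # Assumed T5-style sentinel naming (<extra_id_0>, <extra_id_1>, ...).
    return f"<extra_id_{i}>"


def mask_spans(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Build an (encoder input, decoder target) pair for span corruption.

    `spans` holds sorted, non-overlapping half-open ranges [start, end).
    """
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for k, (start, end) in enumerate(spans):
        if start < cursor or end <= start or end > len(tokens):
            raise ValueError(f"invalid span {(start, end)}")
        inputs.extend(tokens[cursor:start])  # keep the unmasked prefix
        inputs.append(sentinel(k))           # one sentinel replaces the whole span
        targets.append(sentinel(k))
        targets.extend(tokens[start:end])    # tokens the decoder must recover
        cursor = end
    inputs.extend(tokens[cursor:])           # keep the unmasked suffix
    targets.append(sentinel(len(spans)))     # final sentinel closes the target
    return inputs, targets


def combined_loss(l_msp: float, l_pop: float, l_htp: float,
                  weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    # Hypothetical weighting: equal weights are only a placeholder,
    # not the paper's actual combination of the three task losses.
    w_msp, w_pop, w_htp = weights
    return w_msp * l_msp + w_pop * l_pop + w_htp * l_htp


if __name__ == "__main__":
    # Hypothetical tokenization: one token per header byte, written in hex.
    packet = ["45", "00", "00", "3c", "1c", "46", "40", "00"]
    inp, tgt = mask_spans(packet, [(1, 3), (5, 6)])
    print(inp)  # ['45', '<extra_id_0>', '3c', '1c', '<extra_id_1>', '40', '00']
    print(tgt)  # ['<extra_id_0>', '00', '00', '<extra_id_1>', '46', '<extra_id_2>']
```

Replacing each span with a single sentinel keeps the encoder input short, and the decoder only has to emit the hidden tokens rather than the whole packet. This is one reason an encoder-decoder model can be pre-trained for both understanding and generation.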