
Performance Prediction of On-NIC Network Functions with Multi-Resource Contention and Traffic Awareness (2405.05529v4)

Published 9 May 2024 in cs.NI

Abstract: Network function (NF) offloading on SmartNICs has been widely used in modern data centers, offering benefits in host resource savings and programmability. Co-running NFs on the same SmartNIC can cause performance interference due to contention for onboard resources. To meet performance SLAs while ensuring efficient resource management, operators need mechanisms to predict NF performance under such contention. However, existing solutions lack SmartNIC-specific knowledge and exhibit limited traffic awareness, leading to poor accuracy for on-NIC NFs. This paper proposes Yala, a novel performance prediction system for on-NIC NFs. Yala builds upon the key observation that co-located NFs contend for multiple resources, including onboard accelerators and the memory subsystem. It also incorporates traffic awareness according to the behavior of individual resources, maintaining accuracy as external traffic attributes vary. Evaluation using BlueField-2 SmartNICs shows that Yala improves prediction accuracy by 78.8% and reduces SLA violations by 92.2% compared to state-of-the-art approaches, and enables new practical use cases.
