
Application-Defined Receive Side Dispatching on the NIC (2312.04857v2)

Published 8 Dec 2023 in cs.NI and cs.AR

Abstract: Application-layer (L7) processing is increasingly implemented in proxies (e.g., Envoy) to simplify administration and management. However, prior work has observed that this reduces application performance and increases resource requirements, because moving logic out of the application requires duplicating some computation and adds inter-process communication. This paper describes QingNiao, a system that moves L7 dispatch (a function that all L7 proxies implement and that affects every message an application receives) to a NIC on the application's communication path. Unfortunately, the data formats and protocols used by modern applications pose a challenge when moving L7 dispatch to NICs. Consequently, when designing QingNiao we had to rethink not just the NIC hardware, but also how applications encode data sent over the network. We prototyped QingNiao using a 100GbE FPGA NIC, and show that for real-world applications QingNiao achieves 6.6x to 7.15x higher throughput than software proxies.

