Rethinking Programmed I/O for Fast Devices, Cheap Cores, and Coherent Interconnects (2409.08141v3)

Published 12 Sep 2024 in cs.AR and cs.OS

Abstract: Conventional wisdom holds that an efficient interface between an OS running on a CPU and a high-bandwidth I/O device should use Direct Memory Access (DMA) to offload data transfer, descriptor rings for buffering and queuing, and interrupts for asynchrony between cores and device. In this paper we question this wisdom in the light of two trends: modern and emerging cache-coherent interconnects like CXL 3.0, and workloads, particularly microservices and serverless computing. Like some others before us, we argue that the assumptions of the DMA-based model are obsolete, and in many use-cases programmed I/O, where the CPU explicitly transfers data and control information to and from a device via loads and stores, delivers a more efficient system. However, we push this idea much further. We show, in a real hardware implementation, the gains in latency for fine-grained communication achievable using an open cache-coherence protocol which exposes cache transitions to a smart device, and that throughput is competitive with DMA over modern interconnects. We also demonstrate three use-cases: fine-grained RPC-style invocation of functions on an accelerator, offloading of operators in a streaming dataflow engine, and a network interface targeting serverless functions, comparing our use of coherence with both traditional DMA-style interaction and a highly-optimized implementation using memory-mapped programmed I/O over PCIe.
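
To make the contrast in the abstract concrete, below is a minimal C sketch of the two host-device interaction styles being compared: programmed I/O, where the CPU itself moves data into a memory-mapped device window with loads and stores and polls for completion, versus descriptor-ring DMA submission, where the CPU only posts a descriptor and the device fetches the data. This is an illustrative sketch under assumed interfaces; the register layout, struct fields, ring size, and function names are hypothetical and are not taken from the paper's hardware implementation.

```c
/*
 * Illustrative sketch only (not the paper's implementation): it contrasts
 * CPU-driven programmed I/O over a memory-mapped device window with
 * descriptor-ring DMA submission. All register layouts, offsets, and ring
 * sizes here are hypothetical assumptions.
 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of a device's memory-mapped window (e.g. a PCIe BAR
 * or a coherently mapped device region). */
struct pio_window {
    volatile uint64_t doorbell;      /* CPU writes the length here to notify the device */
    volatile uint64_t status;        /* device writes non-zero here on completion       */
    volatile uint8_t  payload[4096]; /* CPU copies the message body directly here       */
};

/* Programmed I/O: the CPU moves the data itself with stores, rings a
 * doorbell, and polls for completion; no descriptor ring or interrupt. */
static void pio_send(struct pio_window *win, const void *buf, size_t len)
{
    const uint8_t *src = buf;
    for (size_t i = 0; i < len; i++)
        win->payload[i] = src[i];
    __atomic_thread_fence(__ATOMIC_RELEASE); /* make payload visible before the doorbell */
    win->doorbell = len;
    while (win->status == 0)                 /* poll instead of taking an interrupt */
        ;
}

/* DMA-style submission for contrast: the CPU only posts a descriptor that
 * points at the buffer; the device fetches the data itself and signals
 * completion later (typically via interrupt). */
struct dma_desc {
    uint64_t addr;  /* IOVA/physical address of the buffer */
    uint32_t len;
    uint32_t flags;
};

#define RING_ENTRIES 256

static void dma_post(struct dma_desc *ring, uint32_t *tail,
                     volatile uint32_t *tail_reg,
                     uint64_t buf_iova, uint32_t len)
{
    ring[*tail] = (struct dma_desc){ .addr = buf_iova, .len = len, .flags = 1 };
    *tail = (*tail + 1) % RING_ENTRIES;
    __atomic_thread_fence(__ATOMIC_RELEASE); /* descriptor visible before tail update   */
    *tail_reg = *tail;                       /* MMIO write: advertise new descriptors   */
}
```

The paper's argument is that, over a modern coherent interconnect, the load/store path of the first function can match DMA throughput for fine-grained transfers while avoiding the descriptor-ring and interrupt overheads of the second.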
