- The paper introduces a novel algorithm that optimizes message-passing with lock-free techniques for both rendezvous and buffered channels.
- The paper achieves up to a 9.8× speed improvement by leveraging atomic operations and efficient memory management in high-concurrency scenarios.
- The paper demonstrates scalability in asynchronous communication while acknowledging a trade-off with strict FIFO ordering in its design.
Fast and Scalable Channels in Kotlin Coroutines: An Evaluation
The paper "Fast and Scalable Channels in Kotlin Coroutines" introduces a novel algorithmic approach to efficient and scalable message-passing synchronization primitives, specifically rendezvous and buffered channels in Kotlin Coroutines. Asynchronous programming, often built on constructs like coroutines, depends on efficient data structures such as channels for inter-coroutine communication. The proposed design outperforms existing channel implementations, including Kotlin's previous one and several academic algorithms, delivering significant speedups in practical workloads.
Technical Contributions
The authors present an algorithm inspired by modern lock-free queue techniques, adapted to channel semantics. Their approach covers both rendezvous and buffered channels, managing asynchronous communication with minimal overhead. The channel operations are built on conceptually infinite, non-blocking arrays indexed by separate counters for sends and receives, extended to support features such as buffering and cancellation.
- Rendezvous Channel: The rendezvous channel behaves like a synchronous queue with a zero-capacity buffer: a send suspends until a matching receive arrives, and vice versa, so every send is paired one-to-one with a receive.
- Buffered Channel: The buffered channel maintains a bounded-capacity buffer: sends succeed immediately while space is free and suspend once the buffer is full, resuming when a receive frees a slot.
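The counters-plus-cells idea behind these channels can be sketched in miniature. The following is a deliberately simplified, spin-based illustration in Java: two counters hand out cell indices, and each send/receive pair meets in its cell. Spinning stands in for coroutine suspension, the array is fixed-size rather than truly infinite, and all names are illustrative rather than taken from the paper's actual implementation.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified rendezvous channel: two counters assign cells, and each
// matched send/receive pair performs its handshake inside one cell.
final class SpinRendezvous<E> {
    private static final Object RECEIVER = new Object(); // marker: a receiver waits here
    private static final Object DONE = new Object();     // marker: the element was taken
    private final AtomicReferenceArray<Object> cells = new AtomicReferenceArray<>(1 << 16);
    private final AtomicLong sendIdx = new AtomicLong();
    private final AtomicLong receiveIdx = new AtomicLong();

    void send(E element) {
        int s = (int) sendIdx.getAndIncrement();          // claim the next cell for a send
        if (cells.compareAndSet(s, null, element)) {
            // Sender arrived first: wait until a receiver takes the element.
            while (cells.get(s) != DONE) Thread.onSpinWait();
        } else {
            // A receiver is already waiting in this cell: hand the element over.
            cells.set(s, element);
        }
    }

    @SuppressWarnings("unchecked")
    E receive() {
        int r = (int) receiveIdx.getAndIncrement();       // claim the next cell for a receive
        if (cells.compareAndSet(r, null, RECEIVER)) {
            // Receiver arrived first: wait until a sender stores its element.
            Object v;
            while ((v = cells.get(r)) == RECEIVER) Thread.onSpinWait();
            return (E) v;
        }
        // A sender is already waiting in this cell: take its element and release it.
        Object v = cells.get(r);
        cells.set(r, DONE);
        return (E) v;
    }
}
```

The real algorithm replaces the fixed array with a linked list of segments (so indices can grow without bound), suspends coroutines instead of spinning, and additionally handles buffering and cancellation.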
The paper reports speedups of up to 9.8× over Kotlin's previous built-in implementation and other state-of-the-art academic algorithms, obtained by building operations around atomic fetch-and-add instructions and by careful memory management.
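The benefit of fetch-and-add can be seen in miniature by contrasting the two ways of claiming a slot in a shared counter. In the sketch below, `getAndIncrement` maps to a single fetch-and-add instruction (LOCK XADD on x86) that always succeeds, while the CAS loop may retry arbitrarily often when threads race on the same counter; the class and method names are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

// Two ways to claim the next slot in a shared counter.
final class SlotClaiming {
    // Wait-free: one atomic fetch-and-add per claim, never retries.
    static long claimWithFaa(AtomicLong counter) {
        return counter.getAndIncrement();
    }

    // Lock-free but not wait-free: the CAS fails and retries under contention.
    static long claimWithCas(AtomicLong counter) {
        long cur;
        do {
            cur = counter.get();
        } while (!counter.compareAndSet(cur, cur + 1));
        return cur;
    }
}
```

Under high contention the CAS variant wastes cycles on failed attempts, which is precisely the overhead the FAA-centric design avoids.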
Experimental Results
The authors integrated their algorithm into the Kotlin Coroutines library in place of the existing implementation, which serves as a natural performance baseline. They evaluated the approach with classic producer-consumer workloads, observing that their implementation scales efficiently as concurrency grows. This scalability is attributed to the algorithm's ability to handle high contention without global locking, a common bottleneck in concurrent applications.
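A workload of this shape can be sketched as follows. This is not the paper's harness: Java's `ArrayBlockingQueue` merely stands in for a buffered channel, and the ticket-based stop condition is an assumption of this sketch, used so that exactly the produced number of items is consumed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Minimal multi-producer/multi-consumer workload over a bounded queue.
// Returns the sum of all consumed values so the caller can check that
// nothing was lost or duplicated.
final class ProducerConsumerWorkload {
    static long run(int producers, int consumers, int itemsPerProducer, int capacity) {
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(capacity);
        long total = (long) producers * itemsPerProducer;
        AtomicLong tickets = new AtomicLong(); // one ticket per item to consume
        AtomicLong sum = new AtomicLong();
        Thread[] threads = new Thread[producers + consumers];
        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                try {
                    for (int i = 0; i < itemsPerProducer; i++) channel.put(i);
                } catch (InterruptedException ignored) { }
            });
        }
        for (int c = 0; c < consumers; c++) {
            threads[producers + c] = new Thread(() -> {
                try {
                    // Claim a ticket per take so exactly `total` items are consumed.
                    while (tickets.getAndIncrement() < total) sum.addAndGet(channel.take());
                } catch (InterruptedException ignored) { }
            });
        }
        for (Thread t : threads) t.start();
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException ignored) { }
        }
        return sum.get();
    }
}
```

Scaling the producer and consumer counts while measuring elapsed time yields the kind of throughput-versus-concurrency curves the evaluation reports; the buffer capacity controls how often the queue itself blocks.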
Implications and Future Work
The authors acknowledge that, while the proposed algorithm significantly improves performance, it relaxes the strict FIFO ordering guarantees of other synchronous queue designs. This relaxation warrants care in application domains where strict ordering is imperative.
Moreover, the algorithm's portability across programming languages such as Go and Rust highlights its utility beyond Kotlin. The presented approach sets a new benchmark for implementing synchronization primitives in modern async programming environments. Future work may focus on further refining memory usage patterns, reusing memory allocations to decrease memory overhead, and extending the current work to support more complex use cases, such as priority messaging or selective message reception.
This work contributes not only in practical terms—offering a performant replacement for existing technologies—but also in extending theoretical and practical understanding of concurrent data structures in asynchronous programming. It enables developers to build more scalable and responsive applications by providing efficient low-level primitives for coroutine-based applications.