
LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems through Polling-Free and Retry-Free Operation (2401.09359v1)

Published 17 Jan 2024 in cs.AR

Abstract: Extensive polling in shared-memory manycore systems can lead to contention, decreased throughput, and poor energy efficiency. Both lock implementations and the general-purpose atomic operation, load-reserved/store-conditional (LRSC), cause polling due to serialization and retries. To alleviate this overhead, we propose LRwait and SCwait, a synchronization pair that eliminates polling by allowing contending cores to sleep while waiting for previous cores to finish their atomic access. As a scalable implementation of LRwait, we present Colibri, a distributed and scalable approach to managing LRwait reservations. Through extensive benchmarking on an open-source RISC-V platform with 256 cores, we demonstrate that Colibri outperforms current synchronization approaches for various concurrent algorithms with high and low contention regarding throughput, fairness, and energy efficiency. With an area overhead of only 6%, Colibri outperforms LRSC-based implementations by a factor of 6.5x in terms of throughput and 7.1x in terms of energy efficiency.
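To illustrate the contrast the abstract draws between retry-based LRSC and the queue-based LRwait/SCwait pair, the following is a minimal toy model in Python. It is not the paper's Colibri hardware design. The function names `lrsc_increment` and `lrwait_increment`, the round-based scheduling, and the way memory accesses are counted are all assumptions made for illustration. Each simulated core increments a shared counter once: in the LRSC model every pending core retries until its store-conditional succeeds, while in the LRwait model each core enqueues once, sleeps, and is served in order.

```python
from collections import deque


def lrsc_increment(num_cores):
    """Toy LR/SC model (assumption): all pending cores retry every round."""
    counter = 0
    accesses = 0
    pending = list(range(num_cores))
    while pending:
        # Every pending core issues an LR and observes the same value.
        observed = {core: counter for core in pending}
        accesses += len(pending)
        # Every pending core then issues an SC. Only the first succeeds,
        # since its write invalidates the other cores' reservations.
        accesses += len(pending)
        winner = pending[0]
        counter = observed[winner] + 1
        pending = pending[1:]  # the losers retry in the next round
    return counter, accesses


def lrwait_increment(num_cores):
    """Toy LRwait/SCwait model (assumption): FIFO queue, no retries."""
    counter = 0
    accesses = 0
    # Each core issues exactly one LRwait and then sleeps in the queue.
    queue = deque(range(num_cores))
    accesses += num_cores
    while queue:
        queue.popleft()  # the head of the queue is woken with the value
        counter += 1     # its SCwait cannot fail, so no retry is needed
        accesses += 1
    return counter, accesses


if __name__ == "__main__":
    for n in (4, 16, 256):
        print(n, lrsc_increment(n), lrwait_increment(n))
```

Under these assumptions the retry-based model issues n(n+1) memory accesses for n contending cores, while the queue-based model issues 2n. The sketch only captures the access-count trend and says nothing about the throughput or energy figures reported in the abstract.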

