Efficient Distributed Data Structures for Future Many-core Architectures (2404.05515v1)

Published 8 Apr 2024 in cs.DC and cs.DS

Abstract: We study general techniques for implementing distributed data structures on top of future many-core architectures with non-cache-coherent or partially cache-coherent memory. With the goal of contributing towards what might, in the future, become the concurrency utilities package of the Java collections for such architectures, we arrive at a comprehensive collection of data structures by considering different variants of these techniques. To achieve scalability, we study a generic scheme that makes all our implementations hierarchical. We consider a collection of known techniques for improving the scalability of concurrent data structures and adapt them to work in our setting. Our experiments show that some of these techniques indeed have a high impact on scalability. They also reveal the performance and scalability benefits of the hierarchical approach. Finally, we present experiments that study the energy consumption of the proposed techniques, using an energy model recently proposed for such architectures.
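To make the idea of a hierarchical implementation more concrete, here is a minimal sketch of one plausible two-level scheme for a shared fetch-and-increment counter. It is not the authors' implementation. It assumes cores are grouped into clusters, that each cluster has a combiner which batches its clients' requests into a single message to one global server (request combining, one of the known scalability techniques of the kind the abstract mentions), and that message passing can be simulated with plain method calls. All class and method names (`HierarchicalCounter`, `GlobalServer`, `ClusterCombiner`, `fetchAndAddBatch`, `flush`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a two-level (hierarchical) fetch-and-increment counter.
 * Clients are grouped into clusters; each cluster's combiner batches its clients'
 * requests into one message to the global server. Message passing is simulated
 * with ordinary method calls (single-threaded, no real interconnect).
 */
public class HierarchicalCounter {

    /** Global server: the single owner of the counter value. */
    static final class GlobalServer {
        private long value = 0;
        private int messagesReceived = 0;

        /** Reserves `count` consecutive values in one message and returns the first. */
        long fetchAndAddBatch(int count) {
            messagesReceived++;
            long base = value;
            value += count;
            return base;
        }

        long value() { return value; }
        int messagesReceived() { return messagesReceived; }
    }

    /** Cluster-level combiner: collects local requests and forwards them as one batch. */
    static final class ClusterCombiner {
        private final GlobalServer server;
        private final List<Integer> pendingClients = new ArrayList<>();

        ClusterCombiner(GlobalServer server) { this.server = server; }

        /** A client in this cluster announces a fetch-and-increment request. */
        void request(int clientId) { pendingClients.add(clientId); }

        /**
         * Sends all pending requests as a single batch, then hands each client
         * its own distinct return value: results[clientId] = base + position.
         */
        void flush(long[] results) {
            if (pendingClients.isEmpty()) return; // nothing to send, no message
            long base = server.fetchAndAddBatch(pendingClients.size());
            for (int i = 0; i < pendingClients.size(); i++) {
                results[pendingClients.get(i)] = base + i;
            }
            pendingClients.clear();
        }
    }

    public static void main(String[] args) {
        int clusters = 4, clientsPerCluster = 8;
        GlobalServer server = new GlobalServer();
        List<ClusterCombiner> combiners = new ArrayList<>();
        for (int c = 0; c < clusters; c++) combiners.add(new ClusterCombiner(server));

        int totalClients = clusters * clientsPerCluster;
        long[] results = new long[totalClients];
        // Every client sends one request to the combiner of its own cluster.
        for (int id = 0; id < totalClients; id++) {
            combiners.get(id / clientsPerCluster).request(id);
        }
        for (ClusterCombiner combiner : combiners) combiner.flush(results);

        System.out.println("final value = " + server.value());
        System.out.println("messages at global server = " + server.messagesReceived());
    }
}
```

With 4 clusters of 8 clients, the counter ends at 32 while the global server handles only 4 messages instead of 32, and every client still receives a distinct return value. The design choice being illustrated is that the top-level server sees one message per cluster rather than one per core, which is the kind of contention reduction a hierarchical scheme aims for; a real message-passing implementation would also have to deal with asynchrony, batching windows and failures, which this sketch ignores.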
