Parallel Integer Sort: Theory and Practice (2401.00710v1)
Abstract: Integer sorting is a fundamental problem in computer science. This paper studies parallel integer sort both in theory and in practice. In theory, we show tighter bounds for a class of existing practical integer sort algorithms, which provides a solid theoretical foundation for their widespread use and strong performance in practice. In practice, we design a new integer sorting algorithm, DovetailSort, that is theoretically efficient and has good practical performance. In particular, DovetailSort overcomes a common challenge in existing parallel integer sorting algorithms: the difficulty of detecting and taking advantage of duplicate keys. The key insight in DovetailSort is to combine algorithmic ideas from both integer- and comparison-sorting algorithms. In our experiments, DovetailSort achieves competitive or better performance than existing state-of-the-art parallel integer and comparison sorting algorithms on various synthetic and real-world datasets.