JumpBackHash: Say Goodbye to the Modulo Operation to Distribute Keys Uniformly to Buckets (2403.18682v2)
Abstract: The distribution of keys to a given number of buckets is a fundamental task in distributed data processing and storage. A simple, fast, and therefore popular approach is to map the hash values of keys to buckets based on the remainder after dividing by the number of buckets. Unfortunately, these mappings are not stable when the number of buckets changes, which can lead to severe spikes in system resource utilization, such as network or database requests. Consistent hash algorithms can minimize remappings, but are either significantly slower than the modulo-based approach, require floating-point arithmetic, or are based on a family of hash functions rarely available in standard libraries. This paper introduces JumpBackHash, which uses only integer arithmetic and a standard pseudorandom generator. Due to its speed and simple implementation, it can safely replace the modulo-based approach to improve assignment and system stability. A production-ready Java implementation of JumpBackHash has been released as part of the Hash4j open source library.
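To make the stability problem concrete, the following minimal sketch compares how many keys change buckets when the bucket count grows from 10 to 11. It is not JumpBackHash. As a stand-in for a consistent hash it uses the published jump consistent hash of Lamping and Veach, which relies on the floating-point arithmetic that the abstract says JumpBackHash avoids. The helper names (`modulo_bucket`, `moved_fraction`, `HASHES`) and the choice of 20,000 random 64-bit values as key hashes are assumptions made for illustration only.

```python
import random

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wrap-around arithmetic


def modulo_bucket(hash_value: int, num_buckets: int) -> int:
    """Modulo-based assignment: remainder after dividing by the bucket count."""
    return hash_value % num_buckets


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash (Lamping and Veach, 2014); NOT JumpBackHash.

    Uses floating-point division, unlike the integer-only approach
    described in the abstract.
    """
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, as in the published algorithm
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


def moved_fraction(assign, hashes, n_old: int, n_new: int) -> float:
    """Fraction of hashes whose bucket differs between n_old and n_new buckets."""
    moved = sum(1 for h in hashes if assign(h, n_old) != assign(h, n_new))
    return moved / len(hashes)


# Hypothetical workload: pseudorandom 64-bit values standing in for key hashes.
_rng = random.Random(42)
HASHES = [_rng.getrandbits(64) for _ in range(20000)]

print("modulo moved fraction:", moved_fraction(modulo_bucket, HASHES, 10, 11))
print("jump   moved fraction:", moved_fraction(jump_consistent_hash, HASHES, 10, 11))
```

For uniformly distributed hash values, the modulo mapping is expected to reassign about 10/11 of all keys in this scenario, whereas a consistent hash should reassign only about 1/11 of them, and all of those to the newly added bucket.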