
JumpBackHash: Say Goodbye to the Modulo Operation to Distribute Keys Uniformly to Buckets (2403.18682v2)

Published 27 Mar 2024 in cs.DS, cs.DB, and cs.DC

Abstract: The distribution of keys to a given number of buckets is a fundamental task in distributed data processing and storage. A simple, fast, and therefore popular approach is to map the hash values of keys to buckets based on the remainder after dividing by the number of buckets. Unfortunately, these mappings are not stable when the number of buckets changes, which can lead to severe spikes in system resource utilization, such as network or database requests. Consistent hash algorithms can minimize remappings, but are either significantly slower than the modulo-based approach, require floating-point arithmetic, or are based on a family of hash functions rarely available in standard libraries. This paper introduces JumpBackHash, which uses only integer arithmetic and a standard pseudorandom generator. Due to its speed and simple implementation, it can safely replace the modulo-based approach to improve assignment and system stability. A production-ready Java implementation of JumpBackHash has been released as part of the Hash4j open source library.
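The instability described in the abstract can be illustrated with a small experiment. The sketch below is not JumpBackHash, whose details are not given in this abstract. It compares the modulo-based mapping with the jump consistent hash of Lamping and Veach (2014), a well-known consistent hash that relies on floating-point arithmetic. The function names, the random 64-bit values standing in for key hashes, and the bucket counts (10 growing to 11) are assumptions chosen for illustration.

```python
import random

MASK64 = (1 << 64) - 1


def modulo_bucket(hash_value, num_buckets):
    """Modulo-based assignment: remainder after dividing by the bucket count."""
    return hash_value % num_buckets


def jump_bucket(hash_value, num_buckets):
    """Jump consistent hash (Lamping and Veach, 2014). This is NOT JumpBackHash;
    it is shown only as a known consistent hash, and it uses floating point."""
    key = hash_value & MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, truncated as in the C reference code.
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


def moved_fraction(assign, hashes, old_n, new_n):
    """Fraction of keys whose bucket changes when old_n buckets become new_n."""
    moved = sum(1 for h in hashes if assign(h, old_n) != assign(h, new_n))
    return moved / len(hashes)


# Hypothetical workload: random 64-bit values stand in for key hashes.
rng = random.Random(42)
hashes = [rng.getrandbits(64) for _ in range(10_000)]

mod_moved = moved_fraction(modulo_bucket, hashes, 10, 11)
jump_moved = moved_fraction(jump_bucket, hashes, 10, 11)

print(f"modulo: {mod_moved:.1%} of keys remapped")
print(f"jump:   {jump_moved:.1%} of keys remapped")
```

For uniformly distributed hash values, a key keeps its bucket under the modulo mapping only when `h % 10 == h % 11`, which happens with probability of about 1/11, so roughly 10 out of 11 keys move. A consistent hash moves only about 1/11 of the keys, and every moved key lands in the newly added bucket.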

