ASURA: Scalable and Uniform Data Distribution Algorithm for Storage Clusters

Published 30 Sep 2013 in cs.DC | (1309.7720v2)

Abstract: Large-scale storage cluster systems need to manage a vast number of data locations. A naive approach to data location management maintains tables of pairs, each mapping a data ID to the nodes storing that data. However, this is not practical when the number of pairs is too large. To solve this problem, recent research has proposed management using data distribution algorithms rather than tables. Such an algorithm distributes data by determining the node that stores each datum from the datum ID. Data distribution algorithms of this kind must handle the addition or removal of nodes, keep calculation time short, and distribute data uniformly in proportion to the capacity of each node. This paper proposes a data distribution algorithm called ASURA (Advanced Scalable and Uniform storage by Random number Algorithm) that satisfies these requirements. It achieves the following four characteristics: 1) minimum data movement to maintain data distribution according to node capacity when nodes are added or removed, even if data are replicated, 2) roughly sub-microsecond calculation time, 3) maximum variability between nodes in data distribution of much lower than 1%, and 4) data distribution according to the capacity of each node. The evaluation results show that ASURA is qualitatively and quantitatively competitive with major data distribution algorithms such as Consistent Hashing, Weighted Rendezvous Hashing and Random Slicing. The comparison results show the benefits of each algorithm and indicate that ASURA has an advantage in large scale-out storage clusters.
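
To illustrate what determining the node from the datum ID can look like in practice, here is a minimal Python sketch of Weighted Rendezvous Hashing, one of the baseline algorithms named in the abstract. It is not ASURA, whose procedure the abstract does not describe. The function names `unit_hash` and `pick_node`, the node names, the capacity values, and the choice of SHA-256 are all assumptions made for this illustration.

```python
import hashlib
import math
from typing import Mapping


def unit_hash(datum_id: str, node: str) -> float:
    """Hash the (datum_id, node) pair to a float strictly between 0 and 1."""
    digest = hashlib.sha256(f"{node}\x00{datum_id}".encode("utf-8")).digest()
    # Keep 52 bits so that adding 0.5 stays exact in double precision
    # and the result can never round to 0.0 or 1.0.
    top52 = int.from_bytes(digest[:8], "big") >> 12
    return (top52 + 0.5) / 2**52


def pick_node(datum_id: str, capacities: Mapping[str, float]) -> str:
    """Return the node with the highest capacity-weighted score.

    Capacities are assumed to be positive numbers in arbitrary units.
    """
    def score(node: str) -> float:
        # log(h) is negative for h in (0, 1), so the score is positive
        # and scales with the node's capacity.
        return -capacities[node] / math.log(unit_hash(datum_id, node))

    return max(capacities, key=score)


if __name__ == "__main__":
    # Hypothetical cluster: node-c has twice the capacity of the others.
    nodes = {"node-a": 1.0, "node-b": 1.0, "node-c": 2.0}
    print(pick_node("datum-42", nodes))
```

No lookup table is stored: any client that knows the node list and capacities computes the same placement. Removing a node only relocates the data that lived on it, and adding a node only pulls data onto the new node. The cost is one hash per node per lookup, so calculation time grows linearly with cluster size.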