FAST CLOUD: Pushing the Envelope on Delay Performance of Cloud Storage with Coding (1301.1294v2)
Abstract: Our paper presents solutions that can significantly improve the delay performance of putting and retrieving data in and out of cloud storage. We first focus on measuring the delay performance of a very popular cloud storage service Amazon S3. We establish that there is significant randomness in service times for reading and writing small and medium size objects when assigned distinct keys. We further demonstrate that using erasure coding, parallel connections to storage cloud and limited chunking (i.e., dividing the object into a few smaller objects) together pushes the envelope on service time distributions significantly (e.g., 76%, 80%, and 85% reductions in mean, 90th, and 99th percentiles for 2 Mbyte files) at the expense of additional storage (e.g., 1.75x). However, chunking and erasure coding increase the load and hence the queuing delays while reducing the supportable rate region in number of requests per second per node. Thus, in the second part of our paper we focus on analyzing the delay performance when chunking, FEC, and parallel connections are used together. Based on this analysis, we develop load adaptive algorithms that can pick the best code rate on a per request basis by using off-line computed queue backlog thresholds. The solutions work with homogeneous services with fixed object sizes, chunk sizes, operation type (e.g., read or write) as well as heterogeneous services with mixture of object sizes, chunk sizes, and operation types. We also present a simple greedy solution that opportunistically uses idle connections and picks the erasure coding rate accordingly on the fly. Both backlog and greedy solutions support the full rate region and provide best mean delay performance when compared to the best fixed coding rate policy. Our evaluations show that backlog based solutions achieve better delay performance at higher percentile values than the greedy solution.