On Minimizing Data-read and Download for Storage-Node Recovery (1212.6952v2)
Abstract: We consider the problem of efficient recovery of the data stored in any individual node of a distributed storage system, from the rest of the nodes. Applications include handling failures and degraded reads. We measure efficiency in terms of the amount of data-read and the download required. To minimize the download, we focus on the minimum bandwidth setting of the 'regenerating codes' model for distributed storage. Under this model, the system has a total of n nodes, and the data stored in any node must be (efficiently) recoverable from any d of the other (n-1) nodes. Lower bounds on the two metrics under this model were derived previously; it has also been shown that these bounds are achievable for the amount of data-read and download when d=n-1, and for the amount of download alone when d<n-1. In this paper, we complete this picture by proving the converse result, that when d<n-1, these lower bounds are strictly loose with respect to the amount of read required. The proof is information-theoretic, and hence applies to non-linear codes as well. We also show that under two (practical) relaxations of the problem setting, these lower bounds can be met for both read and download simultaneously.