Private Information Retrieval from MDS Coded Data in Distributed Storage Systems (1602.01458v4)

Published 3 Feb 2016 in cs.IT and math.IT

Abstract: The problem of providing privacy, in the private information retrieval (PIR) sense, to users requesting data from a distributed storage system (DSS), is considered. The DSS is coded by an $(n,k,d)$ Maximum Distance Separable (MDS) code to store the data reliably on unreliable storage nodes. Some of these nodes can be spies which report to a third party, such as an oppressive regime, which data is being requested by the user. An information theoretic PIR scheme ensures that a user can satisfy its request while revealing, to the spy nodes, no information on which data is being requested. A user can trivially achieve PIR by downloading all the data in the DSS. However, this is not a feasible solution due to its high communication cost. We construct PIR schemes with low download communication cost. When there is $b=1$ spy node in the DSS, we construct PIR schemes with download cost $\frac{1}{1-R}$ per unit of requested data ($R=k/n$ is the code rate), achieving the information theoretic limit for linear schemes. The proposed schemes are universal since they depend on the code rate, but not on the generator matrix of the code. Also, when $b\leq n-\delta k$, for some $\delta \in \mathbb{N^+}$, we construct linear PIR schemes with $cPoP = \frac{b+\delta k}{\delta}$.

Citations (217)


Summary

The paper designs a linear PIR scheme that achieves the theoretical lower bound on download cost for linear schemes when the MDS-coded storage system contains a single spy node ($b=1$, no collusion).
It extends the construction to scenarios with up to $d-1$ colluding nodes, ensuring information theoretic privacy at a low communication price.
The approach is universal and efficient, allowing flexible deployment without a joint design of the coding and PIR schemes.

Overview of Private Information Retrieval from MDS Coded Data in Distributed Storage Systems

The paper "Private Information Retrieval from MDS Coded Data in Distributed Storage Systems" addresses the challenge of ensuring privacy in distributed storage systems (DSS) that use Maximum Distance Separable (MDS) codes. Specifically, it explores the development of Private Information Retrieval (PIR) schemes that allow users to retrieve data from a DSS without revealing which data item is being requested, even when some nodes may collude to uncover this information.

The authors tackle this problem under the assumption that the DSS is composed of storage nodes, some of which may act as spies that report the user's requests to a third party. These nodes can collude and compromise user privacy. A user can trivially achieve PIR by downloading all the data in the DSS, but this incurs a high communication cost. This paper proposes constructions that achieve PIR with low download cost.

PIR Scheme for $b=1$: Non-Colluding Nodes

The paper first considers the scenario with a single spy node ($b=1$), so that no collusion between nodes takes place. In this case, the authors design a linear PIR scheme that achieves the theoretical lower bound on the download communication cost for linear schemes. The scheme is universal in that it depends only on the code rate ($R=k/n$) and not on the specific MDS code used. The communication price of privacy (cPoP), defined as the download cost per unit of requested data, is $\frac{1}{1-R}$, which matches the asymptotic lower bound for PIR on coded data as the number of files $m$ approaches infinity.
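To make the formula concrete, the following minimal sketch evaluates $\frac{1}{1-R}$ for a given $(n,k)$ pair. The function name `cpop_single_spy` and the example parameters are illustrative assumptions, not taken from the paper; the code only evaluates the stated expression and does not implement the PIR scheme itself.

```python
from fractions import Fraction


def cpop_single_spy(n: int, k: int) -> Fraction:
    """Evaluate cPoP = 1 / (1 - R) with R = k / n (the b = 1 case).

    Hypothetical helper for illustration; it only computes the formula.
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n so that the rate R = k/n is below 1")
    rate = Fraction(k, n)   # code rate R = k/n, kept exact
    return 1 / (1 - rate)   # algebraically equal to n / (n - k)


# Illustrative parameters (assumptions): a rate-1/2 and a rate-4/5 code.
print(cpop_single_spy(4, 2))
print(cpop_single_spy(10, 8))
```

Since $\frac{1}{1-R} = \frac{n}{n-k}$, the cost grows as the code rate approaches 1: higher-rate codes store less redundancy and pay a higher price for privacy.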

PIR Scheme for Colluding Nodes

The paper extends the PIR construction to scenarios where collusion among up to $d-1$ nodes is possible. For $2 \leq b \leq d-1$, the proposed PIR schemes achieve a cPoP of $b+k$. These schemes ensure that information theoretic privacy is maintained even when up to $d-1$ nodes collaborate to uncover the requested data item. Furthermore, these results are generalized to any number $b$ of colluding nodes satisfying $b \leq n-\delta k$ for some positive integer $\delta$; the largest such value is $\delta=\left\lfloor \frac{n-b}{k} \right\rfloor$. The proposed PIR schemes for this scenario have a cPoP of $\frac{b + \delta k}{\delta}$, which reduces to $b+k$ when $\delta=1$.
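The colluding-node formulas can be checked numerically with a short sketch. The helper `cpop_colluding` and the parameter choices are illustrative assumptions; the code takes the largest admissible $\delta$ and evaluates $\frac{b+\delta k}{\delta}$, without implementing the underlying scheme.

```python
from fractions import Fraction


def cpop_colluding(n: int, k: int, b: int) -> tuple[int, Fraction]:
    """Return (delta, cPoP) for b colluding nodes on an (n, k) MDS code.

    Hypothetical helper: delta = floor((n - b) / k) and
    cPoP = (b + delta * k) / delta, as quoted in the text.
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    if not 1 <= b <= n - k:
        raise ValueError("need 1 <= b <= n - k so that delta >= 1")
    delta = (n - b) // k                       # largest delta with b <= n - delta*k
    return delta, Fraction(b + delta * k, delta)


# Illustrative (n, k) = (10, 2) code with different numbers of colluding nodes.
for b in (2, 3, 8):
    delta, cost = cpop_colluding(10, 2, b)
    print(f"b={b}: delta={delta}, cPoP={cost}")
```

Writing the cost as $\frac{b}{\delta} + k$ shows why the largest admissible $\delta$ is preferred: a larger $\delta$ lowers the cost, and $\delta=1$ recovers $b+k$.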

The suggested schemes are efficient and their cPoP does not depend on the number of files $m$ . Additionally, they do not require a joint design of the coding scheme and the PIR scheme, offering flexibility in the deployment of existing coded data in DSS applications.
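The independence from $m$ can be illustrated by comparing against the trivial solution of downloading everything. The sketch below assumes, purely for illustration, that all $m$ files have equal size, so the trivial cost per unit of requested data is $m$; the helper names and the $(n,k)=(6,2)$ parameters are hypothetical.

```python
from fractions import Fraction


def trivial_cost(m: int) -> Fraction:
    """Cost per requested unit when all m equal-size files are downloaded.

    Assumption for illustration: files have equal size.
    """
    return Fraction(m)


def scheme_cpop(n: int, k: int, b: int) -> Fraction:
    """cPoP from the formulas quoted in the text (no dependence on m)."""
    if not (0 < k < n and 1 <= b <= n - k):
        raise ValueError("need 0 < k < n and 1 <= b <= n - k")
    if b == 1:
        return Fraction(n, n - k)              # 1 / (1 - R)
    delta = (n - b) // k                       # floor((n - b) / k)
    return Fraction(b + delta * k, delta)


# The trivial cost grows with m; the scheme costs stay fixed.
for m in (2, 10, 1000):
    print(m, trivial_cost(m), scheme_cpop(6, 2, 1), scheme_cpop(6, 2, 3))
```

Under these assumptions the scheme costs stay constant while the trivial cost grows linearly in $m$, so the benefit appears once $m$ exceeds the cPoP; for a very small number of files, downloading everything can still be cheaper.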

Practical and Theoretical Implications

Practically, the construction of these PIR schemes provides a feasible approach for private data retrieval in peer-to-peer DSS, where users may face surveillance or monitoring threats. Theoretically, these constructions contribute to an ongoing dialogue about the bounds and possibilities for PIR on coded data, particularly in scenarios where collusion is a significant risk.

Future Directions

The structure of the proposed PIR schemes opens several avenues for future research. One potential direction is further minimizing the cPoP in scenarios with colluding nodes, approaching the exact theoretical bounds or possibly finding improved constructions. Another area of interest is the extension of these ideas to more complex storage models, including those with heterogeneously reliable nodes or more sophisticated threat models, such as adversarial spies that may manipulate stored data.

In conclusion, this paper contributes to private information retrieval from coded data by proposing universal schemes whose download cost meets the limit for linear schemes when $b=1$ and remains independent of the number of files under collusion, without requiring a joint design of the storage code and the PIR scheme.