The Capacity of Private Information Retrieval

Published 29 Feb 2016 in cs.IT, cs.CR, cs.IR, and math.IT | (1602.09134v2)

Abstract: In the private information retrieval (PIR) problem a user wishes to retrieve, as efficiently as possible, one out of $K$ messages from $N$ non-communicating databases (each holds all $K$ messages) while revealing nothing about the identity of the desired message index to any individual database. The information theoretic capacity of PIR is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. For $K$ messages and $N$ databases, we show that the PIR capacity is $(1+1/N+1/N^2+\cdots+1/N^{K-1})^{-1}$. A remarkable feature of the capacity achieving scheme is that if we eliminate any subset of messages (by setting the message symbols to zero), the resulting scheme also achieves the PIR capacity for the remaining subset of messages.




Summary

The paper determines the PIR capacity as C = (1 + 1/N + … + 1/N^(K−1))⁻¹, offering a precise performance benchmark.
It introduces a capacity-achieving scheme that exploits database and message symmetry alongside recursive query design.
Results show that adding databases increases capacity while more messages reduce it, guiding efficient privacy-preserving retrieval.

Insights on the Capacity of Private Information Retrieval

The paper "The Capacity of Private Information Retrieval" by Hua Sun and Syed A. Jafar addresses the private information retrieval (PIR) problem from an information-theoretic perspective. It establishes the capacity of PIR, a result that complements and advances prior work on data privacy in the computation and communication literature.

Main Contributions

The paper formalizes the PIR problem, in which a user aims to retrieve one out of $K$ messages from $N$ non-communicating databases as efficiently as possible, while keeping the index of the requested message hidden from each individual database. The core result is an exact characterization of the PIR capacity in this setting: $C = \left(1 + \frac{1}{N} + \frac{1}{N^2} + \cdots + \frac{1}{N^{K-1}}\right)^{-1}$. This expression is the maximum number of desired bits that can be privately retrieved per downloaded bit.
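To make the formula concrete, the following minimal sketch evaluates it with exact rational arithmetic. The function name `pir_capacity` and its parameter names are illustrative choices, not notation from the paper.

```python
from fractions import Fraction


def pir_capacity(n_databases: int, k_messages: int) -> Fraction:
    """Evaluate C = (1 + 1/N + ... + 1/N^(K-1))^(-1) exactly.

    Hypothetical helper for illustration; N and K must be positive integers.
    """
    if n_databases < 1 or k_messages < 1:
        raise ValueError("N and K must be positive integers")
    # Sum the K terms 1/N^0, 1/N^1, ..., 1/N^(K-1) as exact fractions.
    series = sum(Fraction(1, n_databases**i) for i in range(k_messages))
    return 1 / series


if __name__ == "__main__":
    for n, k in [(2, 2), (2, 3), (3, 2)]:
        print(f"N={n}, K={k}: C = {pir_capacity(n, k)}")
```

For example, $N = 2$ and $K = 2$ give $C = (1 + 1/2)^{-1} = 2/3$, and adding a third message lowers this to $4/7$.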

Methodological Highlights

The paper introduces a capacity-achieving scheme that leverages three fundamental principles: symmetry across databases, symmetry across messages within database queries, and the exploitation of undesired message side information. This methodical approach enables the successful extraction of desired message bits by pairing them with known side information derived from other databases.

Key to the scheme's construction is a recursive algorithm designed to iteratively construct optimal query sets. This elegant mechanism is akin to techniques in blind interference alignment, translating the interference-canceling capabilities of coherent channels in wireless communications into the field of PIR.
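The three principles can be illustrated on the smallest nontrivial case, $N = 2$ databases and $K = 2$ messages of four symbols each. The sketch below is written in the spirit of the scheme, not as the authors' construction verbatim: it assumes binary symbols combined with XOR, and the name `retrieve_2x2` and the query layout (a dictionary keyed by message index) are hypothetical. Each database is asked for one symbol of each message plus one mixed sum, and the undesired symbol in each sum is cancelled using the copy downloaded from the other database.

```python
import random


def retrieve_2x2(w0, w1, want, rng=random):
    """Privately fetch message `want` (0 or 1) from two replicated databases.

    Assumes each message is a list of 4 bits. Returns (decoded, downloaded).
    """
    msgs = (w0, w1)
    # Private, independent random permutations of the 4 positions per message.
    perm = [rng.sample(range(4), 4) for _ in range(2)]
    d, u = want, 1 - want  # desired and undesired message indices

    # Queries are keyed by message index, so their layout does not reveal d.
    # "single": symbols downloaded individually; "pair": one XOR of two symbols.
    q1 = {"single": {d: perm[d][0], u: perm[u][0]},
          "pair": {d: perm[d][2], u: perm[u][1]}}
    q2 = {"single": {d: perm[d][1], u: perm[u][1]},
          "pair": {d: perm[d][3], u: perm[u][0]}}

    def answer(query):
        # What a database computes: two raw symbols and one mixed symbol.
        s, p = query["single"], query["pair"]
        return {"single": {m: msgs[m][s[m]] for m in (0, 1)},
                "pair": msgs[0][p[0]] ^ msgs[1][p[1]]}

    a1, a2 = answer(q1), answer(q2)

    decoded = [0] * 4
    decoded[perm[d][0]] = a1["single"][d]
    decoded[perm[d][1]] = a2["single"][d]
    # Side information: the undesired symbol mixed in by one database was
    # downloaded in the clear from the other database, so XOR removes it.
    decoded[perm[d][2]] = a1["pair"] ^ a2["single"][u]
    decoded[perm[d][3]] = a2["pair"] ^ a1["single"][u]
    downloaded = 6  # three symbols from each of the two databases
    return decoded, downloaded


if __name__ == "__main__":
    w0, w1 = [1, 0, 1, 1], [0, 1, 1, 0]
    print(retrieve_2x2(w0, w1, want=0))
```

Four desired bits are recovered from six downloaded bits, a rate of $2/3$, which matches the capacity formula for $N = 2$, $K = 2$. Each database sees the same kind of request regardless of which message is wanted: one uniformly chosen position of each message, plus a sum over one other position of each.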

Results and Implications

The PIR capacity decreases as the number of messages $K$ grows and increases as the number of databases $N$ grows, approaching 1 as $N$ becomes very large. For every finite $K$, the capacity strictly exceeds the previously established rate of $1 - \frac{1}{N}$, and it approaches that value as $K \to \infty$.
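These trends can be read off from the closed form of the geometric series in the capacity expression. The following is a short derivation added for illustration, assuming $N \ge 2$:

$$C = \left(\sum_{i=0}^{K-1} \frac{1}{N^{i}}\right)^{-1} = \frac{1 - \frac{1}{N}}{1 - \frac{1}{N^{K}}}, \qquad \lim_{K \to \infty} C = 1 - \frac{1}{N}, \qquad \lim_{N \to \infty} C = 1.$$

Since the denominator $1 - \frac{1}{N^{K}}$ is less than 1 for finite $K$, the capacity is strictly larger than $1 - \frac{1}{N}$. For instance, $N = 2$ and $K = 2$ give $C = \frac{1/2}{3/4} = \frac{2}{3}$, compared with $1 - \frac{1}{2} = \frac{1}{2}$.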

The capacity-achieving scheme has a further notable property: if any subset of $\Delta$ of the $K$ messages is eliminated (by setting its symbols to zero), the resulting scheme still achieves the PIR capacity for the remaining $K - \Delta$ messages. This modularity means the same construction continues to operate optimally when the message set shrinks.

Broader Impact and Future Directions

From a practical standpoint, the characterization of PIR capacity harnesses insights from communication theory, particularly through concepts such as blind interference alignment, enriching our understanding of privacy and data retrieval strategies. The foundational concepts extended here have broader ramifications in distributed storage systems, secure cloud computing architectures, and other domains where robust privacy-preserving mechanisms are crucial.

Future exploration emanating from this work could encompass scenarios of limited message sizes or tighter constraints on query uploads, which may require developing generalized or hybrid strategies. Another potential avenue is leveraging coded caching gains within similar data retrieval contexts. This study paves the way for interdisciplinary research that blends notions of data privacy, information theory, and network coding to address multifaceted challenges in digital communication ecosystems.
