Partitioned Probabilistic Neighbour Selection
- Partitioned Probabilistic Neighbour Selection (PPNS) is an approach that divides candidates into disjoint partitions to optimise neighbour selection under privacy and efficiency constraints.
- The method uses exponential mechanism sampling within partitions to ensure differential privacy and robust resistance against kNN attacks.
- Empirical results in collaborative filtering and spatial querying show PPNS achieves improved accuracy–privacy trade-offs and faster query times compared to global randomisation.
Partitioned Probabilistic Neighbour Selection (PPNS) describes a class of algorithms for neighbour selection where candidates are divided into disjoint groups (“partitions”) and sampling occurs within or across these partitions according to a scheme designed to balance accuracy, privacy, and computational efficiency. PPNS has been independently developed for two major domains: privacy-preserving collaborative filtering and efficient spatial probabilistic neighbourhood queries. In both cases, the partitioned approach leads to improved trade-offs over prior, globally randomised strategies, with provable guarantees on utility metrics and adversary resilience.
1. Formal Definitions and Core Concepts
In the canonical collaborative filtering setting, given a set of users and a target user, all other users are ranked by their similarity to the target (e.g., by cosine similarity). The sorted list is then partitioned into disjoint, equal-sized groups. Neighbour selection proceeds by drawing users from these blocks, with at least one neighbour required from the lowest-similarity block in the enforced span.
Two principal metrics quantify the quality and privacy of the selection:
- Accuracy: The expected sum of similarities between the target user and the selected neighbours, where each candidate's similarity is weighted by its expected selection indicator.
- Security: The number of equal-sized partitions across which neighbours must be drawn, defining the minimum effective strength required for an attack to succeed.
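
The ranking and partitioning step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `rank_and_partition`, the dense rating matrix, and the `block_size` parameter are assumptions introduced here, since the block size used in the source is not recoverable from the text.

```python
import numpy as np

def rank_and_partition(ratings, target, block_size):
    """Rank all other users by cosine similarity to `target`, then cut the
    ranked list into contiguous, disjoint blocks of `block_size` users.

    `ratings` is a (users x items) float array; all names are illustrative.
    """
    norms = np.linalg.norm(ratings, axis=1)
    denom = norms * norms[target]
    # Users with an all-zero rating vector get similarity 0 instead of NaN.
    sims = np.divide(ratings @ ratings[target], denom,
                     out=np.zeros(len(ratings)), where=denom > 0)
    others = np.array([u for u in range(len(ratings)) if u != target])
    # Stable sort on negated similarity gives a descending ranking.
    order = others[np.argsort(-sims[others], kind="stable")]
    blocks = [order[i:i + block_size] for i in range(0, len(order), block_size)]
    return sims, blocks
```

The first block holds the most similar users and the last block the least similar; the final block may be shorter if the number of candidates is not a multiple of `block_size`.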
In the spatial context, given a set of points and a query point, each point is included in the result set independently with a probability given by a monotonic function of its distance to the query (Looz et al., 2015).
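
For reference, the spatial definition can be written as a brute-force procedure that flips one coin per point. This is the linear-time baseline, not the partitioned algorithm; the function name and the example probability function are assumptions made for illustration.

```python
import math
import random

def probabilistic_neighbourhood(points, query, f, rng):
    """Include each point independently with probability f(distance to query).

    `f` maps a non-negative distance to a probability in [0, 1].
    """
    return [p for p in points if rng.random() < f(math.dist(p, query))]

# Example probability function (an assumption): exponential decay with distance.
def decay(distance):
    return math.exp(-distance)
```

A call such as `probabilistic_neighbourhood(points, (0.0, 0.0), decay, random.Random(0))` inspects every point, which is the cost that partitioning aims to avoid.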
2. Algorithmic Procedure: Partitioning and Sampling
The PPNS algorithm exploits the partition structure to optimise the neighbour selection process under privacy or efficiency constraints.
Collaborative Filtering PPNS
- Sorting and Partitioning: Candidate users are sorted by similarity and divided into contiguous, equal-sized partitions.
- Neighbour Allocation: The allocation of neighbours to partitions is solved as a linear program under the neighbour-count and security constraints. The optimal solution draws all but one of the neighbours from the highest-similarity partition and one from the lowest, maximising accuracy under the enforced security level (Lu et al., 2015).
- Differential Privacy Mechanism: Within each partition, selection proceeds via the exponential mechanism, with each candidate's weight growing exponentially in its similarity score, scaled by the privacy budget ε and the recommendation-aware sensitivity of the similarity score. Sampling is performed without replacement proportional to the weights, guaranteeing ε-differential privacy (Lu et al., 2015, Lu et al., 2015).
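
The allocation and sampling steps above can be sketched as follows. The weight `exp(epsilon * score / (2 * sensitivity))` is the textbook exponential-mechanism form and is an assumption here, since the exact weight expression from the source is not recoverable; the function names and the `security` parameter (index of the lowest partition used) are also illustrative. The sketch does not model how the privacy budget is split across draws.

```python
import numpy as np

def exp_mechanism_sample(candidates, sims, n_draws, epsilon, sensitivity, rng):
    """Draw `n_draws` distinct candidates with probability proportional to
    exp(epsilon * similarity / (2 * sensitivity)) (assumed weight form)."""
    scores = sims[candidates]
    # Subtracting the maximum leaves the probabilities unchanged but avoids overflow.
    weights = np.exp(epsilon * (scores - scores.max()) / (2.0 * sensitivity))
    probs = weights / weights.sum()
    return rng.choice(candidates, size=n_draws, replace=False, p=probs)

def ppns_select(blocks, sims, k, security, epsilon, sensitivity, rng):
    """Pick k - 1 neighbours from the top block and one from block `security`.

    `blocks` are index arrays ordered from most to least similar.
    """
    top = blocks[0]
    if security == 1:
        return exp_mechanism_sample(top, sims, k, epsilon, sensitivity, rng)
    low = blocks[security - 1]
    chosen_top = exp_mechanism_sample(top, sims, k - 1, epsilon, sensitivity, rng)
    chosen_low = exp_mechanism_sample(low, sims, 1, epsilon, sensitivity, rng)
    return np.concatenate([chosen_top, chosen_low])
```

Because the weights are normalised within a single block, the noise depends on that block's size rather than on the full candidate pool.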
Spatial PPNS
- Spatial Partitioning: Points are indexed in a balanced polar quadtree, with angular and radial splits ensuring cells have equal probability mass under a known or estimated density.
- Query Algorithm: For a given query point, the algorithm recursively traverses the quadtree. If the expected number of neighbours in a cell is less than one, the cell is treated as a “virtual leaf” and processed via “jump-sampling” using a per-cell base probability.
- Efficiency: By partitioning the space into “probability bands” and aggregating small cells, the expected running time per query is kept sublinear in the number of points for all but very large outputs, with high probability (Looz et al., 2015).
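
Jump-sampling within a single cell can be sketched as geometric skipping followed by a rejection step. The sketch assumes that `p_bar` is an upper bound on the inclusion probability of every point in the cell; the function name and the acceptance ratio `f(d) / p_bar` are assumptions about how the base probability is used, not details confirmed by the source.

```python
import math
import random

def jump_sample(points, query, f, p_bar, rng):
    """Sample points of one cell without flipping a coin for each of them.

    Requires 0 < p_bar <= 1 and f(distance) <= p_bar for all points in the cell.
    """
    if not 0.0 < p_bar <= 1.0:
        raise ValueError("p_bar must lie in (0, 1]")
    selected = []
    i = -1
    while True:
        if p_bar >= 1.0:
            skip = 0  # every point is a candidate
        else:
            u = 1.0 - rng.random()  # uniform on (0, 1]
            # Number of failures before the first success of a p_bar coin.
            skip = int(math.log(u) / math.log(1.0 - p_bar))
        i += skip + 1
        if i >= len(points):
            return selected
        # Correct the candidate down to its true inclusion probability.
        if rng.random() < f(math.dist(points[i], query)) / p_bar:
            selected.append(points[i])
```

Each point is a candidate with probability `p_bar` and is then accepted with probability `f(d) / p_bar`, so it ends up included with probability `f(d)`. When `p_bar` is small, most points are skipped without being examined.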
3. Theoretical Properties and Guarantees
PPNS offers distinct theoretical advantages:
- Optimal Accuracy Under Security Constraint: The neighbour allocation (concentrating selections in the highest-similarity partition, with one in the lowest) is proven to maximise the expected similarity sum given the enforced partition span (Lu et al., 2015).
- Differential Privacy Assurance: Exponential mechanism sampling within each partition achieves ε-differential privacy for the neighbour selection procedure.
- Attack Resistance: By requiring that the final neighbour be drawn from the lowest-similarity partition in the enforced span, the number of fake profiles a kNN adversary needs grows with that span. The classic kNN attack has zero success probability for suitable values of the geometric partition allocation parameter (Lu et al., 2015).
- Query Complexity: In spatial data, the overall query cost depends on the number of indexed points and on the output size; it is sublinear in the number of points except for very large outputs (i.e., dense neighbourhoods) (Looz et al., 2015).
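
The optimal-allocation claim can be checked by brute force on a toy instance. The sketch below is a simplification under stated assumptions: each partition is summarised by a single mean similarity, the objective is the allocation-weighted sum of those means, and the function name and parameters are hypothetical.

```python
from itertools import product

def best_allocation(block_means, k, block_size):
    """Enumerate all ways to take k neighbours from the blocks, with at least
    one from the last block, and return the allocation maximising the
    similarity sum (block means stand in for individual similarities)."""
    best, best_value = None, float("-inf")
    per_block_limit = min(k, block_size)
    for alloc in product(range(per_block_limit + 1), repeat=len(block_means)):
        if sum(alloc) != k or alloc[-1] < 1:
            continue  # violates the neighbour-count or security constraint
        value = sum(x * mu for x, mu in zip(alloc, block_means))
        if value > best_value:
            best, best_value = alloc, value
    return best, best_value

allocation, value = best_allocation([0.9, 0.6, 0.3], k=4, block_size=4)
print(allocation)  # (3, 0, 1)
```

With strictly decreasing block means, the maximiser places all but one neighbour in the first block and the remaining one in the last, which matches the allocation stated above.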
4. Practical Implementation and Complexity
PPNS algorithms follow predictable preprocessing and query steps:
| Step | Complexity (CF) | Complexity (Spatial) |
|---|---|---|
| Similarity computation/sorting | N/A | |
| Quadtree construction | N/A | (expected) |
| Partitioning | | |
| Within-partition sampling | | |
In collaborative filtering, weighted neighbour selection within two equal-sized partitions keeps the per-query cost mild while reducing overall privacy-induced noise, since the DP exponent is not scaled by the total number of candidates.
In spatial data queries, the quadtree supports efficient aggregation, and “jump-sampling” exploits geometric skipping to avoid examining every point, resulting in practical query times one to two orders of magnitude lower than brute-force approaches for millions of points (Looz et al., 2015).
5. Empirical Results and Applications
Collaborative Filtering
Experiments on MovieLens and Douban datasets, using mean absolute error (MAE) as the evaluation metric, confirm the theoretical claims:
- PPNS’s MAE converges to that of deterministic kNN in the maximum-accuracy limit.
- Under kNN attack conditions, PPNS yields 10–30% lower MAE than naive (global) probabilistic neighbour selection or private neighbour collaborative filtering (Lu et al., 2015).
- For a fixed privacy budget, PPNS exhibits higher accuracy than global-exponential baselines.
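
For readers unfamiliar with the metric, MAE is the average absolute gap between predicted and observed ratings, so lower values mean better accuracy. A minimal sketch follows; the function name and the sample ratings are illustrative and are not taken from the cited experiments.

```python
def mean_absolute_error(predicted, actual):
    """Average of |prediction - observed rating| over paired ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

print(mean_absolute_error([4, 3, 5], [5, 3, 3]))  # 1.0
```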
Spatial Data
- Disease-spread simulation on real population data (3–14 million points) achieved PPNS query speeds over 100 times faster than direct coin-flipping on all points.
- Random hyperbolic graph generation ran faster than previous implementations, matching the theoretical prediction (Looz et al., 2015).
6. Extensions, Applications, and Limitations
The PPNS framework is adaptable:
- Generalisation to kNN Attacks: Increasing the security parameter (the number of partitions spanned) enforces higher adversary cost. The minimal attack set must span all of these blocks, greatly raising the bar compared to classical attacks.
- Spatial Geometry: The quadtree-based PPNS readily adapts to hyperbolic space, enabling efficient generation of random hyperbolic graphs at arbitrary temperatures.
- Partition Choice and Performance: When the density is unknown, empirical medians can be used for partitioning, which, while sacrificing precise cell-probability guarantees, still achieves strong practical performance.
- Fine-tuned Trade-off: The geometric mixing parameter (in probabilistic block allocation) allows provable, fine-grained control over the accuracy–privacy trade-off (Lu et al., 2015).
- Applicability: Although primarily evaluated in collaborative filtering and spatial query contexts, the structural principle of partitioned, localised randomisation has potential utility in any setting where controlled trade-offs between utility and privacy or efficiency are required.
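
Partitioning by empirical medians, as mentioned under partition choice, can be sketched for a single split level in polar coordinates. This is a simplified illustration: it splits radius and angle independently at their sample medians, so each marginal split is balanced but the four cells need not hold equal counts. The function name and the `centre` parameter are assumptions.

```python
import math
import statistics

def median_polar_split(points, centre=(0.0, 0.0)):
    """Split planar points into four polar cells at the median radius and
    median angle measured around `centre` (one level only)."""
    cx, cy = centre
    polar = [(math.hypot(x - cx, y - cy),
              math.atan2(y - cy, x - cx) % (2.0 * math.pi),
              (x, y)) for x, y in points]
    r_med = statistics.median(r for r, _, _ in polar)
    a_med = statistics.median(a for _, a, _ in polar)
    # Cell key: (is in outer band, is in upper angular half).
    cells = {(outer, upper): [] for outer in (False, True) for upper in (False, True)}
    for r, a, p in polar:
        cells[(r >= r_med, a >= a_med)].append(p)
    return r_med, a_med, cells
```

Applying the same split recursively inside each cell would give a quadtree-like index without requiring a known density.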
7. Comparison to Alternative Approaches
PPNS improves over global randomised neighbour selection in several fundamental respects:
| Scheme | Privacy Guarantee | Accuracy/Utility | Complexity |
|---|---|---|---|
| Global Randomised | ε-DP (global) | High noise, lower accuracy | Sampling over the full neighbour pool |
| PPNS (Partitioned) | ε-DP (local) | Provably maximal at fixed security level | Within-block sampling |
| Deterministic kNN | None | Maximal | No privacy |
In all examined domains, partitioned selection methods guarantee higher utility for the same security or privacy level, reduce the magnitude of noise induced by DP, and enforce robust bounds on adversary success probability dictated by the chosen security parameter.
Partitioned Probabilistic Neighbour Selection is a methodological advancement that enables provable and tunable accuracy–privacy (or accuracy–security) trade-offs by restricting probabilistic selection to suitably structured partitions, rather than applying global randomisation. Its theoretical guarantees and empirical results demonstrate substantial improvements over previous globalised schemes in both privacy-sensitive recommendation and efficient querying of probabilistic spatial neighbourhoods (Looz et al., 2015, Lu et al., 2015, Lu et al., 2015).