
Partitioned Probabilistic Neighbour Selection

Updated 23 March 2026
  • Partitioned Probabilistic Neighbour Selection (PPNS) is an approach that divides candidates into disjoint partitions to optimize neighbour selection under privacy and efficiency constraints.
  • The method uses exponential mechanism sampling within partitions to ensure differential privacy and robust resistance against kNN attacks.
  • Empirical results in collaborative filtering and spatial querying show PPNS achieves improved accuracy–privacy trade-offs and faster query times compared to global randomisation.

Partitioned Probabilistic Neighbour Selection (PPNS) describes a class of algorithms for neighbour selection where candidates are divided into disjoint groups (“partitions”) and sampling occurs within or across these partitions according to a scheme designed to balance accuracy, privacy, and computational efficiency. PPNS has been independently developed for two major domains: privacy-preserving collaborative filtering and efficient spatial probabilistic neighbourhood queries. In both cases, the partitioned approach leads to improved trade-offs over prior, globally randomised strategies, with provable guarantees on utility metrics and adversary resilience.

1. Formal Definitions and Core Concepts

In the canonical collaborative filtering setting, given a set of users $U$ and a target user $u_a$, all other users $u_i$ are ranked by their similarity $\mathrm{sim}(u_a, u_i)$ to $u_a$ (e.g., by cosine similarity). The sorted list $S_a' = \{ u_1, u_2, \ldots, u_n \}$ is then partitioned into $\beta$ disjoint groups, each of size $k$:

$$N_1 = \{ u_1, \ldots, u_k \}, \; N_2 = \{ u_{k+1}, \ldots, u_{2k} \}, \; \ldots, \; N_\beta = \{ u_{(\beta-1)k+1}, \ldots, u_{\beta k} \}$$

Neighbour selection proceeds by drawing $k$ users from these blocks, with at least one neighbour required from the lowest-similarity block $N_\beta$.
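
The partitioning step can be illustrated with a short Python sketch. It assumes the similarities to the target user are already available as a dictionary; the function name `partition_by_similarity` and the sample user ids are hypothetical and not taken from the cited papers.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]  # (user id, similarity to the target user)


def partition_by_similarity(sims: Dict[str, float], k: int, beta: int) -> List[List[Candidate]]:
    """Sort candidates by descending similarity and cut the top beta*k
    of them into beta contiguous blocks of size k."""
    if k <= 0 or beta <= 0:
        raise ValueError("k and beta must be positive")
    if beta * k > len(sims):
        raise ValueError("need at least beta * k candidates")
    ranked = sorted(sims.items(), key=lambda kv: kv[1], reverse=True)
    return [ranked[i * k:(i + 1) * k] for i in range(beta)]


# Illustrative data only: seven candidates, blocks of size 2, three blocks.
sims = {"u1": 0.9, "u2": 0.8, "u3": 0.7, "u4": 0.6, "u5": 0.5, "u6": 0.4, "u7": 0.3}
blocks = partition_by_similarity(sims, k=2, beta=3)
```

Here `blocks[0]` plays the role of $N_1$ and `blocks[-1]` the role of $N_\beta$; candidates ranked below position $\beta k$ are simply left out of every block.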

Two principal metrics quantify the quality and privacy of the selection:

  • Accuracy ($\alpha$):

$$\alpha = \mathbb{E}\left[ \sum_{i=1}^k \mathrm{sim}(u_a, \mathrm{neighbour}_i) \right] = \sum_{i=1}^n \mathrm{sim}(u_a, u_i)\, \mu_i$$

where $\mu_i$ is the expected selection indicator for $u_i$.

  • Security ($\beta$): The number of partitions, each of size $k$, across which neighbours must be drawn, defining the minimum effective strength required for an attack to succeed.
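
As a worked illustration of the accuracy metric, the following Python sketch evaluates $\alpha = \sum_i \mathrm{sim}(u_a, u_i)\,\mu_i$ for a given number of draws per block. It makes a simplifying assumption that is not part of the definition above: sampling inside each block is uniform without replacement, so every member of a block from which $d$ of $k$ users are drawn has $\mu_i = d/k$. The function name and the numbers are hypothetical.

```python
from typing import Sequence


def expected_accuracy(sorted_sims: Sequence[float], k: int, draws_per_block: Sequence[int]) -> float:
    """Expected similarity sum when draws_per_block[b] users are drawn
    uniformly without replacement from the b-th block of size k."""
    if sum(draws_per_block) != k:
        raise ValueError("the draws must add up to k")
    if any(d < 0 or d > k for d in draws_per_block):
        raise ValueError("each block can supply between 0 and k neighbours")
    if len(draws_per_block) * k > len(sorted_sims):
        raise ValueError("not enough candidates for the requested blocks")
    alpha = 0.0
    for b, draws in enumerate(draws_per_block):
        mu = draws / k  # inclusion probability of each block member (uniform-sampling assumption)
        alpha += mu * sum(sorted_sims[b * k:(b + 1) * k])
    return alpha


# Illustrative similarities, already sorted in descending order.
sorted_sims = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
alpha_spread = expected_accuracy(sorted_sims, k=2, draws_per_block=[1, 0, 1])
alpha_knn = expected_accuracy(sorted_sims, k=2, draws_per_block=[2, 0, 0])
```

With these numbers, drawing one neighbour from the first block and one from the third gives a smaller $\alpha$ than taking both from the first block, which is the accuracy cost paid for spanning $\beta = 3$ partitions.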

In the spatial context, with $P \subset \mathbb{R}^d$ and query $q \in \mathbb{R}^d$, each $p \in P$ is included in $N(q, f)$ independently with probability $f(\mathrm{dist}(p, q))$, for a monotonic function $f: \mathbb{R}^+ \to [0,1]$ (Looz et al., 2015).
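
The spatial definition translates directly into a brute-force baseline that flips one coin per point. The sketch below is only that baseline, not the partitioned algorithm; it assumes Euclidean distance, and the function name and the example decay function are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def probabilistic_neighbourhood(
    points: Sequence[Point],
    q: Point,
    f: Callable[[float], float],
    rng: random.Random,
) -> List[Point]:
    """Include each point independently with probability f(dist(p, q)).
    Costs one distance evaluation and one coin flip per point."""
    return [p for p in points if rng.random() < f(math.dist(p, q))]


# Illustrative use with an exponentially decaying inclusion probability.
rng = random.Random(42)
points = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(1000)]
neighbours = probabilistic_neighbourhood(points, (0.0, 0.0), lambda d: math.exp(-d), rng)
```

This baseline examines every point in $P$ for each query, which is the linear cost that the partitioned approach is designed to avoid.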

2. Algorithmic Procedure: Partitioning and Sampling

The PPNS algorithm exploits the partition structure to optimise the neighbour selection process under privacy or efficiency constraints.

Collaborative Filtering PPNS

  1. Sorting and Partitioning: Candidate users are sorted by similarity and divided into $\beta$ contiguous partitions of size $k$.
  2. Neighbour Allocation: The allocation problem is solved as a linear program under the constraints $\sum_{i=1}^\beta f_\beta(i) = k$ and $f_\beta(\beta) \geq 1$, where $f_\beta(i)$ is the number of neighbours drawn from partition $N_i$. The optimal solution is

$$f_\beta(1) = k-1,\ f_\beta(\beta) = 1,\ f_\beta(i) = 0 \text{ for } 1 < i < \beta$$

ensuring that $k-1$ neighbours are chosen from the highest-similarity partition and one from the lowest, maximising $\alpha$ under the enforced security level $\beta$ (Lu et al., 2015).

  3. Differential Privacy Mechanism: Within each partition, selection proceeds via the exponential mechanism with weights

$$\omega_i = \exp\left(\frac{\epsilon}{4k \cdot RS}\, \mathrm{sim}(u_a, u_i) \right)$$

where $RS$ is the recommendation-aware sensitivity of the similarity score. Sampling is performed without replacement proportional to the weights, guaranteeing $\epsilon$-differential privacy (Lu et al., 2015, Lu et al., 2015).
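
The allocation and sampling steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the blocks are taken as already sorted and partitioned, the sensitivity $RS$ is passed in as a plain number `rs` rather than computed, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Candidate = Tuple[str, float]  # (user id, similarity to the target user)


def allocate(k: int, beta: int) -> List[int]:
    """Closed-form allocation: k-1 draws from the first block, one from the last."""
    if k < 1 or beta < 1:
        raise ValueError("k and beta must be at least 1")
    counts = [0] * beta
    counts[0] += k - 1
    counts[-1] += 1  # with beta == 1 both terms land on the same block
    return counts


def sample_block(block: Sequence[Candidate], draws: int, epsilon: float,
                 k: int, rs: float, rng: random.Random) -> List[Candidate]:
    """Draw `draws` distinct users, each time with probability proportional
    to exp(epsilon / (4 * k * rs) * sim) among the users still in the pool."""
    pool = list(block)
    chosen = []
    for _ in range(draws):
        weights = [math.exp(epsilon / (4 * k * rs) * sim) for _, sim in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))  # remove so the user cannot be drawn again
    return chosen


def ppns_select(blocks: Sequence[Sequence[Candidate]], epsilon: float,
                rs: float, rng: random.Random) -> List[Candidate]:
    """Apply the allocation to the blocks and sample inside each one."""
    k = len(blocks[0])
    selected: List[Candidate] = []
    for block, draws in zip(blocks, allocate(k, len(blocks))):
        selected.extend(sample_block(block, draws, epsilon, k, rs, rng))
    return selected


# Illustrative blocks (k = 3, beta = 2); epsilon and rs are placeholder values.
example_blocks = [
    [("a", 0.9), ("b", 0.8), ("c", 0.7)],
    [("d", 0.2), ("e", 0.1), ("f", 0.05)],
]
neighbours = ppns_select(example_blocks, epsilon=1.0, rs=1.0, rng=random.Random(7))
```

Because only the first and last blocks receive a non-zero allocation, the weights are evaluated for at most $2k$ candidates per query.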

Spatial PPNS

  1. Spatial Partitioning: Points are indexed in a balanced polar quadtree (for $d=2$), with angular and radial splits ensuring cells have equal probability mass under a known or estimated density $j(r)$.
  2. Query Algorithm: For query $q$, the algorithm recursively traverses the quadtree. If the expected number of neighbours in a cell $c$ is less than one, the cell is treated as a “virtual leaf” and processed via “jump-sampling” with base probability $p_\mathrm{max} = f(\mathrm{lowerBoundDist}(q, c))$.
  3. Efficiency: By partitioning the space into “probability bands” and aggregating small cells, the expected running time per query is reduced to $O((|N(q, f)| + \sqrt{n})\log n)$ with high probability (Looz et al., 2015).
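
The jump-sampling idea inside a single virtual leaf can be sketched as follows. This is an illustration of geometric skipping with rejection, written under assumptions not stated above: the points of the leaf are given as a flat list, $f$ is non-increasing so that no point in the leaf has an inclusion probability above `p_max`, and the quadtree traversal itself is omitted. The function name is hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def jump_sample(points: Sequence[Point], q: Point, f: Callable[[float], float],
                p_max: float, rng: random.Random) -> List[Point]:
    """Include each point independently with probability f(dist(p, q)),
    assuming f(dist(p, q)) <= p_max for every point in `points`."""
    if p_max <= 0.0:
        return []
    result = []
    i = -1
    while True:
        if p_max >= 1.0:
            skip = 0
        else:
            # Number of failures before the next success of a Bernoulli(p_max)
            # trial, drawn in one step instead of flipping a coin per point.
            u = 1.0 - rng.random()  # lies in (0, 1], so the logarithm is defined
            skip = int(math.log(u) / math.log1p(-p_max))
        i += skip + 1
        if i >= len(points):
            return result
        # The candidate was reached with probability p_max; accepting it with
        # probability p / p_max gives an overall inclusion probability of p.
        p = f(math.dist(points[i], q))
        if rng.random() < min(1.0, p / p_max):
            result.append(points[i])


# Illustrative use: 20,000 points at the same distance, inclusion probability 0.3.
leaf = [(1.0, 0.0)] * 20000
sampled = jump_sample(leaf, (0.0, 0.0), lambda d: 0.3, p_max=0.5, rng=random.Random(3))
```

Only the candidates reached by a jump require a distance evaluation, so the work in a leaf scales with roughly `p_max` times its size rather than with the size itself.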

3. Theoretical Properties and Guarantees

PPNS offers distinct theoretical advantages:

  • Optimal Accuracy Under Security Constraint: The neighbour allocation (concentrating $k-1$ selections in $N_1$, one in $N_\beta$) is proven to maximise the expected similarity sum $\alpha$ given the enforced partition span $\beta$ (Lu et al., 2015).
  • Differential Privacy Assurance: Exponential mechanism sampling within each partition achieves $\epsilon$-differential privacy for the neighbour selection procedure.
  • Attack Resistance: By requiring that the final neighbour be drawn from partition $\beta$, the cost for a $k$NN adversary increases to $\beta \cdot k$ required fake profiles. The classic $k$NN attack has zero success probability when $p \leq (k-1)/k$, where $p$ is the geometric partition allocation parameter (Lu et al., 2015).
  • Query Complexity: In spatial data, the overall query cost is $O((|N(q, f)|+\sqrt{n})\log n)$, where $|N(q, f)|$ is the output size; this is sublinear in $n$ except for very large $f$ (i.e., dense neighbourhoods) (Looz et al., 2015).
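
The optimality claim in the first bullet can be made plausible with a short exchange argument. The sketch below is not taken from the cited paper. It assumes, for simplicity, that sampling inside each block is uniform, so each member of $N_i$ is selected with probability $f_\beta(i)/k$, where $f_\beta(i)$ is the number of neighbours drawn from $N_i$; the symbol $S_i$ for the similarity sum of block $N_i$ is introduced here for illustration, and $\beta \geq 2$ is assumed.

```latex
\begin{align*}
\alpha &= \sum_{i=1}^{\beta} \frac{f_\beta(i)}{k}\, S_i,
  \qquad S_i = \sum_{u \in N_i} \mathrm{sim}(u_a, u),
  \qquad S_1 \ge S_2 \ge \dots \ge S_\beta, \\
\text{subject to}\quad & \sum_{i=1}^{\beta} f_\beta(i) = k,
  \qquad f_\beta(\beta) \ge 1,
  \qquad 0 \le f_\beta(i) \le k. \\[4pt]
\text{Moving one draw from } N_i \text{ to } N_1: \quad
  & \Delta\alpha = \frac{S_1 - S_i}{k} \ge 0 \quad (i > 1), \\
\text{hence}\quad & f_\beta(1) = k-1,\ f_\beta(\beta) = 1,
  \qquad \alpha^{*} = \frac{(k-1)\,S_1 + S_\beta}{k}.
\end{align*}
```

Every draw that the security constraint does not force into $N_\beta$ is therefore best spent in $N_1$, which matches the stated allocation.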

4. Practical Implementation and Complexity

PPNS algorithms follow predictable preprocessing and query steps:

| Step | Complexity (CF) | Complexity (Spatial) |
|---|---|---|
| Similarity computation/sorting | $O(n |S|) + O(n \log n)$ | N/A |
| Quadtree construction | N/A | $O(n \log n)$ (expected) |
| Partitioning | $O(n)$ | $O(n)$ |
| Within-partition sampling | $O(k \log k)$ | $O((|N(q, f)| + \sqrt{n})\log n)$ |

In collaborative filtering, weighted neighbour selection within two partitions of size $k$ each keeps the per-query cost mild while reducing overall privacy-induced noise, since the DP exponent is not scaled by the total number $n$ of candidates.

In spatial data queries, the quadtree supports efficient aggregation and “jump-sampling” exploits geometric skipping to avoid examining every point, resulting in practical query times one to two orders of magnitude less than brute-force approaches for millions of points (Looz et al., 2015).

5. Empirical Results and Applications

Collaborative Filtering

Experiments on MovieLens and Douban datasets, using mean absolute error (MAE) as the evaluation metric, confirm the theoretical claims:

  • PPNS’s MAE converges to that of the deterministic $k$NN as $\beta \to 1$ (i.e., maximum accuracy).
  • Under $k$NN attack conditions, PPNS yields 10–30% lower MAE than naive (global) probabilistic neighbour selection or private neighbour collaborative filtering (Lu et al., 2015).
  • For fixed privacy budget $\epsilon$ and increasing $k$, PPNS exhibits higher accuracy than global-exponential baselines.

Spatial Data

  • Disease-spread simulation on real population data (3–14 million points) achieved PPNS query speeds over $100\times$ faster than direct coin-flipping on all points.
  • Random hyperbolic graph generation for $n$ up to $2^{20}$ delivered runtimes more than $10\times$ faster than previous $O(n^2)$ implementations, matching the theoretical $O((n^{3/2} + m)\log n)$ prediction (Looz et al., 2015).

6. Extensions, Applications, and Limitations

The PPNS framework is adaptable:

  • Generalisation to $\beta k$NN Attacks: Increasing $\beta$ enforces higher adversary cost. The minimal attack set must span all $\beta$ blocks, greatly raising the bar compared to classical attacks.
  • Spatial Geometry: The quadtree-based PPNS readily adapts to hyperbolic space, enabling efficient generation of random hyperbolic graphs at arbitrary temperatures.
  • Partition Choice and Performance: When the density $j(r)$ is unknown, empirical medians can be used for partitioning, which, while sacrificing precise $4^{-i}$ cell-probability guarantees, still achieves strong practical performance.
  • Fine-tuned Trade-off: The geometric mixing parameter $p$ (in probabilistic block allocation) allows provable, fine-grained control over the accuracy–privacy trade-off (Lu et al., 2015).
  • Applicability: Although primarily evaluated in collaborative filtering and spatial query contexts, the structural principle of partitioned, localised randomisation has potential utility in any setting where controlled trade-offs between utility and privacy or efficiency are required.

7. Comparison to Alternative Approaches

PPNS improves over global randomised neighbour selection in several fundamental respects:

| Scheme | Privacy Guarantee | Accuracy/Utility | Complexity |
|---|---|---|---|
| Global Randomised | $\epsilon$-DP (global) | High noise, lower $\alpha$ | $O(n)$ neighbour pool |
| PPNS (Partitioned) | $\epsilon$-DP (local) | Provably maximal $\alpha$ at fixed $\beta$ | $O(k)$ within-block sampling |
| Deterministic $k$NN | None | Maximal | No privacy |

In all examined domains, partitioned selection methods guarantee higher utility for the same security or privacy level, reduce the magnitude of noise induced by DP, and enforce robust upper bounds on adversary success probability dictated by the chosen $\beta$ parameter.


Partitioned Probabilistic Neighbour Selection is a methodological advancement that enables provable and tunable accuracy–privacy (or accuracy–security) trade-offs by restricting probabilistic selection to suitably structured partitions, rather than applying global randomisation. Its theoretical guarantees and empirical results demonstrate substantial improvements over previous globalised schemes in both privacy-sensitive recommendation and efficient querying of probabilistic spatial neighbourhoods (Looz et al., 2015, Lu et al., 2015, Lu et al., 2015).
