Partitioned Probabilistic Neighbour Selection

Updated 23 March 2026

Partitioned Probabilistic Neighbour Selection (PPNS) is an approach that divides candidates into disjoint partitions to optimize neighbour selection under privacy and efficiency constraints.
The method uses exponential mechanism sampling within partitions to ensure differential privacy and robust resistance against kNN attacks.
Empirical results in collaborative filtering and spatial querying show PPNS achieves improved accuracy–privacy trade-offs and faster query times compared to global randomisation.

Partitioned Probabilistic Neighbour Selection (PPNS) describes a class of algorithms for neighbour selection where candidates are divided into disjoint groups (“partitions”) and sampling occurs within or across these partitions according to a scheme designed to balance accuracy, privacy, and computational efficiency. PPNS has been independently developed for two major domains: privacy-preserving collaborative filtering and efficient spatial probabilistic neighbourhood queries. In both cases, the partitioned approach leads to improved trade-offs over prior, globally randomised strategies, with provable guarantees on utility metrics and adversary resilience.

1. Formal Definitions and Core Concepts

In the canonical collaborative filtering setting, given a set of users $U$ and a target user $u_a$, all other users $u_i$ are ranked by their similarity $\mathrm{sim}(u_a, u_i)$ to $u_a$ (e.g., by cosine similarity). The sorted list $S_a' = \{ u_1, u_2, \ldots, u_n \}$ is then partitioned into $\beta$ disjoint groups, each of size $k$:

$N_1 = \{ u_1, \ldots, u_k \}, \; N_2 = \{ u_{k+1}, \ldots, u_{2k} \}, \; \ldots, \; N_\beta = \{ u_{(\beta-1)k+1}, \ldots, u_{\beta k} \}$

Neighbour selection proceeds by drawing $k$ users in total from these blocks, with at least one neighbour required from the lowest-similarity block $N_\beta$.
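The sorting and partitioning step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `partition_by_similarity`, the dictionary input, and the toy similarity values are all assumptions made for the example.

```python
from typing import Dict, List


def partition_by_similarity(sims: Dict[str, float], k: int, beta: int) -> List[List[str]]:
    """Rank candidates by descending similarity and cut the top beta*k into blocks of size k."""
    if k <= 0 or beta <= 0:
        raise ValueError("k and beta must be positive")
    ranked = sorted(sims, key=lambda user: sims[user], reverse=True)
    if len(ranked) < beta * k:
        raise ValueError("need at least beta * k candidates")
    # Block i (0-based) holds ranks i*k .. (i+1)*k - 1, i.e. N_{i+1} in the text.
    return [ranked[i * k:(i + 1) * k] for i in range(beta)]


# Hypothetical similarities of seven candidates to the target user u_a.
sims = {"u1": 0.9, "u2": 0.8, "u3": 0.7, "u4": 0.4, "u5": 0.3, "u6": 0.1, "u7": 0.05}
blocks = partition_by_similarity(sims, k=2, beta=3)
print(blocks)  # [['u1', 'u2'], ['u3', 'u4'], ['u5', 'u6']]
```

Candidates ranked below position $\beta k$ (here `u7`) fall outside every block and are never selected.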

Two principal metrics quantify the quality and privacy of the selection:

Accuracy ($\alpha$):

$\alpha = \mathbb{E}\left[ \sum_{i=1}^k \mathrm{sim}(u_a, \mathrm{neighbour}_i) \right ] = \sum_{i=1}^n \mathrm{sim}(u_a, u_i) \mu_i$

where $\mu_i$ is the expected selection indicator for $u_i$, i.e., the probability that $u_i$ is selected.
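As a small worked example of the accuracy formula, the sketch below evaluates $\alpha$ for two selection schemes over four candidates with $k = 2$. The similarity values and the uniform within-block selection probabilities are assumptions chosen purely for illustration.

```python
from typing import Sequence


def expected_accuracy(sims: Sequence[float], mu: Sequence[float]) -> float:
    """alpha = sum_i sim(u_a, u_i) * mu_i, with mu_i the selection probability of u_i."""
    if len(sims) != len(mu):
        raise ValueError("sims and mu must have the same length")
    return sum(s * m for s, m in zip(sims, mu))


sims = [0.9, 0.8, 0.3, 0.1]          # hypothetical similarities, already sorted
knn_mu = [1.0, 1.0, 0.0, 0.0]        # deterministic kNN: always the top k = 2
ppns_mu = [0.5, 0.5, 0.5, 0.5]       # assumed: one uniform pick from each of two blocks

alpha_knn = expected_accuracy(sims, knn_mu)    # approximately 1.7
alpha_ppns = expected_accuracy(sims, ppns_mu)  # approximately 1.05
```

In both cases the probabilities sum to $k$, so the two values are directly comparable; the gap between them is the accuracy given up for spreading the selection across partitions.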

Security ($\beta$): The number of partitions, each of size $k$, across which neighbours must be drawn. It sets the minimum strength an attack needs in order to succeed: the adversary's fake profiles must span all $\beta$ partitions rather than only the top $k$ ranks.

In the spatial context, with $P \subset \mathbb{R}^d$ and query $q \in \mathbb{R}^d$, each $p \in P$ is included in $N(q, f)$ independently with probability $f(\mathrm{dist}(p, q))$, for a monotonically non-increasing function $f: \mathbb{R}^+ \to [0,1]$ (Looz et al., 2015).
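The definition of $N(q, f)$ translates directly into a brute-force baseline that flips one coin per point. The sketch below assumes Euclidean distance and an exponentially decaying $f$ as an example; both choices, and the function name, are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def probabilistic_neighbourhood(points: Sequence[Point], q: Point,
                                f: Callable[[float], float],
                                rng: random.Random) -> List[Point]:
    """Include each point independently with probability f(dist(p, q)); O(n) per query."""
    return [p for p in points if rng.random() < f(math.dist(p, q))]


rng = random.Random(42)
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
# Assumed example of a non-increasing f: closer points are more likely neighbours.
sample = probabilistic_neighbourhood(points, (0.0, 0.0), lambda d: math.exp(-d), rng)
```

This linear scan is the reference behaviour that a partitioned index aims to reproduce in distribution while inspecting far fewer points.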

2. Algorithmic Procedure: Partitioning and Sampling

The PPNS algorithm exploits the partition structure to optimise the neighbour selection process under privacy or efficiency constraints.

Collaborative Filtering PPNS

Sorting and Partitioning: Candidate users are sorted by similarity and divided into $\beta$ contiguous partitions of size $k$.
Neighbour Allocation: Let $f_\beta(i)$ denote the number of neighbours drawn from partition $N_i$. The allocation problem is solved as a linear program under the constraints $\sum_{i=1}^\beta f_\beta(i) = k$ and $f_\beta(\beta) \geq 1$. The optimal solution is

$f_\beta(1) = k-1,\ f_\beta(\beta) = 1,\ f_\beta(i) = 0 \text{ for } 1 < i < \beta$

ensuring that $k-1$ neighbours are chosen from the highest-similarity partition, one from the lowest, maximising $\alpha$ under the enforced security level $\beta$ (Lu et al., 2015).
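The closed-form allocation is simple enough to state as code. The helper below is a sketch under the assumption that $\beta = 1$ degenerates to taking all $k$ neighbours from the single block; the function name is hypothetical.

```python
from typing import List


def optimal_allocation(k: int, beta: int) -> List[int]:
    """Return [f_beta(1), ..., f_beta(beta)]: k-1 picks from N_1 and one pick from N_beta."""
    if k < 1 or beta < 1:
        raise ValueError("k and beta must be at least 1")
    if beta == 1:
        # Assumed degenerate case: first and last block coincide.
        return [k]
    allocation = [0] * beta
    allocation[0] = k - 1   # highest-similarity partition N_1
    allocation[-1] = 1      # mandatory pick from lowest-similarity partition N_beta
    return allocation


print(optimal_allocation(k=5, beta=4))  # [4, 0, 0, 1]
```

The intuition is that average similarity cannot increase from one block to the next, so once the mandatory pick from $N_\beta$ is fixed, moving any remaining pick into $N_1$ never lowers $\alpha$.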

Differential Privacy Mechanism: Within each partition, selection proceeds via the exponential mechanism with weights

$\omega_i = \exp\left(\frac{\epsilon}{4k \cdot RS} \mathrm{sim}(u_a, u_i) \right)$

where $RS$ is the recommendation-aware sensitivity of the similarity score. Sampling is performed without replacement proportional to the weights, guaranteeing $\epsilon$-differential privacy (Lu et al., 2015, Lu et al., 2015).
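Weighted sampling without replacement inside one partition might look like the sketch below. It only illustrates the weighting $\omega_i$ given above: the sensitivity `rs` is taken as a caller-supplied number, the function and parameter names are hypothetical, and the code does not by itself establish the privacy guarantee.

```python
import math
import random
from typing import Dict, List, Sequence


def exp_mechanism_sample(candidates: Sequence[str], sims: Dict[str, float],
                         count: int, epsilon: float, k: int, rs: float,
                         rng: random.Random) -> List[str]:
    """Draw `count` distinct candidates with probability proportional to
    exp(epsilon / (4 * k * rs) * sim), removing each pick from the pool."""
    if count > len(candidates):
        raise ValueError("cannot draw more candidates than the partition holds")
    pool = list(candidates)
    weights = [math.exp(epsilon / (4 * k * rs) * sims[u]) for u in pool]
    chosen: List[str] = []
    for _ in range(count):
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))
        weights.pop(idx)  # keep weights aligned with the shrinking pool
    return chosen


rng = random.Random(7)
sims = {"u1": 0.9, "u2": 0.8, "u3": 0.7}          # hypothetical similarities in N_1
picked = exp_mechanism_sample(["u1", "u2", "u3"], sims,
                              count=2, epsilon=1.0, k=3, rs=1.0, rng=rng)
```

With the allocation from the previous step, this routine would be called once on $N_1$ with `count = k - 1` and once on $N_\beta$ with `count = 1`.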

Spatial PPNS

Spatial Partitioning: Points are indexed in a balanced polar quadtree (for $d=2$), with angular and radial splits ensuring cells have equal probability mass under a known or estimated density $j(r)$.
Query Algorithm: For query $q$, the algorithm recursively traverses the quadtree. If the expected number of neighbours in a cell is less than one, the cell is treated as a “virtual leaf” and processed via “jump-sampling” with base probability $p_\mathrm{max} = f(\text{lowerBoundDist}(q, c))$.
Efficiency: By partitioning the space into “probability bands” and aggregating small cells, the expected running time per query is reduced to $O((|N(q, f)| + \sqrt{n})\log n)$ with high probability (Looz et al., 2015).
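The jump-sampling step for a single virtual leaf can be sketched as follows. The geometric skip length follows from treating each point as a candidate with probability $p_\mathrm{max}$; the second step, accepting a candidate with probability $f(\mathrm{dist})/p_\mathrm{max}$, is a standard rejection correction assumed here rather than taken from the text. The function name and the flat list standing in for a quadtree cell are also illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def jump_sample(points: Sequence[Point], q: Point, f: Callable[[float], float],
                p_max: float, rng: random.Random) -> List[Point]:
    """Sample one cell's points, each with probability f(dist), without visiting every point.
    Requires f(dist(p, q)) <= p_max for all p in the cell."""
    if p_max <= 0.0:
        return []
    selected: List[Point] = []
    i = -1
    while True:
        if p_max >= 1.0:
            skip = 0
        else:
            u = 1.0 - rng.random()  # uniform in (0, 1]
            # Geometric number of non-candidates before the next candidate.
            skip = int(math.log(u) / math.log1p(-p_max))
        i += skip + 1
        if i >= len(points):
            return selected
        # Assumed rejection step: correct from p_max down to the true probability.
        if rng.random() < f(math.dist(points[i], q)) / p_max:
            selected.append(points[i])


cell = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
f = lambda d: math.exp(-d)
neighbours = jump_sample(cell, (0.0, 0.0), f, p_max=f(1.0), rng=random.Random(3))
```

When $p_\mathrm{max}$ is small, most iterations jump over many points at once, which is where the saving over per-point coin flips comes from.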

3. Theoretical Properties and Guarantees

PPNS offers distinct theoretical advantages:

Optimal Accuracy Under Security Constraint: The neighbour allocation (concentrating $k-1$ selections in $N_1$ and one in $N_\beta$) is proven to maximise the expected similarity sum $\alpha$ given the enforced partition span $\beta$ (Lu et al., 2015).
Differential Privacy Assurance: Exponential mechanism sampling within each partition achieves $\epsilon$-differential privacy for the neighbour selection procedure.
Attack Resistance: Because the final neighbour must be drawn from partition $N_\beta$, a $k$NN adversary needs $\beta \cdot k$ fake profiles to mount the attack. The classic $k$NN attack has zero success probability when $p \leq (k-1)/k$, where $p$ is the geometric partition allocation parameter (Lu et al., 2015).
Query Complexity: In spatial data, the overall query cost is $O((|N(q, f)|+\sqrt{n})\log n)$, where $|N(q, f)|$ is the output size; this is sublinear in $n$ unless $f$ is so large that the neighbourhood itself is dense (Looz et al., 2015).

4. Practical Implementation and Complexity

PPNS algorithms follow predictable preprocessing and query steps:

Step	Complexity (CF)	Complexity (Spatial)
Similarity computation/sorting	$O(n |S|) + O(n \log n)$	N/A
Quadtree construction	N/A	$O(n \log n)$ (expected)
Partitioning	$O(n)$	$O(n)$
Within-partition sampling	$O(k \log k)$	$O((|N(q, f)| + \sqrt{n})\log n)$

In collaborative filtering, weighted neighbour selection within two partitions of size $k$ each keeps the per-query cost mild while reducing overall privacy-induced noise, since the DP exponent is not scaled by the total number $n$ of candidates.

In spatial data queries, the quadtree supports efficient aggregation and “jump-sampling” exploits geometric skipping to avoid examining every point, resulting in practical query times one to two orders of magnitude less than brute-force approaches for millions of points (Looz et al., 2015).

5. Empirical Results and Applications

Collaborative Filtering

Experiments on MovieLens and Douban datasets, using mean absolute error (MAE) as the evaluation metric, confirm the theoretical claims:

PPNS’s MAE converges to that of the deterministic $k$NN as $\beta\to 1$ (i.e., maximum accuracy).
Under $k$NN attack conditions, PPNS yields 10–30% lower MAE than naive (global) probabilistic neighbour selection or private neighbour collaborative filtering (Lu et al., 2015).
For fixed privacy budget $\epsilon$ and increasing $k$, PPNS exhibits higher accuracy than global-exponential baselines.

Spatial Data

Disease-spread simulation on real population data (3–14 million points) achieved PPNS query speeds over $100\times$ faster than direct coin-flipping on all points.
Random hyperbolic graph generation for $n$ up to $2^{20}$ delivered runtimes more than $10\times$ faster than previous $O(n^2)$ implementations, matching the theoretical $O((n^{3/2} + m)\log n)$ prediction (Looz et al., 2015).

6. Extensions, Applications, and Limitations

The PPNS framework is adaptable:

Generalisation to $\beta k$NN Attacks: Increasing $\beta$ enforces higher adversary cost. The minimal attack set must span all $\beta$ blocks, greatly raising the bar compared to classical attacks.
Spatial Geometry: The quadtree-based PPNS readily adapts to hyperbolic space, enabling efficient generation of random hyperbolic graphs at arbitrary temperatures.
Partition Choice and Performance: When the density $j(r)$ is unknown, empirical medians can be used for partitioning, which, while sacrificing precise $4^{-i}$ cell-probability guarantees, still achieves strong practical performance.
Fine-tuned Trade-off: The geometric mixing parameter $p$ (in probabilistic block allocation) allows provable, fine-grained control over the accuracy–privacy trade-off (Lu et al., 2015).
Applicability: Although primarily evaluated in collaborative filtering and spatial query contexts, the structural principle of partitioned, localised randomisation has potential utility in any setting where controlled trade-offs between utility and privacy or efficiency are required.

7. Comparison to Alternative Approaches

PPNS improves over global randomised neighbour selection in several fundamental respects:

Scheme	Privacy Guarantee	Accuracy/Utility	Complexity
Global Randomised	$\epsilon$-DP (global)	High noise, lower $\alpha$	$O(n)$ neighbour pool
PPNS (Partitioned)	$\epsilon$-DP (local)	Provably maximal $\alpha$ at fixed $\beta$	$O(k)$ within-block sampling
Deterministic $k$NN	None	Maximal	No privacy

In all examined domains, partitioned selection methods deliver higher utility for the same security or privacy level, reduce the magnitude of noise induced by DP, and bound the adversary's success probability according to the chosen $\beta$ parameter.

Partitioned Probabilistic Neighbour Selection is a methodological advancement that enables provable and tunable accuracy–privacy (or accuracy–security) trade-offs by restricting probabilistic selection to suitably structured partitions, rather than applying global randomisation. Its theoretical guarantees and empirical results demonstrate substantial improvements over previous globalised schemes in both privacy-sensitive recommendation and efficient querying of probabilistic spatial neighbourhoods (Looz et al., 2015, Lu et al., 2015, Lu et al., 2015).


References

Querying Probabilistic Neighborhoods in Spatial Data Sets Efficiently (2015)

A Security-assured Accuracy-maximised Privacy Preserving Collaborative Filtering Recommendation Algorithm (2015)

An Accuracy-Assured Privacy-Preserving Recommender System for Internet Commerce (2015)
