Whether overlapping shards strictly improve results under fixed shard-size constraints

Determine whether overlapping shards strictly improve results, such as recall under exhaustive in-shard search, compared to disjoint shard partitions under the same memory constraint. Here overlap is built by replicating nodes with replication factor o ≥ 1, and the maximum shard size is held fixed by increasing the number of shards to s′ = o · s.

Background

To mitigate recall losses for points near shard boundaries, whose k-nearest neighbors span multiple shards, the authors propose creating overlapping shards by greedily replicating each node to the shard that contains the plurality of its neighbors. To respect the memory constraint, the authors do not increase shard size; instead, they increase the number of shards, so the maximum shard size stays the same while overlap is introduced.
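The greedy replication step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `greedy_overlap`, the input layout (a home-shard list plus k-NN lists), the single pass in node order, and the choice to consider only shards other than a node's home shard are all assumptions made for illustration.

```python
from collections import Counter


def greedy_overlap(assignment, knn, num_shards, max_shard_size):
    """Hypothetical sketch of greedy node replication.

    assignment[v] is the home shard of node v in a disjoint partition,
    knn[v] lists the nearest-neighbor ids of v. Each node is copied into
    the non-home shard holding most of its neighbors, but only while that
    shard stays within max_shard_size (assumed capacity rule).
    """
    shards = [set() for _ in range(num_shards)]
    for v, home in enumerate(assignment):
        shards[home].add(v)

    for v, neighbors in enumerate(knn):
        # Count neighbors by their home shard, skipping v's own shard.
        counts = Counter(
            assignment[u] for u in neighbors if assignment[u] != assignment[v]
        )
        if not counts:
            continue  # all neighbors already share v's shard
        target, _ = counts.most_common(1)[0]
        if len(shards[target]) < max_shard_size:
            shards[target].add(v)  # replicate v; the home copy is kept
    return shards
```

In this sketch the size cap is what enforces the memory constraint: a replica is simply dropped when the target shard is full, so shard sizes never exceed the cap regardless of how many shards are used.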

In their empirical setup, which evaluates recall under exhaustive in-shard search, the authors note a methodological ambiguity: because overlap is achieved by increasing the number of shards rather than enlarging them, it is not evident a priori whether overlap strictly improves results relative to disjoint partitions. This uncertainty motivates clarifying the conditions under which overlap yields consistent gains.
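With exhaustive search inside each probed shard, a true neighbor is found exactly when it is stored in at least one probed shard, so recall reduces to a set-coverage count. The sketch below makes that explicit; the function name `exhaustive_recall` and its arguments are hypothetical, and it deliberately leaves out routing, that is, how the probed shards are chosen.

```python
def exhaustive_recall(shards, probed, true_neighbors):
    """Recall of one query when every probed shard is searched exhaustively.

    shards: list of sets of node ids (disjoint or overlapping).
    probed: indices of the shards the query is routed to.
    true_neighbors: ground-truth k-nearest-neighbor ids of the query.
    """
    # Everything stored in a probed shard is found by exhaustive search.
    found = set().union(*(shards[i] for i in probed))
    truth = set(true_neighbors)
    return len(found & truth) / len(truth)
```

A comparison in the spirit of the question above would evaluate this quantity for s disjoint shards and for s′ = o · s overlapping shards of the same maximum size. Probing the same number of shards in both settings covers a comparable number of points, which is one way to see why a strict gain from overlap is not guaranteed.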

References

Since we use more shards instead of increasing their size, it is unclear whether overlap leads to strictly better results.

Unleashing Graph Partitioning for Large-Scale Nearest Neighbor Search (2403.01797 - Gottesbüren et al., 4 Mar 2024) in Section 4.3, Analyzing Partitioning and Routing Quality — Overlapping Partitions