SlideFuse: Enhanced Rank Fusion Technique
- SlideFuse is a probabilistic data fusion method that smooths rank relevance probabilities using a sliding window, eliminating artifacts from fixed segmentation.
- It enhances retrieval effectiveness by pooling evidence from neighboring ranks, resulting in significant improvements in metrics like MAP and bpref.
- The approach effectively addresses sparse relevance judgments in large datasets by blending local rank information to mitigate boundary effects.
SlideFuse is a probabilistic data fusion technique designed to improve retrieval effectiveness when combining ranked results from multiple information retrieval systems, especially in circumstances where relevance judgments are highly incomplete. It introduces a per-rank sliding window during the fusion process to smooth noisy rank-wise relevance probability estimates, thereby addressing the limitations of prior segmentation-based fusion approaches such as ProbFuse and SegFuse (Lillis et al., 2014).
1. Motivation and Context
SlideFuse was developed in response to key challenges in probabilistic data fusion for information retrieval evaluation. Standard methods (e.g., ProbFuse, SegFuse) rely on training queries with known relevance judgments to estimate, for each input system, the probability that a document returned at a given rank is relevant. However, when applied to large datasets like the TREC Web Track, relevance judgments are extremely sparse, resulting in highly jagged, unreliable probability distributions when estimated at exact ranks. Segmented approaches smooth this distribution by aggregating evidence within fixed or exponentially growing rank segments, but introduce boundary artifacts—sharp, artificial drops at segment edges that misrepresent underlying relevance (Lillis et al., 2014).
SlideFuse replaces these rigid segments with a sliding window around each rank, pooling evidence from neighboring ranks to achieve finer-grained smoothing and eliminate abrupt changes at boundaries.
2. Formal Definition
Let $C$ denote the set of input systems. For each system $c \in C$, the document at rank $k$ is $d_{k,c}$. Training queries with available relevance judgments are used. $Q_k$ is the set of training queries for which $c$ returned at least $k$ documents. $R_{q,k,c}$ is 1 if $d_{k,c}$ is relevant to $q$ and 0 otherwise. The result-set length is $N$, and the sliding-window half-width is $w$.
Training Phase
The per-rank raw relevance probability for system $c$ and rank $k$ is:

$$P(k, c) = \frac{1}{|Q_k|} \sum_{q \in Q_k} R_{q,k,c}$$
Fusion Phase
For each rank $k = 1, \dots, N$:
- Window boundaries: $a = \max(1, k - w)$ and $b = \min(N, k + w)$
- Windowed probability: $P_w(k, c) = \frac{1}{b - a + 1} \sum_{i=a}^{b} P(i, c)$
- Cross-system combination for each document $d$ that appears at rank $k_c$ in system $c$: $S(d) = \sum_{c \in C} P_w(k_c, c)$, summing over only those systems in which $d$ appears.
Documents are ranked in descending order of $S(d)$ to form the fused result.
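The training and smoothing formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; function names are invented, training data is assumed to arrive as one 0/1 relevance vector per training query, and ranks are 0-indexed in code (1-indexed in the formulas).

```python
def train_rank_probs(relevance_runs):
    """Per-rank relevance probability P(k, c) for one system.
    relevance_runs: one list per training query, where entry k is 1
    iff the document that system returned at rank k was relevant."""
    max_len = max(len(run) for run in relevance_runs)
    probs = []
    for k in range(max_len):
        # Q_k: only queries for which the system returned > k documents
        judged = [run[k] for run in relevance_runs if len(run) > k]
        probs.append(sum(judged) / len(judged) if judged else 0.0)
    return probs

def windowed_prob(probs, k, w):
    """P_w(k, c): average of P over the window [k - w, k + w],
    clipped at the first and last observed ranks."""
    a = max(0, k - w)
    b = min(len(probs) - 1, k + w)
    return sum(probs[a:b + 1]) / (b - a + 1)
```

Clipping the window at the list ends means edge ranks are smoothed over fewer neighbors, rather than padded with artificial zeros.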
3. Step-By-Step Algorithm
The SlideFuse workflow comprises two main phases:
Training Phase (per system):
- For each $c \in C$ and each rank $k = 1, \dots, N$:
- If $|Q_k| > 0$, set $P(k, c) = \frac{1}{|Q_k|} \sum_{q \in Q_k} R_{q,k,c}$; else set $P(k, c) = 0$.
Fusion Phase (for each test query):
- For each $c \in C$, retrieve the top $N$ results $d_{1,c}, \dots, d_{N,c}$.
- For each rank $k$ in $1, \dots, N$:
- $a = \max(1, k - w)$, $b = \min(N, k + w)$, and $P_w(k, c) = \frac{1}{b - a + 1} \sum_{i=a}^{b} P(i, c)$.
- Initialize a score map $S(d) \leftarrow 0$ for all returned documents $d$.
- For each document $d$ and each system $c$ that returns $d$ at rank $k_c$: add $P_w(k_c, c)$ to $S(d)$.
- Sort documents by $S(d)$ in descending order.
The parameter $w$ controls the degree of smoothing. Small $w$ yields insufficient smoothing; large $w$ may over-smooth and blur rank discrimination. Empirically, a moderate value of $w$ was found to be an effective trade-off (Lillis et al., 2014).
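Putting the two phases together, the fusion loop above can be sketched as a single self-contained function. This is an illustrative sketch under assumed input shapes (dicts keyed by system name, 0-indexed ranks), not the published implementation:

```python
def slidefuse(runs, system_probs, w=2):
    """Fuse ranked result lists from multiple systems.
    runs: {system: [doc_id, ...]} ranked results for one test query.
    system_probs: {system: [P(k, c), ...]} per-rank probabilities
    learned in the training phase. Returns doc_ids sorted by
    descending fused score."""
    scores = {}
    for system, docs in runs.items():
        probs = system_probs[system]
        n = len(probs)
        for k, doc in enumerate(docs):
            if k >= n:
                continue  # rank never observed in training: no contribution
            # sliding window [k - w, k + w], clipped to observed ranks
            a = max(0, k - w)
            b = min(n - 1, k + w)
            p_w = sum(probs[a:b + 1]) / (b - a + 1)
            # sum evidence across systems (Chorus Effect)
            scores[doc] = scores.get(doc, 0.0) + p_w
    return sorted(scores, key=scores.get, reverse=True)
```

With `w=0` the window degenerates to the raw per-rank probability, which makes the scoring easy to check by hand.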
4. Experimental Evaluation
SlideFuse was evaluated on the TREC-2004 Web Track, notable for its highly incomplete relevance judgments. The protocol used 74 topfiles, with five independent runs (each using 6 topfiles, 30 in total) and repeated shuffling of 225 queries to create disjoint training and test splits. SlideFuse was compared against CombMNZ (a baseline score-based fusion method), ProbFuse (25 equal-length segments), and SegFuse (exponentially growing segments).
Three metrics were reported:
- MAP (Mean Average Precision): Assumes unjudged documents are nonrelevant.
- bpref: Ignores unjudged documents, robust to incompleteness.
- P10: Precision at rank 10, reflecting typical user-focused evaluation.
Results, averaged over five runs, demonstrate substantial gains:

| Metric | SlideFuse | Best Baseline (SegFuse) | Relative Improvement |
|---|---|---|---|
| MAP | 0.4772 | 0.3314 | +44.0% (p < 0.01) |
| bpref | 0.3910 | 0.3486 | +12.2% (p < 0.01) |
| P10 | 0.1378 | 0.1178 | +17.0% (p < 0.01) |
SlideFuse consistently outperformed all three baselines across metrics and runs, with only a few non-significant exceptions (Lillis et al., 2014).
5. Algorithmic Characteristics and Influences
SlideFuse leverages localized smoothing of rank relevance probabilities using a fixed-width sliding window, replacing the segment-based probability sharing of earlier methods. This distinction ensures continuity across ranks and removes artifacts introduced by segment boundary choices.
The technique preserves the Chorus Effect, summing evidence across multiple systems, and the Skimming Effect, which privileges higher scores for documents retrieved at earlier ranks. The smoothing is uniform within the window; each neighbor contributes equally. There is no per-system weighting beyond the probabilities learned during training.
A plausible implication is that uniform weighting might miss finer distinctions if closer ranks are more semantically informative. Additionally, the lack of score contribution for documents beyond observed training ranks suggests an opportunity for methodological extension.
6. Limitations and Prospective Enhancements
Notable limitations are:
- All systems' results are treated equally, without per-run reliability weighting.
- The sliding window is uniform rather than distance-weighted.
- Documents appearing at ranks not observed during training receive zero score.
Suggested future directions include:
- Introducing per-system or per-run weights derived from confidence estimates.
- Distance-based weighting within the window to give more influence to nearer ranks.
- Extrapolation or floor strategies for ranks beyond those seen in training.
- Adaptive window half-width $w$ based on rank or system-specific characteristics.
These potential enhancements aim to further refine fused ranking accuracy, particularly under severe relevance judgment sparsity (Lillis et al., 2014).
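As one concrete illustration of the distance-weighting direction, the uniform window average could be replaced with a triangular weighting that gives nearer ranks more influence. The following sketch is hypothetical, not part of the published SlideFuse; the triangular scheme is one assumed choice among many:

```python
def weighted_window_prob(probs, k, w):
    """Hypothetical distance-weighted window: rank k gets weight w + 1,
    decaying linearly to 1 at the window edges (triangular kernel)."""
    a = max(0, k - w)
    b = min(len(probs) - 1, k + w)
    weights = [w + 1 - abs(i - k) for i in range(a, b + 1)]
    total = sum(p * wt for p, wt in zip(probs[a:b + 1], weights))
    return total / sum(weights)
```

Unlike the uniform window, this variant changes the smoothed value whenever the window is asymmetric (near the list ends) or the neighboring probabilities differ.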