
SPLADE-v3: New baselines for SPLADE (2403.06789v1)

Published 11 Mar 2024 in cs.IR and cs.CL

Abstract: A companion to the release of the latest version of the SPLADE library. We describe changes to the training structure and present our latest series of models -- SPLADE-v3. We compare this new version to BM25, SPLADE++, as well as re-rankers, and showcase its effectiveness via a meta-analysis over more than 40 query sets. SPLADE-v3 further pushes the limit of SPLADE models: it is statistically significantly more effective than both BM25 and SPLADE++, while comparing well to cross-encoder re-rankers. Specifically, it gets more than 40 MRR@10 on the MS MARCO dev set, and improves by 2% the out-of-domain results on the BEIR benchmark.

References (21)
  1. E. Bassani. ranx: A blazing-fast python library for ranking evaluation and comparison. In European Conference on Information Retrieval, pages 259–264. Springer, 2022.
  2. Overview of the TREC 2022 deep learning track. In Text REtrieval Conference (TREC), 2022.
  3. Benchmarking middle-trained language models for neural search. arXiv preprint arXiv:2306.02867, 2023.
  4. SPLADE v2: Sparse lexical and expansion model for information retrieval, 2021.
  5. From distillation to hard negative sampling: Making sparse neural ir models more effective. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2353–2359, 2022.
  6. Towards effective and efficient sparse neural information retrieval. ACM Trans. Inf. Syst., Dec. 2023. Just Accepted.
  7. SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. In Proc. SIGIR, page 2288–2292, 2021.
  8. L. Gao and J. Callan. Unsupervised corpus aware language model pre-training for dense passage retrieval. In S. Muresan, P. Nakov, and A. Villavicencio, editors, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2843–2853, Dublin, Ireland, May 2022. Association for Computational Linguistics.
  9. Tevatron: An efficient and flexible toolkit for dense retrieval. arXiv preprint arXiv:2203.05765, 2022.
  10. Improving efficient neural ranking models with cross-architecture knowledge distillation, 2021.
  11. C. Lassance and S. Clinchant. An efficiency study for splade models. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2220–2226, 2022.
  12. C. Lassance and S. Clinchant. The tale of two MS MARCO – and their unfair comparisons, 2023.
  13. C. Lassance and S. Clinchant. The tale of two MS MARCO – and their unfair comparisons. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '23, pages 2431–2435, New York, NY, USA, 2023. Association for Computing Machinery.
  14. Distilling dense representations for ranking using tightly-coupled teachers, 2020.
  15. Simplified data wrangling with ir_datasets. In SIGIR, 2021.
  16. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, Oct. 2019.
  17. ColBERTv2: Effective and efficient retrieval via lightweight late interaction. In M. Carpuat, M.-C. de Marneffe, and I. V. Meza Ruiz, editors, Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3715–3734, Seattle, United States, July 2022. Association for Computational Linguistics.
  18. Ranger: A toolkit for effect-size based multi-task evaluation. arXiv preprint arXiv:2305.15048, 2023.
  19. Exploring effect-size-based meta-analysis for multi-dataset evaluation. 2023.
  20. BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. arXiv preprint arXiv:2104.08663, 2021.
  21. Curriculum learning for dense retrieval distillation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1979–1983, 2022.
Authors (4)
  1. Carlos Lassance (35 papers)
  2. Thibault Formal (17 papers)
  3. Stéphane Clinchant (39 papers)
  4. Hervé Déjean (16 papers)
Citations (10)

Summary

  • The paper introduces SPLADE-v3, leveraging augmented hard negatives and ensemble-based distillation to set new performance baselines.
  • The methodology combines multiple negatives per batch with a hybrid of KL-Div and MarginMSE losses to boost recall and precision.
  • The paper demonstrates that fine-tuning from SPLADE++SelfDistil and custom variant designs lead to significant gains across diverse query sets.

Enhancements in SPLADE Models: An Examination of SPLADE-v3

Introduction to SPLADE-v3

The technical report introduces SPLADE-v3, the latest advancement in the SPLADE series of models for information retrieval. SPLADE (SParse Lexical AnD Expansion) models represent queries and documents as sparse, vocabulary-sized term-weight vectors, which keeps them compatible with inverted-index search while still benefiting from learned term expansion and weighting. SPLADE-v3 modifies the training recipe to achieve statistically significant improvements over its predecessors and over BM25, and it performs comparably to cross-encoder re-rankers.
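
To make the representation concrete, here is a minimal sketch of how a SPLADE-style encoder turns text into a vocabulary-sized sparse weight vector (a log-saturated ReLU over masked-language-model logits, max-pooled over tokens). The checkpoint name and helper function are illustrative assumptions, not the released SPLADE-v3 code.

```python
# Minimal sketch of SPLADE-style sparse encoding (illustrative, not the official
# SPLADE-v3 implementation). Assumes a Hugging Face masked-LM backbone; the
# checkpoint name below is a placeholder.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "naver/splade-v3"  # placeholder; any BERT-style MLM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

def splade_encode(text: str) -> torch.Tensor:
    """Return a |V|-dimensional sparse term-weight vector for `text`."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits               # (1, seq_len, vocab_size)
    weights = torch.log1p(torch.relu(logits))         # log-saturated activation
    mask = inputs["attention_mask"].unsqueeze(-1)     # ignore padding positions
    return (weights * mask).max(dim=1).values.squeeze(0)  # max-pool over tokens

# Ranking score is a sparse dot product over the vocabulary
q = splade_encode("what is splade")
d = splade_encode("SPLADE is a sparse neural retriever")
score = torch.dot(q, d)
```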

Key Innovations in Model Training

Multiple Negatives per Batch

Following the Tevatron framework, SPLADE-v3 is trained with an increased number of hard negatives per batch. This strategy improves results, particularly in in-domain settings, although it contributes little to out-of-domain generalization.
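
As a rough illustration of the batch layout, the sketch below scores each query against its positive and several hard negatives and applies a softmax-style contrastive loss over the candidates; the exact number of negatives and the loss bookkeeping used for SPLADE-v3 may differ.

```python
# Illustrative sketch of training with several hard negatives per query.
import torch
import torch.nn.functional as F

def contrastive_loss(q_reps: torch.Tensor, doc_reps: torch.Tensor, num_negs: int) -> torch.Tensor:
    """
    q_reps:   (B, V)                sparse query representations
    doc_reps: (B * (1 + num_negs), V)  per query: positive first, then its hard negatives
    """
    B, V = q_reps.shape
    docs = doc_reps.view(B, 1 + num_negs, V)            # (B, 1+k, V)
    scores = torch.einsum("bv,bkv->bk", q_reps, docs)   # dot product per candidate
    labels = torch.zeros(B, dtype=torch.long)           # positive is always index 0
    return F.cross_entropy(scores, labels)
```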

Distillation Score Enhancement

A notable change is the use of an ensemble of cross-encoder re-rankers to generate the distillation scores. Rather than distilling from a single teacher, SPLADE-v3 combines the scores of several cross-encoders, and applying affine transformations to bring the teachers' scores onto a comparable scale further improves effectiveness.
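
A hedged sketch of the idea: score each candidate with several cross-encoder teachers, map each teacher's scores onto a common range with a per-teacher affine transform (min-max here, purely for illustration), and average the results to obtain the distillation target. The precise transform and weighting used in the paper may differ.

```python
# Hedged sketch: combine distillation scores from several cross-encoder teachers.
import numpy as np

def affine_rescale(scores: np.ndarray) -> np.ndarray:
    """Per-teacher affine transform: map candidate scores to [0, 1]."""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + 1e-9)

def ensemble_targets(per_teacher_scores: list) -> np.ndarray:
    """per_teacher_scores: one array of candidate scores per cross-encoder."""
    rescaled = [affine_rescale(s) for s in per_teacher_scores]
    return np.mean(rescaled, axis=0)                  # ensemble distillation target

# Example: two teachers scoring the same 4 candidates for one query
teacher_a = np.array([12.3, 4.1, -2.0, 0.5])
teacher_b = np.array([0.91, 0.40, 0.05, 0.22])
targets = ensemble_targets([teacher_a, teacher_b])
```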

Combining Distillation Losses

The report combines the two distillation losses most commonly used in neural information retrieval: KL-Div and MarginMSE. Empirical observations suggest that KL-Div primarily benefits recall while MarginMSE primarily benefits precision, and the hybrid of the two yields better overall performance for SPLADE-v3.
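
The sketch below spells out the two losses and a simple weighted combination; the mixing weight is an illustrative assumption rather than the value used in the paper.

```python
# Sketch of the two distillation losses and a weighted combination.
import torch
import torch.nn.functional as F

def kl_div_loss(student_scores: torch.Tensor, teacher_scores: torch.Tensor) -> torch.Tensor:
    """KL divergence between score distributions over each query's candidates.
    Shapes: (B, num_candidates)."""
    return F.kl_div(
        F.log_softmax(student_scores, dim=-1),
        F.softmax(teacher_scores, dim=-1),
        reduction="batchmean",
    )

def margin_mse_loss(student_scores: torch.Tensor, teacher_scores: torch.Tensor) -> torch.Tensor:
    """MSE between student and teacher positive-vs-negative margins.
    Column 0 is the positive, remaining columns are negatives."""
    s_margin = student_scores[:, :1] - student_scores[:, 1:]
    t_margin = teacher_scores[:, :1] - teacher_scores[:, 1:]
    return F.mse_loss(s_margin, t_margin)

def hybrid_loss(student_scores, teacher_scores, alpha=0.5):
    # alpha is an illustrative mixing weight, not the paper's setting
    return (alpha * kl_div_loss(student_scores, teacher_scores)
            + (1 - alpha) * margin_mse_loss(student_scores, teacher_scores))
```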

Fine-Tuning Details

An observable gain in effectiveness was realized by initializing SPLADE-v3's training from the SPLADE++SelfDistil checkpoint rather than from a more basic pre-trained model. The authors view this as a possible form of curriculum learning, though further work is needed to fully understand the underlying mechanism.

Performance Evaluation

The evaluation of SPLADE-v3 involved a comprehensive meta-analysis encompassing over 40 query sets across various datasets, using metrics like MRR@10 and nDCG@10. The findings indicate:

  • A consistent outperformance of BM25, with substantial gains in most of the 44 query sets.
  • Improved effectiveness over SPLADE++SelfDistil across numerous datasets, save for minor exceptions.
  • Comparable performance to cross-encoder re-rankers, notably for specific datasets where SPLADE-v3 either matched or exceeded the re-rankers' performance metrics.
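
For readers who want to reproduce this kind of comparison, the ranx library cited in the reference list supports paired significance testing between runs. The file names below are placeholders, and the paper's full effect-size meta-analysis over 40+ query sets goes beyond this snippet.

```python
# Sketch of a per-query-set comparison with ranx; file names are placeholders.
from ranx import Qrels, Run, compare

qrels = Qrels.from_file("msmarco-dev.qrels.trec", kind="trec")
runs = [
    Run.from_file("bm25.run.trec", kind="trec"),
    Run.from_file("splade-pp.run.trec", kind="trec"),
    Run.from_file("splade-v3.run.trec", kind="trec"),
]
report = compare(
    qrels=qrels,
    runs=runs,
    metrics=["mrr@10", "ndcg@10"],
    max_p=0.05,   # paired significance tests between runs
)
print(report)     # table with metrics and significance markers
```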

Variants of SPLADE-v3

The report introduces three additional variants of SPLADE-v3, each tailored for specific applications:

  • SPLADE-v3-DistilBERT: Offers a reduced inference footprint by building upon DistilBERT.
  • SPLADE-v3-Lexical: Removes query expansion, favoring efficiency at the cost of reduced effectiveness in out-of-domain settings.
  • SPLADE-v3-Doc: Starts training from a CoCondenser checkpoint and restricts inference to the document side, treating queries as simple bags of terms, striking a balance between efficiency and efficacy.
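
To illustrate how the variants differ on the query side, the sketch below contrasts full query expansion, a lexical-only query (no expansion), and a document-only setup where the query is just a bag of its own terms. Here `encode` and `tokenize` are assumed helpers (a SPLADE encoder and a tokenizer returning term ids), and the released models may differ in detail.

```python
# Illustrative sketch of query-side behaviour across the variants.
import torch

def query_rep_full(encode, tokenize, query):
    # SPLADE-v3: learned expansion and weighting on the query side
    return encode(query)

def query_rep_lexical(encode, tokenize, query):
    # SPLADE-v3-Lexical: keep learned weights, drop expansion terms
    rep = encode(query)
    mask = torch.zeros_like(rep)
    mask[tokenize(query)] = 1.0        # only terms present in the query survive
    return rep * mask

def query_rep_doc(encode, tokenize, query, vocab_size):
    # SPLADE-v3-Doc: no query encoder at all, binary bag of query terms
    rep = torch.zeros(vocab_size)
    rep[tokenize(query)] = 1.0
    return rep
```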

Conclusion and Forward Look

SPLADE-v3 and its variants represent a significant step forward for the SPLADE line of research. The model's improved effectiveness, together with its competitive showing against state-of-the-art re-rankers, underscores the potential of sparse neural models for complex information retrieval tasks. As SPLADE-v3 sets new baselines, it invites further work on training strategies and on broadening the applications of SPLADE models in natural language processing and beyond.