
SPLADE-v3: New baselines for SPLADE (2403.06789v1)

Published 11 Mar 2024 in cs.IR and cs.CL

Abstract: A companion to the release of the latest version of the SPLADE library. We describe changes to the training structure and present our latest series of models -- SPLADE-v3. We compare this new version to BM25, SPLADE++, as well as re-rankers, and showcase its effectiveness via a meta-analysis over more than 40 query sets. SPLADE-v3 further pushes the limit of SPLADE models: it is statistically significantly more effective than both BM25 and SPLADE++, while comparing well to cross-encoder re-rankers. Specifically, it gets more than 40 MRR@10 on the MS MARCO dev set, and improves by 2% the out-of-domain results on the BEIR benchmark.

References (21)
  1. E. Bassani. ranx: A blazing-fast python library for ranking evaluation and comparison. In European Conference on Information Retrieval, pages 259–264. Springer, 2022.
  2. Overview of the TREC 2022 deep learning track. In Text REtrieval Conference (TREC), 2022.
  3. Benchmarking middle-trained language models for neural search. arXiv preprint arXiv:2306.02867, 2023.
  4. SPLADE v2: Sparse lexical and expansion model for information retrieval, 2021.
  5. From distillation to hard negative sampling: Making sparse neural ir models more effective. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2353–2359, 2022.
  6. Towards effective and efficient sparse neural information retrieval. ACM Trans. Inf. Syst., Dec. 2023. Just Accepted.
  7. SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. In Proc. SIGIR, page 2288–2292, 2021.
  8. L. Gao and J. Callan. Unsupervised corpus aware language model pre-training for dense passage retrieval. In S. Muresan, P. Nakov, and A. Villavicencio, editors, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2843–2853, Dublin, Ireland, May 2022. Association for Computational Linguistics.
  9. Tevatron: An efficient and flexible toolkit for dense retrieval. arXiv preprint arXiv:2203.05765, 2022.
  10. Improving efficient neural ranking models with cross-architecture knowledge distillation, 2021.
  11. C. Lassance and S. Clinchant. An efficiency study for splade models. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2220–2226, 2022.
  12. C. Lassance and S. Clinchant. The tale of two MS MARCO – and their unfair comparisons, 2023.
  13. C. Lassance and S. Clinchant. The tale of two MS MARCO – and their unfair comparisons. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '23, pages 2431–2435, New York, NY, USA, 2023. Association for Computing Machinery.
  14. Distilling dense representations for ranking using tightly-coupled teachers, 2020.
  15. Simplified data wrangling with ir_datasets. In SIGIR, 2021.
  16. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, Oct. 2019.
  17. ColBERTv2: Effective and efficient retrieval via lightweight late interaction. In M. Carpuat, M.-C. de Marneffe, and I. V. Meza Ruiz, editors, Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3715–3734, Seattle, United States, July 2022. Association for Computational Linguistics.
  18. Ranger: A toolkit for effect-size based multi-task evaluation. arXiv preprint arXiv:2305.15048, 2023.
  19. Exploring effect-size-based meta-analysis for multi-dataset evaluation. 2023.
  20. BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. arXiv preprint arXiv:2104.08663, 2021.
  21. Curriculum learning for dense retrieval distillation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1979–1983, 2022.
Authors (4)
  1. Carlos Lassance (35 papers)
  2. Thibault Formal (17 papers)
  3. Stéphane Clinchant (39 papers)
  4. Hervé Déjean (16 papers)
Citations (10)

Summary

  • The paper introduces SPLADE-v3, leveraging augmented hard negatives and ensemble-based distillation to set new performance baselines.
  • The methodology combines multiple negatives per batch with a hybrid of KL-Div and MarginMSE losses to boost recall and precision.
  • The paper demonstrates that fine-tuning from SPLADE++SelfDistil and custom variant designs lead to significant gains across diverse query sets.

Enhancements in SPLADE Models: An Examination of SPLADE-v3

Introduction to SPLADE-v3

The technical report introduces SPLADE-v3, the latest advancement in the SPLADE series of models for information retrieval. SPLADE (SParse Lexical AnD Expansion) models represent queries and documents as sparse, vocabulary-sized term-weight vectors, which keeps them compatible with inverted-index search while still benefiting from learned term expansion and weighting. SPLADE-v3 modifies the training recipe to achieve statistically significant improvements over its predecessors and over BM25, and it performs comparably to cross-encoder re-rankers.
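
To make the representation concrete, here is a minimal sketch of how a SPLADE-style encoder turns text into a vocabulary-sized sparse weight vector (a log-saturated ReLU over masked-language-model logits, max-pooled over tokens). The checkpoint name and helper function are illustrative assumptions, not the released SPLADE-v3 code.

```python
# Minimal sketch of SPLADE-style sparse encoding (illustrative, not the official
# SPLADE-v3 implementation). Assumes a Hugging Face masked-LM backbone; the
# checkpoint name below is a placeholder.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "naver/splade-v3"  # placeholder; any BERT-style MLM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

def splade_encode(text: str) -> torch.Tensor:
    """Return a |V|-dimensional sparse term-weight vector for `text`."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits               # (1, seq_len, vocab_size)
    weights = torch.log1p(torch.relu(logits))         # log-saturated activation
    mask = inputs["attention_mask"].unsqueeze(-1)     # ignore padding positions
    return (weights * mask).max(dim=1).values.squeeze(0)  # max-pool over tokens

# Ranking score is a sparse dot product over the vocabulary
q = splade_encode("what is splade")
d = splade_encode("SPLADE is a sparse neural retriever")
score = torch.dot(q, d)
```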

Key Innovations in Model Training

Multiple Negatives per Batch

Following the Tevatron framework, SPLADE-v3 is trained with an increased number of hard negatives per batch. This strategy improves results, particularly in in-domain settings, although it contributes little to out-of-domain generalization.
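
As a rough illustration of the batch layout, the sketch below scores each query against its positive and several hard negatives and applies a softmax-style contrastive loss over the candidates; the exact number of negatives and the loss bookkeeping used for SPLADE-v3 may differ.

```python
# Illustrative sketch of training with several hard negatives per query.
import torch
import torch.nn.functional as F

def contrastive_loss(q_reps: torch.Tensor, doc_reps: torch.Tensor, num_negs: int) -> torch.Tensor:
    """
    q_reps:   (B, V)                sparse query representations
    doc_reps: (B * (1 + num_negs), V)  per query: positive first, then its hard negatives
    """
    B, V = q_reps.shape
    docs = doc_reps.view(B, 1 + num_negs, V)            # (B, 1+k, V)
    scores = torch.einsum("bv,bkv->bk", q_reps, docs)   # dot product per candidate
    labels = torch.zeros(B, dtype=torch.long)           # positive is always index 0
    return F.cross_entropy(scores, labels)
```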

Distillation Score Enhancement

A notable change is the use of an ensemble of cross-encoder re-rankers to generate the distillation scores. Rather than distilling from a single teacher, SPLADE-v3 combines the scores of several cross-encoders, and applying affine transformations to bring the teachers' scores onto a comparable scale further improves effectiveness.
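
A hedged sketch of the idea: score each candidate with several cross-encoder teachers, map each teacher's scores onto a common range with a per-teacher affine transform (min-max here, purely for illustration), and average the results to obtain the distillation target. The precise transform and weighting used in the paper may differ.

```python
# Hedged sketch: combine distillation scores from several cross-encoder teachers.
import numpy as np

def affine_rescale(scores: np.ndarray) -> np.ndarray:
    """Per-teacher affine transform: map candidate scores to [0, 1]."""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + 1e-9)

def ensemble_targets(per_teacher_scores: list) -> np.ndarray:
    """per_teacher_scores: one array of candidate scores per cross-encoder."""
    rescaled = [affine_rescale(s) for s in per_teacher_scores]
    return np.mean(rescaled, axis=0)                  # ensemble distillation target

# Example: two teachers scoring the same 4 candidates for one query
teacher_a = np.array([12.3, 4.1, -2.0, 0.5])
teacher_b = np.array([0.91, 0.40, 0.05, 0.22])
targets = ensemble_targets([teacher_a, teacher_b])
```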

Combining Distillation Losses

The report combines the two distillation losses most commonly used in neural information retrieval: KL-Div and MarginMSE. Empirical observations suggest that KL-Div primarily benefits recall while MarginMSE primarily benefits precision, and the hybrid of the two yields better overall performance for SPLADE-v3.
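
The sketch below spells out the two losses and a simple weighted combination; the mixing weight is an illustrative assumption rather than the value used in the paper.

```python
# Sketch of the two distillation losses and a weighted combination.
import torch
import torch.nn.functional as F

def kl_div_loss(student_scores: torch.Tensor, teacher_scores: torch.Tensor) -> torch.Tensor:
    """KL divergence between score distributions over each query's candidates.
    Shapes: (B, num_candidates)."""
    return F.kl_div(
        F.log_softmax(student_scores, dim=-1),
        F.softmax(teacher_scores, dim=-1),
        reduction="batchmean",
    )

def margin_mse_loss(student_scores: torch.Tensor, teacher_scores: torch.Tensor) -> torch.Tensor:
    """MSE between student and teacher positive-vs-negative margins.
    Column 0 is the positive, remaining columns are negatives."""
    s_margin = student_scores[:, :1] - student_scores[:, 1:]
    t_margin = teacher_scores[:, :1] - teacher_scores[:, 1:]
    return F.mse_loss(s_margin, t_margin)

def hybrid_loss(student_scores, teacher_scores, alpha=0.5):
    # alpha is an illustrative mixing weight, not the paper's setting
    return (alpha * kl_div_loss(student_scores, teacher_scores)
            + (1 - alpha) * margin_mse_loss(student_scores, teacher_scores))
```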

Fine-Tuning Details

An observable gain in effectiveness was realized by initializing SPLADE-v3's training from the SPLADE++SelfDistil checkpoint rather than from a more basic pre-trained model. The authors view this as a possible form of curriculum learning, though further work is needed to fully understand the underlying mechanism.

Performance Evaluation

The evaluation of SPLADE-v3 involved a comprehensive meta-analysis encompassing over 40 query sets across various datasets, using metrics like MRR@10 and nDCG@10. The findings indicate:

  • A consistent outperformance of BM25, with substantial gains in most of the 44 query sets.
  • Improved effectiveness over SPLADE++SelfDistil across numerous datasets, save for minor exceptions.
  • Comparable performance to cross-encoder re-rankers, notably for specific datasets where SPLADE-v3 either matched or exceeded the re-rankers' performance metrics.
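
For readers who want to reproduce this kind of comparison, the ranx library cited in the reference list supports paired significance testing between runs. The file names below are placeholders, and the paper's full effect-size meta-analysis over 40+ query sets goes beyond this snippet.

```python
# Sketch of a per-query-set comparison with ranx; file names are placeholders.
from ranx import Qrels, Run, compare

qrels = Qrels.from_file("msmarco-dev.qrels.trec", kind="trec")
runs = [
    Run.from_file("bm25.run.trec", kind="trec"),
    Run.from_file("splade-pp.run.trec", kind="trec"),
    Run.from_file("splade-v3.run.trec", kind="trec"),
]
report = compare(
    qrels=qrels,
    runs=runs,
    metrics=["mrr@10", "ndcg@10"],
    max_p=0.05,   # paired significance tests between runs
)
print(report)     # table with metrics and significance markers
```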

Variants of SPLADE-v3

The report introduces three additional variants of SPLADE-v3, each tailored for specific applications:

  • SPLADE-v3-DistilBERT: Offers a reduced inference footprint by building upon DistilBERT.
  • SPLADE-v3-Lexical: Removes query expansion, favoring efficiency at the cost of reduced effectiveness in out-of-domain settings.
  • SPLADE-v3-Doc: Starts training from a CoCondenser checkpoint and restricts inference to the document side, treating queries as simple bags of terms, striking a balance between efficiency and efficacy.
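
To illustrate how the variants differ on the query side, the sketch below contrasts full query expansion, a lexical-only query (no expansion), and a document-only setup where the query is just a bag of its own terms. Here `encode` and `tokenize` are assumed helpers (a SPLADE encoder and a tokenizer returning term ids), and the released models may differ in detail.

```python
# Illustrative sketch of query-side behaviour across the variants.
import torch

def query_rep_full(encode, tokenize, query):
    # SPLADE-v3: learned expansion and weighting on the query side
    return encode(query)

def query_rep_lexical(encode, tokenize, query):
    # SPLADE-v3-Lexical: keep learned weights, drop expansion terms
    rep = encode(query)
    mask = torch.zeros_like(rep)
    mask[tokenize(query)] = 1.0        # only terms present in the query survive
    return rep * mask

def query_rep_doc(encode, tokenize, query, vocab_size):
    # SPLADE-v3-Doc: no query encoder at all, binary bag of query terms
    rep = torch.zeros(vocab_size)
    rep[tokenize(query)] = 1.0
    return rep
```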

Conclusion and Forward Look

SPLADE-v3 and its variants represent a significant step forward for the SPLADE line of research. The model's improved effectiveness, together with its competitive showing against state-of-the-art re-rankers, underscores the potential of sparse neural models for complex information retrieval tasks. As SPLADE-v3 sets new baselines, it invites further work on training strategies and on broadening the applications of SPLADE models in natural language processing and beyond.