
RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers (2403.18276v2)

Published 27 Mar 2024 in cs.IR and cs.CL

Abstract: Transformer structure has achieved great success in multiple applied machine learning communities, such as NLP, computer vision (CV) and information retrieval (IR). Transformer architecture's core mechanism -- attention requires $O(n^2)$ time complexity in training and $O(n)$ time complexity in inference. Many works have been proposed to improve the attention mechanism's scalability, such as Flash Attention and Multi-query Attention. A different line of work aims to design new mechanisms to replace attention. Recently, a notable model structure -- Mamba, which is based on state space models, has achieved transformer-equivalent performance in multiple sequence modeling tasks. In this work, we examine Mamba's efficacy through the lens of a classical IR task -- document ranking. A reranker model takes a query and a document as input, and predicts a scalar relevance score. This task demands the LLM's ability to comprehend lengthy contextual inputs and to capture the interaction between query and document tokens. We find that (1) Mamba models achieve competitive performance compared to transformer-based models with the same training recipe; (2) but also have a lower training throughput in comparison to efficient transformer implementations such as flash attention. We hope this study can serve as a starting point to explore Mamba models in other classical IR tasks. Our code implementation and trained checkpoints are made public to facilitate reproducibility (https://github.com/zhichaoxu-shufe/RankMamba).

Benchmarking Mamba's Document Ranking Performance Against Transformers

Comparative Evaluation of Document Ranking Models

In information retrieval (IR), transformer-based LLMs have significantly reshaped how natural language data is understood and processed. The paper by Zhichao Xu evaluates a recent model architecture, Mamba, on the classical IR task of document ranking. The results offer a nuanced view of how Mamba compares to transformer models in both effectiveness and efficiency.

Background and Model Overview

Transformer architectures have driven advances across machine learning applications, notably through their capacity to capture long-range dependencies within sequences. Despite this success, the quadratic computational complexity of the attention mechanism has prompted efforts to devise more scalable alternatives. A noteworthy development in this line of work is the Mamba model, which is built on selective state space models (SSMs) and aims for transformer-equivalent modeling quality with linear-time sequence processing.
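
For background, the recurrence below is the standard discretized state space formulation used in the S4/Mamba line of work; it is included here for orientation and follows that literature's notation rather than anything specific to the paper under review:

$$ h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t, \qquad \bar{A} = \exp(\Delta A), \quad \bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B $$

Because the recurrence is linear in the hidden state, it can be evaluated with a parallel scan in $O(n)$ time for a length-$n$ sequence, in contrast to the $O(n^2)$ cost of full attention; Mamba additionally makes $\Delta$, $B$, and $C$ functions of the input, which is the "selective" part.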

Research Questions and Methodology

The core question of the paper is whether Mamba models can match or exceed transformer-based models on document ranking. The investigation benchmarks Mamba against a range of transformer-based models spanning encoder-only, decoder-only, and encoder-decoder architectures at different scales, with varying pre-training objectives and attention implementations, all trained with the same established recipe. Document ranking requires the model to read a query and a (potentially long) document jointly and output a scalar relevance score, so it tests both long-context comprehension and the modeling of query-document interactions.
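
To make the reranking setup concrete, the sketch below scores a query-document pair with a cross-encoder style model and a scalar regression head. It is an illustrative sketch, not the paper's training recipe; the backbone name, truncation length, and example strings are placeholder assumptions.

```python
# Minimal cross-encoder reranking sketch (illustrative; not the paper's exact recipe).
# Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "roberta-large"  # placeholder backbone; the paper compares several architectures
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=1 turns the classification head into a scalar relevance scorer
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()

def relevance_score(query: str, document: str) -> float:
    """Encode the (query, document) pair jointly and return a scalar relevance score."""
    inputs = tokenizer(query, document, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, 1)
    return logits.squeeze().item()

# Rerank candidate documents for one query by descending score.
query = "what is a selective state space model"
candidates = [
    "Mamba is built on selective state space models ...",
    "BM25 is a classical lexical ranking function ...",
]
ranked = sorted(candidates, key=lambda d: relevance_score(query, d), reverse=True)
```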

Key Findings

The empirical analysis revealed several critical findings:

  • Encoder-only transformer models performed strongly on document ranking, with roberta-large notably outperforming its counterparts on the MRR metric on the MS MARCO Dev set (see the MRR sketch after this list).
  • Mamba models achieved competitive performance, in some cases matching or surpassing transformer-based models trained with the same recipe, which underscores Mamba's potential for complex IR tasks.
  • However, Mamba models showed lower training throughput than transformer implementations that use efficient attention kernels such as Flash Attention.
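
To clarify the metric referenced in the first finding, the sketch below computes mean reciprocal rank at a cutoff from ranked document lists; the cutoff value and data layout are illustrative assumptions rather than the paper's evaluation code.

```python
def mean_reciprocal_rank(ranked_doc_ids, relevant_doc_ids, cutoff=10):
    """MRR@cutoff averaged over queries.

    ranked_doc_ids: dict mapping query_id -> list of doc_ids ordered by descending score
    relevant_doc_ids: dict mapping query_id -> set of relevant doc_ids
    """
    total, n_queries = 0.0, 0
    for qid, ranking in ranked_doc_ids.items():
        n_queries += 1
        for rank, doc_id in enumerate(ranking[:cutoff], start=1):
            if doc_id in relevant_doc_ids.get(qid, set()):
                total += 1.0 / rank  # credit only the first relevant document
                break
    return total / max(n_queries, 1)

# Example: relevant document at rank 2 for q1 and rank 1 for q2 -> MRR = (1/2 + 1) / 2 = 0.75
mrr = mean_reciprocal_rank(
    {"q1": ["d3", "d7", "d1"], "q2": ["d5", "d2"]},
    {"q1": {"d7"}, "q2": {"d5"}},
)
```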

Implications and Future Directions

The findings position Mamba as a viable alternative to transformer-based models for document ranking and suggest broader applicability across classical IR tasks. Nonetheless, the lower training throughput relative to efficient transformer implementations marks a clear target for future optimization: it does not diminish Mamba's effectiveness results, but it does indicate that the implementation needs further engineering to realize the efficiency advantages promised by its linear-time design.
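
Throughput comparisons of this kind are typically reported as tokens processed per second over training steps. The sketch below is a generic timing harness under that assumption; the optimizer, step counts, and the expectation that the batch includes labels (so the model returns a loss) are placeholder choices, not the paper's benchmarking code.

```python
import time
import torch

def training_throughput(model, batch, n_steps=20, warmup=5):
    """Rough tokens/second for forward + backward passes.

    batch: dict of tensors including input_ids and labels, so model(**batch) returns a loss.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    tokens_per_step = batch["input_ids"].numel()
    for step in range(n_steps + warmup):
        if step == warmup:  # exclude warmup steps from the timed window
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            start = time.perf_counter()
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return n_steps * tokens_per_step / elapsed
```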

Improving Mamba's training efficiency without sacrificing ranking quality could make SSM-based models a practical option for LLM deployments in IR. As the field evolves, continued study of architectures like Mamba that challenge attention's computational costs remains important for building more capable, scalable, and efficient language processing systems.

Concluding Remarks

The paper's study of Mamba for document ranking opens a promising avenue for future research. The competitive effectiveness of Mamba models, set against their current training-throughput limitations, gives a balanced picture of the potential and challenges of deploying SSM-based models in IR. Refining these models and addressing their efficiency gaps will be key to applying them more broadly across IR tasks.

Authors (1)
  1. Zhichao Xu