
Characterizing effective query embeddings for dense retrieval

Characterize what makes a query embedding effective for dense neural retrieval systems, which embed queries and documents into a shared vector space, in order to guide the design of query augmentation strategies for dense retrievers.


Background

Unlike sparse retrievers (e.g., BM25), dense retrievers rely on vector embeddings of queries and documents. The paper notes that designing effective query representations for dense models is challenging because the criteria for a good query embedding are not yet well understood.
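The shared-vector-space mechanism described above can be sketched in a few lines: documents and queries are mapped to vectors, and retrieval reduces to nearest-neighbor search under a similarity function such as cosine. The toy embeddings below stand in for a real dual-encoder's output; the function names and the k=2 cutoff are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cosine_scores(query_emb, doc_embs):
    """Score each document by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    return d @ q

def retrieve(query_emb, doc_embs, k=2):
    """Return indices of the top-k documents in the shared vector space."""
    scores = cosine_scores(query_emb, doc_embs)
    return np.argsort(-scores)[:k]

# Toy vectors standing in for encoder outputs (hypothetical, for illustration).
rng = np.random.default_rng(0)
doc_embs = rng.normal(size=(5, 8))
query_emb = doc_embs[3] + 0.01 * rng.normal(size=8)  # query close to doc 3

top = retrieve(query_emb, doc_embs, k=2)
```

Under this view, query augmentation for dense retrieval amounts to moving `query_emb` so that its nearest neighbors are the relevant documents; the open question is what properties of that vector make this reliably happen.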

Clarifying these criteria would inform both zero- and few-shot prompting strategies and trainable approaches (including reinforcement learning) aimed at producing embeddings that maximize dense retrieval effectiveness.

References

Sparse methods such as BM25 benefit directly from term-level expansions, while dense retrievers, which embed queries and documents into a shared vector space, pose a greater challenge since it is unclear what constitutes an effective query embedding.

Rethinking On-policy Optimization for Query Augmentation (2510.17139 - Xu et al., 20 Oct 2025) in Section 3, Background (Prompt-based Query Augmentation)