
Characterizing effective query embeddings for dense retrieval

Characterize what makes a query embedding effective for dense neural retrieval systems, which embed queries and documents into a shared vector space, in order to guide the design of query augmentation strategies for dense retrievers.


Background

Unlike sparse retrievers (e.g., BM25), dense retrievers rely on vector embeddings of queries and documents. The paper notes that designing effective query representations for dense models is challenging because the criteria for a good query embedding are not yet well understood.
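The shared-vector-space mechanism described above can be sketched in a few lines: documents and queries are mapped to vectors, and retrieval reduces to nearest-neighbor search under a similarity function such as cosine. The toy embeddings below stand in for a real dual-encoder's output; the function names and the k=2 cutoff are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cosine_scores(query_emb, doc_embs):
    """Score each document by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    return d @ q

def retrieve(query_emb, doc_embs, k=2):
    """Return indices of the top-k documents in the shared vector space."""
    scores = cosine_scores(query_emb, doc_embs)
    return np.argsort(-scores)[:k]

# Toy vectors standing in for encoder outputs (hypothetical, for illustration).
rng = np.random.default_rng(0)
doc_embs = rng.normal(size=(5, 8))
query_emb = doc_embs[3] + 0.01 * rng.normal(size=8)  # query close to doc 3

top = retrieve(query_emb, doc_embs, k=2)
```

Under this view, query augmentation for dense retrieval amounts to moving `query_emb` so that its nearest neighbors are the relevant documents; the open question is what properties of that vector make this reliably happen.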

Clarifying these criteria would inform both zero- and few-shot prompting strategies and trainable approaches (including reinforcement learning) aimed at producing embeddings that maximize dense retrieval effectiveness.

References

Sparse methods such as BM25 benefit directly from term-level expansions, while dense retrievers, which embed queries and documents into a shared vector space, pose a greater challenge since it is unclear what constitutes an effective query embedding.

Rethinking On-policy Optimization for Query Augmentation (2510.17139 - Xu et al., 20 Oct 2025) in Section 3, Background (Prompt-based Query Augmentation)