Extracting Document Relations from Search Corpus by Marginalizing over User Queries (2507.10726v1)

Published 14 Jul 2025 in cs.IR and cs.LG

Abstract: Understanding relationships between documents in large-scale corpora is essential for knowledge discovery and information organization. However, existing approaches rely heavily on manual annotation or predefined relationship taxonomies. We propose EDR-MQ (Extracting Document Relations by Marginalizing over User Queries), a novel framework that discovers document relationships through query marginalization. EDR-MQ is based on the insight that strongly related documents often co-occur in results across diverse user queries, enabling us to estimate joint probabilities between document pairs by marginalizing over a collection of queries. To enable this query marginalization approach, we develop Multiply Conditioned Retrieval-Augmented Generation (MC-RAG), which employs conditional retrieval where subsequent document retrievals depend on previously retrieved content. By observing co-occurrence patterns across diverse queries, EDR-MQ estimates joint probabilities between document pairs without requiring labeled training data or predefined taxonomies. Experimental results show that our query marginalization approach successfully identifies meaningful document relationships, revealing topical clusters, evidence chains, and cross-domain connections that are not apparent through traditional similarity-based methods. Our query-driven framework offers a practical approach to document organization that adapts to different user perspectives and information needs.
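The marginalization idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes a uniform prior over queries and replaces MC-RAG's conditional retrieval with a simple 0/1 indicator of whether both documents appear in a query's result list, so that p(d_i, d_j) = sum over q of p(q) * p(d_i, d_j | q) reduces to a co-occurrence fraction. The function name `estimate_joint`, the toy queries, and the document IDs are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def estimate_joint(results_per_query):
    """Estimate p(d_i, d_j) by marginalizing over queries.

    Assumptions (not from the paper): p(q) is uniform, and
    p(d_i, d_j | q) is 1 if both documents appear in the results
    for q and 0 otherwise.
    """
    n_queries = len(results_per_query)
    joint = defaultdict(float)
    for docs in results_per_query.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(docs)), 2):
            joint[(a, b)] += 1.0 / n_queries
    return dict(joint)


# Hypothetical retrieval results: query -> retrieved document IDs.
toy_results = {
    "q1": ["doc_a", "doc_b"],
    "q2": ["doc_a", "doc_b", "doc_c"],
    "q3": ["doc_c", "doc_d"],
    "q4": ["doc_a", "doc_b"],
}

joint = estimate_joint(toy_results)

# Rank pairs from most to least strongly related.
for pair, prob in sorted(joint.items(), key=lambda kv: -kv[1]):
    print(pair, round(prob, 2))
```

In this toy data, `doc_a` and `doc_b` co-occur in three of the four queries and so receive the highest estimated joint probability. A closer approximation to the described framework would presumably weight each pair by the retriever's conditional scores rather than by a hard indicator.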
