QueryExplorer: An Interactive Query Generation Assistant for Search and Exploration (2403.15667v1)
Abstract: Formulating effective search queries remains a challenging task, particularly when users lack expertise in a specific domain or are not proficient in the language of the content. Providing example documents of interest might be easier for a user. However, such query-by-example scenarios are prone to concept drift, and retrieval effectiveness is highly sensitive to the query generation method, with no clear way to incorporate user feedback. To enable exploration and to support Human-in-the-Loop (HITL) experiments, we propose QueryExplorer, an interactive query generation, reformulation, and retrieval interface with support for HuggingFace generation models, PyTerrier's retrieval pipelines and datasets, and extensive logging of human feedback. To allow users to create and modify effective queries, our demo supports complementary approaches to using LLMs interactively, assisting the user with edits and feedback at multiple stages of the query formulation process. With support for recording fine-grained interactions and user annotations, QueryExplorer can serve as an experimental and research platform for annotation, qualitative evaluation, and HITL experiments on complex search tasks where users struggle to formulate queries.
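The loop the abstract describes (generate a query from an example document, let the user edit it, retrieve, and log each interaction) can be illustrated with a minimal sketch. This is not the QueryExplorer implementation: every name here (`keyword_generator`, `overlap_retriever`, `InteractionLog`, `run_session`) is hypothetical, the frequency-based keyword generator merely stands in for a HuggingFace generation model, and the term-overlap scorer stands in for a PyTerrier retrieval pipeline.

```python
import json
import re
import time
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Tiny stopword list, an assumption made only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for",
             "is", "are", "was", "were", "by", "with", "at"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def keyword_generator(example_doc: str, k: int = 3) -> str:
    """Stand-in for an LLM query generator: the k most frequent content terms."""
    counts = Counter(tokenize(example_doc))
    # Descending frequency, then alphabetical, so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(term for term, _ in ranked[:k])


def overlap_retriever(query: str, corpus: Dict[str, str],
                      top_k: int = 2) -> List[Tuple[str, int]]:
    """Stand-in for a retrieval pipeline: score = occurrences of query terms."""
    q_terms = set(tokenize(query))
    scored = [(doc_id, sum(1 for t in tokenize(text) if t in q_terms))
              for doc_id, text in corpus.items()]
    scored = [s for s in scored if s[1] > 0]
    scored.sort(key=lambda s: (-s[1], s[0]))
    return scored[:top_k]


class InteractionLog:
    """Collects timestamped interaction events and serializes them as JSON lines."""

    def __init__(self) -> None:
        self.events: List[dict] = []

    def record(self, kind: str, **payload) -> None:
        self.events.append({"time": time.time(), "kind": kind, **payload})

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(e) for e in self.events)


def run_session(example_doc: str, corpus: Dict[str, str],
                generate: Callable[[str], str],
                retrieve: Callable[[str, Dict[str, str]], List[Tuple[str, int]]],
                user_edit: Optional[Callable[[str], str]] = None):
    """One pass: generate a query, apply the user's edit (if any), retrieve, log."""
    log = InteractionLog()
    generated = generate(example_doc)
    log.record("generated_query", query=generated)
    final = user_edit(generated) if user_edit else generated
    if final != generated:
        log.record("user_edit", before=generated, after=final)
    results = retrieve(final, corpus)
    log.record("retrieval", query=final, results=[doc_id for doc_id, _ in results])
    return final, results, log


if __name__ == "__main__":
    corpus = {
        "d1": "Flood warnings were issued after heavy rain hit the river valley.",
        "d2": "The central bank raised interest rates to curb inflation.",
        "d3": "Rescue teams evacuated residents as flood water rose in the valley.",
    }
    example = ("Heavy flood damage in the valley: flood water closed roads "
               "and flood barriers failed.")
    # The lambda simulates a user replacing the generated query by hand.
    final, results, log = run_session(example, corpus, keyword_generator,
                                      overlap_retriever,
                                      user_edit=lambda q: "flood water valley")
    print(final, results)
    print(log.to_jsonl())
```

Passing the generator and retriever as plain callables keeps the sketch agnostic about the model and index behind them, and the log records the query before and after the user's edit so that the effect of feedback can be inspected afterwards.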
- Kaustubh D. Dhole
- Shivam Bajaj
- Ramraj Chandradevan
- Eugene Agichtein