
Improving Topic Relevance Model by Mix-structured Summarization and LLM-based Data Augmentation (2404.02616v1)

Published 3 Apr 2024 in cs.IR and cs.CL

Abstract: Topic relevance between a query and a document is an important part of social search: it measures how well a document matches the user's requirement. In most social search scenarios, such as Dianping, modeling search relevance faces two challenges. First, many documents in social search are very long and contain much redundant information. Second, training data for search relevance models is difficult to obtain, especially for multi-class relevance models. To tackle these two problems, we first take the query concatenated with the query-based summary and the document summary generated without the query as the input of the topic relevance model, which helps the model learn the degree of relevance between the query and the core topic of the document. Then, we use the language understanding and generation abilities of LLMs to rewrite and generate queries from the queries and documents in existing training data, which yields new query-document pairs as training data. Extensive offline experiments and online A/B tests show that the proposed approaches effectively improve the performance of relevance modeling.
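
The mix-structured input described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `[SEP]` separator, the function names, and the two toy summarizers (word-overlap sentence selection for the query-based summary, lead sentences for the query-free summary) are hypothetical stand-ins, not the summarization models or input format used in the paper.

```python
import re

SEP = " [SEP] "  # assumed separator; the paper's actual input format may differ


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (illustrative only)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def query_based_summary(query, document, max_sentences=2):
    """Hypothetical stand-in: keep the sentences sharing the most words with the query."""
    terms = set(query.lower().split())
    scored = []
    for index, sentence in enumerate(split_sentences(document)):
        overlap = len(terms & set(re.findall(r"\w+", sentence.lower())))
        if overlap > 0:
            scored.append((overlap, index, sentence))
    # Pick the highest-overlap sentences, then restore document order.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))


def generic_summary(document, max_sentences=2):
    """Hypothetical stand-in: use the lead sentences as the query-free summary."""
    return " ".join(split_sentences(document)[:max_sentences])


def build_model_input(query, document):
    """Concatenate query, query-based summary, and query-free document summary."""
    return SEP.join([
        query,
        query_based_summary(query, document),
        generic_summary(document),
    ])


if __name__ == "__main__":
    doc = (
        "This cafe opened last year. The decor is cozy. "
        "Their hot pot is spicy and cheap. Parking is hard."
    )
    print(build_model_input("hot pot", doc))
```

The point of the sketch is only the shape of the input: the relevance model sees the query next to a short query-focused view of the document and a short query-free view of its core topic, rather than the full, redundant text.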

Authors (4)
  1. Yizhu Liu (9 papers)
  2. Ran Tao (82 papers)
  3. Shengyu Guo (1 paper)
  4. Yifan Yang (578 papers)
