Towards Scalability and Extensibility of Query Reformulation Modeling in E-commerce Search (2402.11202v2)
Abstract: Customer behavioral data significantly impacts e-commerce search systems. However, for less common queries, the associated behavioral data tends to be sparse and noisy, offering inadequate support to the search mechanism. Query reformulation was introduced to address this challenge: less common queries can borrow the behavior patterns of popular counterparts with similar meanings. In Amazon product search, query reformulation has proven effective at improving search relevance and increasing overall revenue. Nonetheless, adapting this method for smaller or emerging businesses operating in regions with lower traffic and complex multilingual settings poses a challenge in terms of scalability and extensibility. This study addresses that challenge by constructing a query reformulation solution that functions effectively even when training data is limited in quality and scale and the target language has relatively complex linguistic characteristics. In this paper, we provide an overview of the solution implemented within the Amazon product search infrastructure, which includes refining the data mining process, redefining model training objectives, and reshaping training strategies. The effectiveness of the proposed solution is validated through online A/B testing on search ranking and Ads matching. Notably, employing the proposed solution in search ranking resulted in 0.14% and 0.29% increases in overall revenue in the Japanese and Hindi cases, respectively, and a 0.08% incremental gain in the English case compared to the legacy implementation, while employing it in search Ads matching led to a 0.36% increase in Ads revenue in the Japanese case.
Authors: Ziqi Zhang, Yupin Huang, Quan Deng, Jinghui Xiao, Vivek Mittal, Jingyuan Deng