- The paper demonstrates that pairwise judgment formulation based on click data outperforms traditional learning-to-rank approaches for training semantic embedding models.
- Experimental results on two test sets, with 23 million and 530,000 judgments respectively, validate the superior effectiveness of the Clicked > Non-Examined and hybrid Clicked > Non-Clicked strategies.
- The study advocates judgment strategies tailored to semantic embeddings and calls for further research that integrates richer user signals to improve search performance.
The paper "Pairwise Judgment Formulation for Semantic Embedding Model in Web Search" by Mengze Hong and Chen Jason Zhang undertakes a rigorous investigation into formulating effective pairwise judgments for training Semantic Embedding Models (SEMs) in web search engines. Despite the growing application of SEMs in information retrieval and NLP, the methods for generating their training data through pairwise judgments have not been thoroughly scrutinized. This study aims to fill this gap by comparing various strategies derived from query logs and user click-through data, ultimately identifying strategies that are more effective for SEMs than those established for Learning-to-Rank (LTR).
The authors emphasize that SEMs, typically built on a neural-network-based Siamese architecture, are predominantly trained with supervised learning on search engine query logs. These logs, which consist of user queries, search results, and records of user activity, serve as the basis for creating pairwise training instances. For instance, if a user prefers title $p_q$ over title $n_q$ for a query $q$, this preference is represented as $p_q > n_q$, guiding the SEM to increase the similarity of $(q, p_q)$ while decreasing the similarity of $(q, n_q)$.
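The training signal described above can be sketched in Python. This is a minimal illustration, not the paper's model: the toy two-dimensional embedding table `EMB`, the averaging encoder, and the margin (hinge) loss on cosine similarity are all assumptions chosen for brevity. What it shows is the Siamese structure: one shared `encode` function embeds the query and both titles, and the loss for a judgment $p_q > n_q$ drops to zero only once $(q, p_q)$ is more similar than $(q, n_q)$ by a margin.

```python
import math

# Hypothetical toy token embeddings; a real SEM learns its encoder from query logs.
EMB = {
    "cheap": [1.0, 0.0],
    "flights": [0.8, 0.2],
    "hotel": [0.0, 1.0],
    "deals": [0.1, 0.9],
}


def encode(text):
    """Shared ("Siamese") encoder: the same function embeds queries and titles.

    Averages the toy embeddings of known tokens; assumes at least one token is in EMB.
    """
    vecs = [EMB[w] for w in text.split() if w in EMB]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pairwise_hinge_loss(query, preferred, other, margin=0.1):
    """Loss for one judgment preferred > other (assumed hinge form).

    Zero once sim(q, preferred) exceeds sim(q, other) by at least `margin`.
    """
    q = encode(query)
    return max(0.0, margin - cosine(q, encode(preferred)) + cosine(q, encode(other)))


# Judgment "cheap flights paris" > "hotel deals" for the query "cheap flights".
print(pairwise_hinge_loss("cheap flights", "cheap flights paris", "hotel deals"))
print(pairwise_hinge_loss("cheap flights", "hotel deals", "cheap flights paris"))
```

The first call is already satisfied (loss 0.0), while the reversed judgment is heavily penalized. Minimizing such a loss over many judgments pushes the encoder in the direction the paragraph describes; a production SEM would replace the lookup table with a trained neural encoder and optimize it by gradient descent.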
Key Contributions
- Comprehensive Evaluation of Pairwise Judgment Strategies: The authors conduct the first in-depth analysis of strategies for generating pairwise judgments specifically tailored for SEMs. Their work reveals that conventional strategies used in pairwise LTR, such as formulating judgments between clicked and skipped results, may not be optimal for SEMs.
- Empirical Validation: The study employs a large-scale empirical analysis using query logs and click-through data from a major commercial search engine, and demonstrates the superior effectiveness of strategies such as Clicked > Non-Examined and a hybrid strategy, Clicked > Non-Clicked.
- Experimental Design and Data: By preparing two distinct test datasets, Test-1 and Test-2 (one derived from query logs, the other manually curated by human experts), the paper ensures a robust evaluation framework. Test-1 contains 23 million pairwise judgments, while Test-2 includes 530,000 judgments.
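The summary does not state which metric is computed on Test-1 and Test-2. One common way to score a model against held-out pairwise judgments is pairwise accuracy, the fraction of judgments it orders correctly; the sketch below assumes that metric, and `token_overlap` is a hypothetical stand-in for an SEM's similarity score.

```python
def pairwise_accuracy(judgments, score):
    """Fraction of (query, preferred, other) judgments the model orders correctly.

    Ties count as errors because a tie does not reproduce the preference.
    """
    correct = sum(1 for q, p, n in judgments if score(q, p) > score(q, n))
    return correct / len(judgments)


def token_overlap(query, title):
    """Hypothetical stand-in scorer: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(title.lower().split()))


judgments = [
    ("cheap flights", "cheap flights to paris", "hotel deals"),
    ("python tutorial", "java tutorial", "python tutorial for beginners"),
]
print(pairwise_accuracy(judgments, token_overlap))  # 0.5
```

The scorer gets the first judgment right and the second wrong, so accuracy is 0.5. Swapping in a trained SEM's similarity function is all it takes to evaluate it on a judgment set of either size.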
Atomic and Hybrid Strategies
The paper examines several "atomic" strategies:
- Clicked > Skipped: Assumes clicked results outrank skipped ones.
- Clicked > Clicked: Among clicked results, prefers the one with the higher click-through rate (CTR).
- Clicked > Non-Examined: Prefers clicked results over non-examined results.
- Skipped > Non-Examined: Assumes skipped results outrank non-examined results.
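The atomic strategies can be sketched as pair generators over a single search impression. The paper's exact definition of a non-examined result is not given here; this sketch assumes a cascade-style heuristic in which results ranked above the lowest click were examined and those below it were not. The function names, the strategy strings, and the `ctr` dictionary (CTRs assumed to be aggregated over many sessions of the same query) are all hypothetical.

```python
from itertools import product


def label_results(clicks):
    """Label each position of one impression as clicked, skipped, or non_examined.

    Assumption (cascade-style heuristic, not taken from the paper): every result
    ranked above the lowest click was examined, every result below it was not.
    """
    if not any(clicks):
        return ["non_examined"] * len(clicks)
    last_click = max(i for i, c in enumerate(clicks) if c)
    return [
        "clicked" if c else ("skipped" if i < last_click else "non_examined")
        for i, c in enumerate(clicks)
    ]


def atomic_pairs(titles, clicks, strategy, ctr=None):
    """Return (preferred, other) title pairs for one impression under an atomic strategy."""
    labels = label_results(clicks)

    def group(name):
        return [t for t, lab in zip(titles, labels) if lab == name]

    if strategy == "clicked>skipped":
        return list(product(group("clicked"), group("skipped")))
    if strategy == "clicked>non_examined":
        return list(product(group("clicked"), group("non_examined")))
    if strategy == "skipped>non_examined":
        return list(product(group("skipped"), group("non_examined")))
    if strategy == "clicked>clicked":
        # Prefer the clicked result whose (aggregated) CTR is strictly higher.
        clicked = group("clicked")
        return [(a, b) for a, b in product(clicked, clicked) if ctr[a] > ctr[b]]
    raise ValueError(f"unknown strategy: {strategy}")


titles = ["A", "B", "C", "D", "E"]
clicks = [False, True, False, True, False]  # B and D clicked; E ranked below last click
print(atomic_pairs(titles, clicks, "clicked>skipped"))
# [('B', 'A'), ('B', 'C'), ('D', 'A'), ('D', 'C')]
print(atomic_pairs(titles, clicks, "clicked>non_examined"))
# [('B', 'E'), ('D', 'E')]
print(atomic_pairs(titles, clicks, "clicked>clicked", ctr={"B": 0.3, "D": 0.1}))
# [('B', 'D')]
```

Under this heuristic the same impression yields quite different pair sets per strategy, which is why the choice of strategy changes both the quality and the quantity of SEM training data.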
It is noteworthy that the strategy Clicked > Non-Examined, despite being rarely used in LTR, was found to be the most effective for SEM training. This challenges the prevailing wisdom in LTR, where Clicked > Skipped is often deemed optimal.
The authors also propose a hybrid strategy:
- Clicked > Non-Clicked: Combines Clicked > Skipped and Clicked > Non-Examined.
Interestingly, Clicked > Non-Clicked outperforms the other strategies on Test-1, showing slight improvements over Clicked > Non-Examined. The analysis suggests that while Clicked > Non-Examined provides the highest-quality training data, the larger volume of training instances produced by the hybrid approach contributes to its performance.
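The volume argument can be made concrete. Because non-clicked results are exactly the skipped ones plus the non-examined ones, the hybrid pair set is the union of the two atomic pair sets; since those sets are disjoint, each impression contributes as many hybrid pairs as the two atomic strategies combined. The sketch assumes results have already been labeled (for example with a cascade-style heuristic); the labels and the `pairs` helper are hypothetical.

```python
from itertools import product


def pairs(labels, better, worse):
    """Return (preferred, other) position pairs: first label in `better`, second in `worse`."""
    hi = [i for i, lab in enumerate(labels) if lab in better]
    lo = [i for i, lab in enumerate(labels) if lab in worse]
    return list(product(hi, lo))


# One already-labeled impression (hypothetical example data).
labels = ["skipped", "clicked", "skipped", "clicked", "non_examined"]

clicked_gt_skipped = pairs(labels, {"clicked"}, {"skipped"})
clicked_gt_non_examined = pairs(labels, {"clicked"}, {"non_examined"})
# Hybrid: "non-clicked" means skipped or non-examined.
clicked_gt_non_clicked = pairs(labels, {"clicked"}, {"skipped", "non_examined"})

print(len(clicked_gt_skipped), len(clicked_gt_non_examined), len(clicked_gt_non_clicked))
# 4 2 6
```

This only explains why the hybrid strategy yields more instances; whether the extra Clicked > Skipped pairs help on balance is an empirical question, which the paper addresses with its Test-1 comparison.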
Implications and Future Work
The findings have significant implications for practitioners working with embedding-based models. Notably, the study:
- Advocates for pairwise judgment strategies tailored specifically to SEMs rather than directly applying LTR methodologies.
- Suggests that hybrid strategies may leverage larger training data volumes, offering potential performance benefits.
- Highlights the importance of further exploring pairwise judgment formulation for SEMs, including incorporating additional user signals.
Future research directions may involve integrating diverse signals into the pairwise judgment formulation process and exploring SEM variants. These formulated judgments could also enhance downstream applications such as topic discovery, intent mining, and user personalization, as suggested by the related work the paper cites in these areas.
Conclusion
In conclusion, this paper provides a detailed study and practical recommendations for formulating pairwise judgments in training Semantic Embedding Models for web search. By comparing several strategies through comprehensive experiments, the authors highlight effective methodologies distinct from traditional LTR practices. This work not only strengthens the understanding of SEM training but also paves the way for improvements in search engine performance and other related applications.