
Learning to Retrieve for Job Matching (2402.13435v1)

Published 21 Feb 2024 in cs.IR and cs.LG

Abstract: Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we discuss applying learning-to-retrieve technology to enhance LinkedIn's job search and recommendation systems. In the realm of promoted jobs, the key objective is to improve the quality of applicants, thereby delivering value to recruiter customers. To achieve this, we leverage confirmed hire data to construct a graph that evaluates a seeker's qualification for a job, and utilize learned links for retrieval. Our learned model is easy to explain, debug, and adjust. On the other hand, the focus for organic jobs is to optimize seeker engagement. We accomplished this by training embeddings for personalized retrieval, fortified by a set of rules derived from the categorization of member feedback. In addition to a solution based on a conventional inverted index, we developed an on-GPU solution capable of supporting both KNN and term matching efficiently.


Summary

  • The paper introduces a graph-based framework that boosts promoted job matching performance, increasing budget utilization by 15% over baseline models.
  • The paper develops an embedding-based retrieval strategy for organic jobs that enhances personalization and user engagement using rule-based constraints.
  • The paper implements a GPU-powered exhaustive search system that efficiently combines KNN and term-matching for real-time, high-relevance job recommendations.

Learning to Retrieve for Job Matching: A Summary

The paper "Learning to Retrieve for Job Matching" describes how learning-to-retrieve techniques were applied to LinkedIn's job search and recommendation systems. Web-scale search systems typically rely on a two-step paradigm of retrieval followed by ranking. The authors focus on the retrieval step and improve it for both the promoted and the organic job channels.

Methodological Advances

  1. Graph-Based Retrieval for Promoted Jobs: The paper introduces a graph-based framework to better evaluate candidate qualifications in the promoted job segment. Using confirmed hire data, the authors construct a graph that identifies well-matched candidate-job pairings. The approach learns "links" between seeker segments and job segments, and these links act as targeting rules. The graph is designed to be interpretable, so its links can be adjusted directly to balance job liquidity against qualification quality.
  2. Embedding-Based Retrieval (EBR) for Organic Jobs: In organic channels, the core objective is to increase seeker engagement by personalizing job listings. The authors train embeddings for personalized retrieval and use them in an Embedding-Based Retrieval system. To keep results precise and aligned with seeker profiles, the embeddings are combined with rule-based constraints derived from categorizing member feedback.
  3. GPU-Based Exhaustive Search System: The paper also describes a GPU-based exhaustive search system that supports both KNN and term matching. It complements the solution built on a conventional inverted index and uses the GPU to process large document pools in real time.
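The learned links in item 1 can be illustrated with a small sketch. It is not the authors' algorithm: the segment labels, the `min_support` and `min_share` thresholds, and the function names `learn_links` and `retrieve` are all assumptions made for illustration. The sketch only shows the general idea of turning confirmed hires into segment-to-segment links and then using those links as targeting rules.

```python
from collections import Counter, defaultdict


def learn_links(hires, min_support=2, min_share=0.2):
    """Keep a seeker-segment -> job-segment link when confirmed hires support it.

    hires: iterable of (seeker_segment, job_segment) pairs, one per confirmed hire.
    min_support and min_share are hypothetical thresholds, not values from the paper.
    """
    hires = list(hires)
    pair_counts = Counter(hires)
    seeker_totals = Counter(seeker for seeker, _ in hires)
    links = defaultdict(set)
    for (seeker, job), count in pair_counts.items():
        # Share of this seeker segment's hires that landed in this job segment.
        share = count / seeker_totals[seeker]
        if count >= min_support and share >= min_share:
            links[seeker].add(job)
    return dict(links)


def retrieve(seeker_segment, jobs, links):
    """Return ids of jobs whose segment is linked to the seeker's segment."""
    allowed = links.get(seeker_segment, set())
    return [job_id for job_id, segment in jobs if segment in allowed]


# Toy confirmed-hire log (invented data).
hires = [
    ("data analyst", "data scientist"),
    ("data analyst", "data scientist"),
    ("data analyst", "data analyst"),
    ("data analyst", "data analyst"),
    ("data analyst", "data analyst"),
    ("data analyst", "sales manager"),
    ("nurse", "nurse"),
    ("nurse", "nurse"),
]
links = learn_links(hires)
jobs = [(1, "data scientist"), (2, "sales manager"), (3, "nurse"), (4, "data analyst")]
print(retrieve("data analyst", jobs, links))  # [1, 4]
```

Because the links are explicit pairs of segments, a link can be inspected, removed, or added by hand, which is the kind of interpretability and adjustability the summary attributes to the graph.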

Experimental Findings

In the promoted pipeline, the graph-based method increases budget utilization by 15% compared to baseline models. In the organic pipeline, engagement metrics, including job applications and click-through rates, improve under the EBR strategy with its qualification constraints. The hybrid system that runs term-based retrieval (TBR) together with EBR on GPUs keeps latency low while preserving relevance, as reflected in metric improvements in live product deployments.
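A hybrid of term matching and KNN can be sketched as one exhaustive scan over all documents: filter by required terms, score the survivors by embedding similarity, and keep the top k. The sketch below runs on the CPU in plain Python and is only an illustration. The document layout, the attribute strings such as `country:us`, and the function name `hybrid_search` are assumptions, and the paper's on-GPU implementation is not reproduced here.

```python
import heapq


def hybrid_search(query_vec, required_terms, docs, k=2):
    """Exhaustively scan every document: term filter first, then dot-product KNN.

    docs: list of (doc_id, terms, embedding) tuples (hypothetical layout).
    """
    scored = []
    for doc_id, terms, embedding in docs:
        # Term matching: the document must carry every required attribute.
        if not required_terms <= terms:
            continue
        # Similarity: dot product between query and document embeddings.
        score = sum(q * d for q, d in zip(query_vec, embedding))
        scored.append((score, doc_id))
    # nlargest compares tuples, so equal scores fall back to the larger doc_id.
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Toy job pool (invented data).
docs = [
    ("job-a", {"country:us", "lang:en"}, [0.9, 0.1]),
    ("job-b", {"country:us", "lang:en"}, [0.2, 0.8]),
    ("job-c", {"country:de", "lang:de"}, [1.0, 0.0]),
    ("job-d", {"country:us", "lang:en", "remote"}, [0.7, 0.7]),
]
print(hybrid_search([1.0, 0.5], {"country:us"}, docs, k=2))  # ['job-d', 'job-a']
```

On a GPU the same logic would typically be expressed as a batched matrix product over all embeddings plus a boolean mask for the term filter, which is what makes an exhaustive scan practical at low latency.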

Implications and Future Directions

The innovations presented in this paper offer significant implications for the design and scalability of job matching systems on large professional networks. The utilization of graph-based targeting for promoted jobs and embedding-driven retrieval for organic jobs exemplifies a tailored approach to differing objectives within the same ecosystem. Furthermore, the efficient GPU-based system suggests potential for broader applications in real-time search and retrieval tasks.

Looking ahead, future work may integrate large language models (LLMs) fine-tuned for retrieval tasks to improve semantic understanding and expand multilingual capabilities. Refining the hybrid architectures and exploring curriculum learning when training retrieval models could also yield more nuanced personalization while maintaining efficiency and interpretability. The strategies proposed in the paper provide a foundation for applying machine learning to candidate retrieval in job recommendation systems.
