Learning to Retrieve for Job Matching (2402.13435v1)
Abstract: Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we discuss applying learning-to-retrieve technology to enhance LinkedIn's job search and recommendation systems. In the realm of promoted jobs, the key objective is to improve the quality of applicants, thereby delivering value to recruiter customers. To achieve this, we leverage confirmed hire data to construct a graph that evaluates a seeker's qualification for a job, and utilize learned links for retrieval. Our learned model is easy to explain, debug, and adjust. On the other hand, the focus for organic jobs is to optimize seeker engagement. We accomplished this by training embeddings for personalized retrieval, fortified by a set of rules derived from the categorization of member feedback. In addition to a solution based on a conventional inverted index, we developed an on-GPU solution capable of supporting both KNN and term matching efficiently.
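To make the combination of term matching and KNN concrete, the following is a minimal CPU-only sketch, not the system described in the paper. It assumes a toy in-memory inverted index, exact (brute-force) cosine similarity, and hypothetical names throughout (`build_inverted_index`, `term_match`, `retrieve`, the `title:`/`geo:` term format, and the sample jobs). Term matching acts as a hard filter, and KNN ranks only the jobs that survive it.

```python
from collections import defaultdict
import math


def build_inverted_index(job_terms):
    """Map each term to the set of job ids whose term list contains it."""
    index = defaultdict(set)
    for job_id, terms in job_terms.items():
        for term in terms:
            index[term].add(job_id)
    return index


def term_match(index, required_terms):
    """Return ids of jobs that contain every required term (AND semantics)."""
    postings = [index.get(term, set()) for term in required_terms]
    if not postings:
        return set()
    return set.intersection(*postings)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(index, job_vecs, query_terms, query_vec, k):
    """Filter by term match, then return the k nearest jobs by cosine similarity.

    With no query terms, every job is a candidate (pure KNN).
    Ties are broken by job id so the result is deterministic.
    """
    candidates = term_match(index, query_terms) if query_terms else set(job_vecs)
    ranked = sorted(candidates, key=lambda j: (-cosine(query_vec, job_vecs[j]), j))
    return ranked[:k]


# Hypothetical toy data: standardized entity terms and 2-d embeddings per job.
JOB_TERMS = {
    "j1": ["title:engineer", "geo:us"],
    "j2": ["title:engineer", "geo:uk"],
    "j3": ["title:designer", "geo:us"],
    "j4": ["title:engineer", "geo:us"],
}
JOB_VECS = {
    "j1": [1.0, 0.0],
    "j2": [0.9, 0.1],
    "j3": [0.0, 1.0],
    "j4": [0.6, 0.8],
}

INDEX = build_inverted_index(JOB_TERMS)
TOP = retrieve(INDEX, JOB_VECS, ["title:engineer", "geo:us"], [1.0, 0.0], k=2)
print(TOP)  # ['j1', 'j4']
```

In this sketch `j2` is excluded despite its high similarity because it fails the `geo:us` term filter. A production system would replace the brute-force scan with an approximate or GPU-based nearest-neighbor search; that substitution is outside the scope of this example.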