ConFit: Improving Resume-Job Matching using Data Augmentation and Contrastive Learning (2401.16349v1)
Abstract: A reliable resume-job matching system helps a company find suitable candidates from a pool of resumes, and helps a job seeker find relevant jobs from a list of job posts. However, since job seekers apply to only a few jobs, interaction records in resume-job datasets are sparse. Unlike much prior work that relies on complex modeling techniques, we tackle this sparsity problem using data augmentation and a simple contrastive learning approach. ConFit first creates an augmented resume-job dataset by paraphrasing specific sections in a resume or a job post. Then, ConFit uses contrastive learning to further increase training samples from $B$ pairs per batch to $O(B^2)$ per batch. We evaluate ConFit on two real-world datasets and find it outperforms prior methods (including BM25 and OpenAI text-embedding-ada-002) by up to 19% and 31% absolute in nDCG@10 for ranking jobs and ranking resumes, respectively.
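The abstract's jump from $B$ pairs to $O(B^2)$ training samples per batch comes from scoring every resume against every job in the batch, with the matched pair on the diagonal as the positive and the other $B-1$ jobs as in-batch negatives. The following is a minimal sketch of such an in-batch InfoNCE-style objective; the function name, temperature value, and toy embeddings are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def in_batch_contrastive_loss(resume_emb, job_emb, temperature=0.05):
    """InfoNCE-style loss over a batch of B (resume, job) embedding pairs.

    Each resume's matched job (the diagonal) is its positive; the other
    B-1 jobs in the batch act as negatives, so B pairs yield a BxB grid
    of scored combinations -- O(B^2) training signals per batch.
    """
    # L2-normalize so dot products are cosine similarities.
    r = resume_emb / np.linalg.norm(resume_emb, axis=1, keepdims=True)
    j = job_emb / np.linalg.norm(job_emb, axis=1, keepdims=True)
    sim = (r @ j.T) / temperature                 # (B, B) similarity matrix
    # Row-wise cross-entropy with the diagonal as the correct class.
    sim = sim - sim.max(axis=1, keepdims=True)    # numerical stability
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy batch: each job embedding is a slight perturbation of its resume,
# so the matched pairs are easy positives and the loss should be small.
rng = np.random.default_rng(0)
B, d = 4, 8
resumes = rng.normal(size=(B, d))
jobs = resumes + 0.01 * rng.normal(size=(B, d))
loss = in_batch_contrastive_loss(resumes, jobs)
```

In a real training loop this loss would be computed on encoder outputs (e.g. a BERT-style text encoder over resume and job-post text) and backpropagated; the numpy version here only illustrates the batch-level structure of the objective.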
Authors: Xiao Yu, Jinzhong Zhang, Zhou Yu