Large Language Models as Batteries-Included Zero-Shot ESCO Skills Matchers (2307.03539v1)

Published 7 Jul 2023 in cs.CL and cs.AI

Abstract: Understanding labour market dynamics requires accurately identifying the skills required for and possessed by the workforce. Automation techniques are increasingly being developed to support this effort. However, automatically extracting skills from job postings is challenging due to the vast number of existing skills. The ESCO (European Skills, Competences, Qualifications and Occupations) framework provides a useful reference, listing over 13,000 individual skills. However, skills extraction remains difficult and accurately matching job posts to the ESCO taxonomy is an open problem. In this work, we propose an end-to-end zero-shot system for skills extraction from job descriptions based on LLMs. We generate synthetic training data for the entirety of ESCO skills and train a classifier to extract skill mentions from job posts. We also employ a similarity retriever to generate skill candidates, which are then re-ranked using a second LLM. Using synthetic data achieves an RP@10 score 10 points higher than previous distant supervision approaches. Adding GPT-4 re-ranking improves RP@10 by more than 22 points over previous methods. We also show that framing the task as mock programming when prompting the LLM can lead to better performance than natural language prompts, especially with weaker LLMs. We demonstrate the potential of integrating LLMs at both ends of skills matching pipelines. Our approach requires no human annotations and achieves extremely promising results on skills extraction against ESCO.
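
For intuition, here is a minimal sketch of the retrieve-then-rerank idea the abstract describes, not the authors' implementation: the embedding model name, the three-item skill list, and the prompt wording are illustrative assumptions. It retrieves candidate ESCO skills for an extracted skill span by embedding similarity, then builds a re-ranking prompt framed as mock programming rather than natural language, which the abstract reports working better with weaker LLMs.

```python
# Sketch of retrieve-then-rerank skills matching against ESCO.
# Assumptions (not from the paper): the embedding model, the toy
# skill list, and the exact prompt wording are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy stand-in for the ~13,000 ESCO skill labels.
ESCO_SKILLS = ["manage databases", "use Python", "negotiate with suppliers"]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
skill_vecs = model.encode(ESCO_SKILLS, normalize_embeddings=True)

def retrieve_candidates(span: str, k: int = 10) -> list[str]:
    """Return the k ESCO skills most similar to an extracted skill span."""
    q = model.encode([span], normalize_embeddings=True)
    scores = (skill_vecs @ q.T).ravel()  # cosine similarity (unit vectors)
    top = np.argsort(-scores)[:k]
    return [ESCO_SKILLS[i] for i in top]

def rerank_prompt(span: str, candidates: list[str]) -> str:
    """Frame re-ranking as mock programming: the LLM completes the code."""
    options = ",\n    ".join(f'"{c}"' for c in candidates)
    return (
        "# Select the ESCO skills that match the extracted span.\n"
        f'span = "{span}"\n'
        f"candidates = [\n    {options},\n]\n"
        "matching_skills = "  # the LLM fills in this assignment
    )

candidates = retrieve_candidates("experience with relational databases")
print(rerank_prompt("experience with relational databases", candidates))
```

In the paper's pipeline the retrieved candidates are passed to a second LLM (GPT-4 in the best configuration) for re-ranking; here the prompt string stands in for that call.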

Authors (2)
  1. Benjamin Clavié (12 papers)
  2. Guillaume Soulié (3 papers)
Citations (9)