
Computational Job Market Analysis with Natural Language Processing (2404.18977v1)

Published 29 Apr 2024 in cs.CL

Abstract: [Abridged Abstract] Recent technological advances underscore labor market dynamics, yielding significant consequences for employment prospects and increasing job vacancy data across platforms and languages. Aggregating such data holds potential for valuable insights into labor market demands, the emergence of new skills, and job matching for various stakeholders. However, while such insights are prevalent in the private sector, transparent language technology systems and data for this domain are lacking. This thesis investigates NLP technology for extracting relevant information from job descriptions, identifying challenges including scarcity of training data, lack of standardized annotation guidelines, and a shortage of effective methods for extraction from job ads. We frame the problem, obtain annotated data, and introduce extraction methodologies. Our contributions include job description datasets, a de-identification dataset, and a novel active learning algorithm for efficient model training. We propose skill extraction using weak supervision, a taxonomy-aware pre-training methodology adapting multilingual LLMs to the job market domain, and a retrieval-augmented model leveraging multiple skill extraction datasets to enhance overall performance. Finally, we ground extracted information within a designated taxonomy.
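As context for the extraction task the abstract describes: skill extraction from job postings is commonly framed as sequence labeling, where each token is assigned a BIO tag and contiguous B-/I- runs are decoded into skill spans. The sketch below is illustrative only (label names and the decoding helper are assumptions, not the thesis's exact pipeline), showing how tagged tokens map back to skill phrases:

```python
def decode_bio_spans(tokens, tags):
    """Decode BIO-tagged tokens into (skill_phrase, start, end) spans.

    A span opens at a B- tag, extends over following I- tags, and
    closes at an O tag, a new B- tag, or the end of the sequence.
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B"):
            if start is not None:  # close any span still open
                spans.append((" ".join(tokens[start:i]), start, i))
            start = i
        elif tag.startswith("I") and start is not None:
            continue  # span continues
        else:  # O tag (or stray I-) ends any open span
            if start is not None:
                spans.append((" ".join(tokens[start:i]), start, i))
            start = None
    if start is not None:  # span running to end of sequence
        spans.append((" ".join(tokens[start:]), start, len(tokens)))
    return spans

# Hypothetical tagged sentence from a job ad:
tokens = ["Experience", "with", "machine", "learning", "and", "SQL", "required"]
tags   = ["O", "O", "B-SKILL", "I-SKILL", "O", "B-SKILL", "O"]
print(decode_bio_spans(tokens, tags))
# → [('machine learning', 2, 4), ('SQL', 5, 6)]
```

In practice the tags would come from a fine-tuned encoder (the thesis adapts multilingual language models to the job market domain), and the decoded spans would then be grounded in a taxonomy such as ESCO.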

Definition Search Book Streamline Icon: https://streamlinehq.com
References (299)
  1. Principal component analysis. Wiley interdisciplinary reviews: computational statistics, 2(4):433–459, 2010. URL https://pubs.rsc.org/en/content/articlehtml/2014/ay/c3ay41907j.
  2. The mitre identification scrubber toolkit: design, training, and assessment. International journal of medical informatics, 79(12):849–859, 2010.
  3. Accountability Act. Health insurance portability and accountability act of 1996. Public law, 104:191, 1996.
  4. Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1638–1649, Santa Fe, New Mexico, USA, 2018. Association for Computational Linguistics. URL https://aclanthology.org/C18-1139.
  5. Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pages 72–78, Minneapolis, Minnesota, USA, June 2019a. Association for Computational Linguistics. doi: 10.18653/v1/W19-1909. URL https://aclanthology.org/W19-1909.
  6. Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pages 72–78, Minneapolis, Minnesota, USA, 2019b. Association for Computational Linguistics. doi: 10.18653/v1/W19-1909. URL https://aclanthology.org/W19-1909.
  7. Skill requirements in job advertisements: A comparison of skill-categorization methods based on wage regressions. Information Processing & Management, 60(2):103185, 2023. URL https://www.sciencedirect.com/science/article/pii/S0306457322002862?casa_token=g-IWrRrn4vEAAAAA:3Qe7yyjupAwY1BgFCeIf-psXEx_7roe-kXZi36buA0BVZ6WZfCcJgkyP0pUWAtCL7upHSPz2HXV7.
  8. End-to-end bias mitigation in candidate recommender systems with fairness gates. 2022.
  9. A comparative study of methods for transductive transfer learning. In Seventh IEEE international conference on data mining workshops (ICDMW 2007), pages 77–82. IEEE, 2007.
  10. A call for more rigor in unsupervised cross-lingual learning. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7375–7388, Online, 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.658. URL https://aclanthology.org/2020.acl-main.658.
  11. The growth of low-skill service jobs and the polarization of the us labor market. American economic review, 103(5):1553–1597, 2013. URL https://www.aeaweb.org/articles?id=10.1257/aer.103.5.1553.
  12. The skill content of recent technological change: An empirical exploration. The Quarterly journal of economics, 118(4):1279–1333, 2003. URL https://academic.oup.com/qje/article-abstract/118/4/1279/1925105?login=false.
  13. ReFinED: An efficient zero-shot-capable approach to end-to-end entity linking. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, pages 209–220, Hybrid: Seattle, Washington + Online, July 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.naacl-industry.24. URL https://aclanthology.org/2022.naacl-industry.24.
  14. Labor market concentration. Journal of Human Resources, 57(S):S167–S199, 2022.
  15. Named entity recognition in Wikipedia. In Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web), pages 10–18, Suntec, Singapore, 2009. Association for Computational Linguistics. URL https://aclanthology.org/W09-3302.
  16. Expertise retrieval. Foundations and Trends in Information Retrieval, 6(2–3):127–256, 2012.
  17. SimCompass: Using deep learning word embeddings to assess cross-level similarity. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 560–565, Dublin, Ireland, August 2014. Association for Computational Linguistics. doi: 10.3115/v1/S14-2098. URL https://aclanthology.org/S14-2098.
  18. Can humans identify domains?, 2024.
  19. Evidence > intuition: Transferability estimation for encoder selection. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 4218–4227, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics. URL https://aclanthology.org/2022.emnlp-main.283.
  20. “FIJO”: a french insurance soft skill detection dataset. arXiv e-prints, pages arXiv–2204, 2022.
  21. SciBERT: A pretrained language model for scientific text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3615–3620, Hong Kong, China, 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1371. URL https://aclanthology.org/D19-1371.
  22. Longformer: The long-document transformer. ArXiv preprint, abs/2004.05150, 2020. URL https://arxiv.org/abs/2004.05150.
  23. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6:587–604, 2018. doi: 10.1162/tacl_a_00041. URL https://aclanthology.org/Q18-1041.
  24. NoSta-D named entity annotation for German: Guidelines and dataset. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), pages 2524–2531, Reykjavik, Iceland, 2014. European Language Resources Association (ELRA). URL http://www.lrec-conf.org/proceedings/lrec2014/pdf/276_Paper.pdf.
  25. Crawling and preprocessing mailing lists at scale for dialog analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1151–1158, Online, 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.108. URL https://aclanthology.org/2020.acl-main.108.
  26. Adaptation approaches for nearest neighbor language models. ArXiv preprint, abs/2211.07828, 2022. URL https://arxiv.org/abs/2211.07828.
  27. Retrieving skills from job descriptions: A language model based extreme multi-label classification framework. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5832–5842, Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. doi: 10.18653/v1/2020.coling-main.513. URL https://aclanthology.org/2020.coling-main.513.
  28. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 440–447, Prague, Czech Republic, June 2007. Association for Computational Linguistics. URL https://aclanthology.org/P07-1056.
  29. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017. doi: 10.1162/tacl_a_00051. URL https://aclanthology.org/Q17-1010.
  30. Carlo Bonferroni. Teoria statistica delle classi e calcolo delle probabilita. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commericiali di Firenze, 8:3–62, 1936.
  31. Classifying online job advertisements through machine learning. Future Generation Computer Systems, 86:319–328, 2018.
  32. Mining labor market requirements using distributional semantic models and deep learning. In Witold Abramowicz and Rafael Corchuelo, editors, Business Information Systems - 22nd International Conference, BIS 2019, Seville, Spain, June 26-28, 2019, Proceedings, Part II, volume 354 of Lecture Notes in Business Information Processing, pages 177–190. Springer, 2019. doi: 10.1007/978-3-030-20482-2\_15. URL https://doi.org/10.1007/978-3-030-20482-2_15.
  33. Race against the machine: How the digital revolution is accelerating innovation, driving productivity, and irreversibly transforming employment and the economy. Brynjolfsson and McAfee, 2011.
  34. The second machine age: Work, progress, and prosperity in a time of brilliant technologies. WW Norton & Company, 2014. URL https://books.google.nl/books?hl=nl&lr=&id=WiKwAgAAQBAJ&oi=fnd&pg=PA1&dq=The+second+machine+age:+Work,+progress,+and+prosperity+in+a+time+of+brilliant+technologies&ots=4_-uUc0Acg&sig=rh-Nl7fDit4mmdb_yMiATI9MkZA.
  35. Swiss job market monitor: A rich source of demand-side micro data of the labour market. European Sociological Review, 2022.
  36. Wikipedia entities as rendezvous across languages: Grounding multilingual language models by predicting Wikipedia hyperlinks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3651–3661, Online, 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.286. URL https://aclanthology.org/2021.naacl-main.286.
  37. Autoregressive entity retrieval. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=5k8F6UU39V.
  38. Skill requirements analysis for data analysts based on named entities recognition. In 2021 2nd International Conference on Big Data and Informatization Education (ICBDIE), pages 64–68, 2021. doi: 10.1109/ICBDIE52740.2021.00023.
  39. Multilingual autoregressive entity linking. Transactions of the Association for Computational Linguistics, 10:274–290, 2022.
  40. The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th {{\{{USENIX}}\}} Security Symposium ({{\{{USENIX}}\}} Security 19), pages 267–284, 2019.
  41. Extracting training data from large language models. ArXiv preprint, abs/2012.07805, 2020. URL https://arxiv.org/abs/2012.07805.
  42. Rich Caruana. Multitask learning. Machine learning, 28(1):41–75, 1997.
  43. German’s next language model. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6788–6796, Barcelona, Spain (Online), 2020. International Committee on Computational Linguistics. doi: 10.18653/v1/2020.coling-main.598. URL https://aclanthology.org/2020.coling-main.598.
  44. Creating a live, public short message service corpus: the nus sms corpus. Language Resources and Evaluation, 47(2):299–335, 2013.
  45. Mariia Chernova. Occupational skills extraction with FinBERT. Master’s Thesis, 2020. URL https://www.theseus.fi/bitstream/handle/10024/348657/Mariia_Chernova_Master_Thesis_full.pdf?sequence=2.
  46. Rethinking embedding coupling in pre-trained language models. ArXiv preprint, abs/2010.12821, 2020. URL https://arxiv.org/abs/2010.12821.
  47. BAM! born-again multi-task networks for natural language understanding. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5931–5937, Florence, Italy, 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1595. URL https://aclanthology.org/P19-1595.
  48. Large language models as batteries-included zero-shot esco skills matchers. ArXiv preprint, abs/2307.03539, 2023. URL https://arxiv.org/abs/2307.03539.
  49. Active learning with statistical models. Advances in neural information processing systems, 7, 1994.
  50. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Online, 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.747. URL https://aclanthology.org/2020.acl-main.747.
  51. Constructing biological knowledge bases by extracting information from text sources. In ISMB, volume 1999, pages 77–86, 1999.
  52. Reducing labeling effort for structured prediction tasks. In AAAI, volume 5, pages 746–751, 2005.
  53. Committee-based sampling for training probabilistic classifiers. In Machine Learning Proceedings 1995, pages 150–157. Elsevier, 1995.
  54. Jobbert: Understanding job titles through skills. ArXiv preprint, abs/2109.09605, 2021. URL https://arxiv.org/abs/2109.09605.
  55. Design of negative sampling strategies for distantly supervised skill extraction. ArXiv preprint, abs/2209.05987, 2022. URL https://arxiv.org/abs/2209.05987.
  56. Extreme multi-label skill extraction training using large language models. ArXiv preprint, abs/2307.10778, 2023. URL https://arxiv.org/abs/2307.10778.
  57. Results of the WNUT2017 shared task on novel and emerging entity recognition. In Proceedings of the 3rd Workshop on Noisy User-generated Text, pages 140–147, Copenhagen, Denmark, September 2017. Association for Computational Linguistics. doi: 10.18653/v1/W17-4418. URL https://aclanthology.org/W17-4418.
  58. De-identification of patient notes with recurrent neural networks. Journal of the American Medical Informatics Association, 24(3):596–606, 2017.
  59. Mitigating demographic bias in ai-based resume filtering. In Adjunct publication of the 28th ACM conference on user modeling, adaptation and personalization, pages 268–275, 2020.
  60. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019a. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423. URL https://aclanthology.org/N19-1423.
  61. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, 2019b. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423. URL https://aclanthology.org/N19-1423.
  62. Deep dominance - how to properly compare deep neural models. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2773–2785, Florence, Italy, 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1266. URL https://aclanthology.org/P19-1266.
  63. Federated Nearest Neighbor Machine Translation. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=R1U5G2spbLd.
  64. De-identification of emails: Pseudonymizing privacy-sensitive data in a German email corpus. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 259–269, Varna, Bulgaria, 2019. INCOMA Ltd. doi: 10.26615/978-954-452-056-4_030. URL https://aclanthology.org/R19-1030.
  65. Active Learning for BERT: An Empirical Study. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7949–7962, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.638. URL https://aclanthology.org/2020.emnlp-main.638.
  66. Peter Elias. Occupational classification (isco-88): Concepts, methods, reliability, validity and cross-national comparability. Technical report, OECD Publishing, 1997.
  67. Gpts are gpts: An early look at the labor market impact potential of large language models. ArXiv preprint, abs/2303.10130, 2023. URL https://arxiv.org/abs/2303.10130.
  68. ESCO. Machine Learning Assisted Mapping of Multilingual Occupational Data to ESCO (Part 1), 2022. URL https://esco.ec.europa.eu/en/about-esco/data-science-and-esco/machine-learning-assisted-mapping-multilingual-occupational-data-esco-part-1.
  69. Kawin Ethayarajh. How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 55–65, Hong Kong, China, 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1006. URL https://aclanthology.org/D19-1006.
  70. European Commission. Regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46. Official Journal of the European Union (OJ), 59(1-88):294, 2016.
  71. European Commission. Industry 5.0: Towards more sustainable, resilient and human-centric industry, 2021. URL https://research-and-innovation.ec.europa.eu/news/all-research-and-innovation-news/industry-50-towards-more-sustainable-resilient-and-human-centric-industry-2021-01-07_en.
  72. Skillner: Mining and mapping soft skills from any text. Expert Systems with Applications, 184:115544, 2021. URL https://www.sciencedirect.com/science/article/pii/S0957417421009519?casa_token=r5CvNzj74-gAAAAA:CHr3DmfOze1nTt359q7WFNNHPSJhNUVYZ5qCxcZS-_a9els3VIHLkGTGkwi745_Rsn74za-BYsvE.
  73. Joseph L Fleiss. Measuring nominal scale agreement among many raters. Psychological bulletin, 76(5):378, 1971.
  74. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and psychological measurement, 33(3):613–619, 1973.
  75. Adversarial learning of privacy-preserving text representations for de-identification of medical records. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5829–5839, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1584. URL https://aclanthology.org/P19-1584.
  76. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Maria-Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings, pages 1050–1059. JMLR.org, 2016. URL http://proceedings.mlr.press/v48/gal16.html.
  77. Deep joint entity disambiguation with local neural attention. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2619–2629, Copenhagen, Denmark, September 2017. Association for Computational Linguistics. doi: 10.18653/v1/D17-1277. URL https://aclanthology.org/D17-1277.
  78. Representation degeneration problem in training natural language generation models. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL https://openreview.net/forum?id=SkEYojRqtm.
  79. Skill requirements in big data: A content analysis of job advertisements. Journal of Computer Information Systems, 58(4):374–384, 2018.
  80. Deep active learning over the long tail. ArXiv preprint, abs/1711.00941, 2017. URL https://arxiv.org/abs/1711.00941.
  81. Graphlmi: A data driven system for exploring labor market information through graph databases. Multimedia Tools and Applications, pages 1–30, 2020.
  82. Discriminative active learning. ArXiv preprint, abs/1907.06347, 2019. URL https://arxiv.org/abs/1907.06347.
  83. Text zoning and classification for job advertisements in German, French and English. In Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science, pages 83–93, Online, 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.nlpcss-1.10. URL https://aclanthology.org/2020.nlpcss-1.10.
  84. Fine-grained extraction and classification of skill requirements in German-speaking job ads. In Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS), pages 14–24, Abu Dhabi, UAE, November 2022a. Association for Computational Linguistics. URL https://aclanthology.org/2022.nlpcss-1.2.
  85. Evaluation of transfer learning and domain adaptation for analyzing german-speaking job advertisements. In Proceedings of the Language Resources and Evaluation Conference, pages 3892–3901, Marseille, France, 2022b. European Language Resources Association. URL https://aclanthology.org/2022.lrec-1.414.
  86. Singular value decomposition and least squares solutions. Linear algebra, 2:134–151, 1971.
  87. Deep learning. MIT press, 2016.
  88. JobXMLC: EXtreme multi-label classification of job skills with graph neural networks. In Findings of the Association for Computational Linguistics: EACL 2023, pages 2181–2191, Dubrovnik, Croatia, May 2023. Association for Computational Linguistics. URL https://aclanthology.org/2023.findings-eacl.163.
  89. Improving neural language models with a continuous cache. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017. URL https://openreview.net/forum?id=B184E5qee.
  90. Development of a benchmark corpus to support entity recognition in job descriptions. In Proceedings of the Language Resources and Evaluation Conference, pages 1201–1208, Marseille, France, 2022. European Language Resources Association. URL https://aclanthology.org/2022.lrec-1.128.
  91. Implicit skills extraction using document embedding and its use in job recommendation. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pages 13286–13293. AAAI Press, 2020. URL https://aaai.org/ojs/index.php/AAAI/article/view/7038.
  92. Evaluation of a deidentification (de-id) software engine to share pathology reports and clinical documents for research. American journal of clinical pathology, 121(2):176–186, 2004.
  93. Don’t stop pretraining: Adapt language models to domains and tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8342–8360, Online, July 2020a. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.740. URL https://aclanthology.org/2020.acl-main.740.
  94. Don’t stop pretraining: Adapt language models to domains and tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8342–8360, Online, 2020b. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.740. URL https://aclanthology.org/2020.acl-main.740.
  95. Retrieval augmented language model pre-training. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 3929–3938. PMLR, 2020. URL http://proceedings.mlr.press/v119/guu20a.html.
  96. Unsupervised domain adaptation of contextualized embeddings for sequence labeling. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4238–4248, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1433. URL https://aclanthology.org/D19-1433.
  97. Crfs based de-identification of medical records. Journal of biomedical informatics, 58:S39–S46, 2015.
  98. BERT-MK: Integrating graph contextualized knowledge into pre-trained language models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2281–2290, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.207. URL https://aclanthology.org/2020.findings-emnlp.207.
  99. Efficient nearest neighbor language models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5703–5714, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.461. URL https://aclanthology.org/2021.emnlp-main.461.
  100. Learning entity representation for entity disambiguation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 30–34, Sofia, Bulgaria, August 2013. Association for Computational Linguistics. URL https://aclanthology.org/P13-2006.
  101. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  102. The concept of job vacancies in a dynamic theory of the labor market. In The measurement and interpretation of job vacancies, pages 73–110. NBER, 1966.
  103. Bayesian active learning for classification and preference learning. ArXiv preprint, abs/1112.5745, 2011. URL https://arxiv.org/abs/1112.5745.
  104. Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 328–339, Melbourne, Australia, 2018. Association for Computational Linguistics. doi: 10.18653/v1/P18-1031. URL https://aclanthology.org/P18-1031.
  105. XTREME: A massively multilingual multi-task benchmark for evaluating cross-lingual generalisation. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 4411–4421. PMLR, 2020. URL http://proceedings.mlr.press/v119/hu20b.html.
  106. WhiteningBERT: An easy unsupervised sentence embedding approach. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 238–244, Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.findings-emnlp.23. URL https://aclanthology.org/2021.findings-emnlp.23.
  107. Distilling knowledge from reader to retriever for question answering. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=NTEz-6wysdb.
  108. Few-shot learning with retrieval augmented language models. ArXiv preprint, abs/2208.03299, 2022. URL https://arxiv.org/abs/2208.03299.
  109. Paul Jaccard. Distribution de la flore alpine dans le bassin des dranses et dans quelques régions voisines. Bull Soc Vaudoise Sci Nat, 37:241–272, 1901.
  110. Carotene: A job title classification system for the online recruitment domain. In 2015 IEEE First International Conference on Big Data Computing Service and Applications, pages 286–293. IEEE, 2015.
  111. Towards a job title classification system. ArXiv preprint, abs/1606.00917, 2016. URL https://arxiv.org/abs/1606.00917.
  112. Large-scale occupational skills normalization for online recruitment. In Satinder P. Singh and Shaul Markovitch, editors, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 4627–4634. AAAI Press, 2017. URL http://aaai.org/ocs/index.php/IAAI/IAAI17/paper/view/14922.
  113. De-identification of privacy-related entities in job postings. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 210–221, Reykjavik, Iceland (Online), May 31–2 June 2021a. Linköping University Electronic Press, Sweden. URL https://aclanthology.org/2021.nodalida-main.21.
  114. De-identification of privacy-related entities in job postings. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 210–221, Reykjavik, Iceland (Online), 2021b. Linköping University Electronic Press, Sweden. URL https://aclanthology.org/2021.nodalida-main.21.
  115. Representation of job-skill in artificial intelligence with knowledge graph analysis. In 2018 IEEE Symposium on Product Compliance Engineering-Asia (ISPCE-CN), pages 1–6. IEEE, 2018.
  116. Towards robust k-nearest-neighbor machine translation. ArXiv preprint, abs/2210.08808, 2022a. URL https://arxiv.org/abs/2210.08808.
  117. Learning kernel-smoothed machine translation with retrieved examples. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7280–7290, Online and Punta Cana, Dominican Republic, 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.579. URL https://aclanthology.org/2021.emnlp-main.579.
  118. Promptbert: Improving bert sentence embeddings with prompts. ArXiv preprint, abs/2201.04337, 2022b. URL https://arxiv.org/abs/2201.04337.
  119. De-identification of medical records using conditional random fields and long short-term memory networks. Journal of biomedical informatics, 75:S43–S53, 2017.
  120. Plug and play knowledge distillation for kNN-LM with external logits. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 463–469, Online only, 2022. Association for Computational Linguistics. URL https://aclanthology.org/2022.aacl-short.57.
  121. Deidentification of free-text medical records using pre-trained bidirectional transformers. In Proceedings of the ACM Conference on Health, Inference, and Learning, pages 214–221, 2020.
  122. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547, 2019. URL https://ieeexplore.ieee.org/abstract/document/8733051/.
  123. SpanBERT: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8:64–77, 2020a. doi: 10.1162/tacl_a_00300. URL https://aclanthology.org/2020.tacl-1.5.
  124. SpanBERT: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8:64–77, 2020b. doi: 10.1162/tacl_a_00300. URL https://aclanthology.org/2020.tacl-1.5.
  125. Mind your outliers! investigating the negative impact of outliers on active learning for visual question answering. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7265–7281, Online, August 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.acl-long.564. URL https://aclanthology.org/2021.acl-long.564.
  126. Generalization through memorization: Nearest neighbor language models. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020. URL https://openreview.net/forum?id=HklBjCEKvH.
  127. Nearest neighbor machine translation. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=7wCBOfJ8hJM.
  128. A survey on skill identification from online job ads. IEEE Access, 9:118134–118153, 2021.
  129. A deep learning architecture for de-identification of patient notes: Implementation and evaluation. ArXiv preprint, abs/1810.01570, 2018. URL https://arxiv.org/abs/1810.01570.
  130. Yoon Kim. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751, Doha, Qatar, October 2014. Association for Computational Linguistics. doi: 10.3115/v1/D14-1181. URL https://aclanthology.org/D14-1181.
  131. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL http://arxiv.org/abs/1412.6980.
  132. Analyzing Language in Restricted Domains: Sublanguage Description and Processing. Lawrence Erlbaum Associates, 1986.
  133. A graph-based approach to skill extraction from text. In Proceedings of TextGraphs-8 Graph-based Methods for Natural Language Processing, pages 79–87, Seattle, Washington, USA, October 2013. Association for Computational Linguistics. URL https://aclanthology.org/W13-5011.
  134. Discriminated by an algorithm: a systematic review of discrimination and fairness by algorithmic decision-making in the context of HR recruitment and HR development. Business Research, 13(3):795–848, 2020.
  135. Philipp Koehn. Europarl: A parallel corpus for statistical machine translation. In Proceedings of Machine Translation Summit X: Papers, pages 79–86, Phuket, Thailand, 2005. URL https://aclanthology.org/2005.mtsummit-papers.11.
  136. AC Koivunen and AB Kostinski. The feasibility of data whitening to improve performance of weather radar. Journal of Applied Meteorology and Climatology, 38(6):741–749, 1999. URL https://digitalcommons.mtu.edu/cgi/viewcontent.cgi?article=1279&context=physics-fp.
  137. 75 languages, 1 model: Parsing Universal Dependencies universally. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2779–2795, Hong Kong, China, 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1279. URL https://aclanthology.org/D19-1279.
  138. Knowledge-driven unsupervised skills extraction for graph-based talent matching. 2010.
  139. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012.
  140. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Carla E. Brodley and Andrea Pohoreckyj Danyluk, editors, Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), Williams College, Williamstown, MA, USA, June 28 - July 1, 2001, pages 282–289. Morgan Kaufmann, 2001.
  141. The measurement of observer agreement for categorical data. Biometrics, pages 159–174, 1977.
  142. Industry 4.0. Business & information systems engineering, 6:239–242, 2014.
  143. From zero to hero: On the limitations of zero-shot language transfer with multilingual Transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4483–4499, Online, 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.363. URL https://aclanthology.org/2020.emnlp-main.363.
  144. ESCO: Boosting job matching in Europe with semantic interoperability. Computer, 47(10):57–64, 2014.
  145. Deep learning. Nature, 521(7553):436–444, 2015.
  146. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240, 2020.
  148. Latent retrieval for weakly supervised open domain question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6086–6096, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1612. URL https://aclanthology.org/P19-1612.
  149. Vladimir I Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, volume 10, pages 707–710. Soviet Union, 1966.
  150. Heterogeneous uncertainty sampling for supervised learning. In Machine learning proceedings 1994, pages 148–156. Elsevier, 1994.
  151. A sequential algorithm for training text classifiers. In SIGIR’94, pages 3–12. Springer, 1994.
  152. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online, July 2020a. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.703. URL https://aclanthology.org/2020.acl-main.703.
  153. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020b. URL https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html.
  154. Efficient one-pass end-to-end entity linking for questions. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6433–6441, Online, November 2020a. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.522. URL https://aclanthology.org/2020.emnlp-main.522.
  155. On the sentence embeddings from pre-trained language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9119–9130, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.733. URL https://aclanthology.org/2020.emnlp-main.733.
  157. Deep job understanding at linkedin. In Jimmy Huang, Yi Chang, Xueqi Cheng, Jaap Kamps, Vanessa Murdock, Ji-Rong Wen, and Yiqun Liu, editors, Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25-30, 2020, pages 2145–2148. ACM, 2020d. doi: 10.1145/3397271.3401403. URL https://doi.org/10.1145/3397271.3401403.
  158. Learning question classifiers. In COLING 2002: The 19th International Conference on Computational Linguistics, 2002. URL https://aclanthology.org/C02-1150.
  159. Domain specialization as the key to make large language models disruptive: A comprehensive survey. ArXiv, 2023.
  160. On cross-lingual retrieval with multilingual text encoders. Information Retrieval Journal, pages 1–35, 2022.
  161. Deep learning for extreme multi-label text classification. In Noriko Kando, Tetsuya Sakai, Hideo Joho, Hang Li, Arjen P. de Vries, and Ryen W. White, editors, Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017, pages 115–124. ACM, 2017a. doi: 10.1145/3077136.3080834. URL https://doi.org/10.1145/3077136.3080834.
  162. Learning multi-graph neural network for data-driven job skill prediction. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2021.
  163. RoBERTa: A robustly optimized BERT pretraining approach. ArXiv preprint, abs/1907.11692, 2019. URL https://arxiv.org/abs/1907.11692.
  164. Automatic de-identification of electronic medical records using token-level and character-level conditional random fields. Journal of biomedical informatics, 58:S47–S52, 2015.
  165. De-identification of clinical notes via recurrent neural network and conditional random field. Journal of biomedical informatics, 75:S34–S42, 2017b.
  166. Zero-shot entity linking by reading entity descriptions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3449–3460, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1335. URL https://aclanthology.org/P19-1335.
  167. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7.
  168. Practical obstacles to deploying active learning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 21–30, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1003. URL https://aclanthology.org/D19-1003.
  169. JobSkape: A framework for generating synthetic job postings to enhance skill matching. In Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024), pages 43–58, St. Julian’s, Malta, March 2024. Association for Computational Linguistics. URL https://aclanthology.org/2024.nlp4hr-1.4.
  170. Bridge the terminology gap between recruiters and candidates: A multilingual skills base built from social media and linked data. In Ravi Kumar, James Caverlee, and Hanghang Tong, editors, 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016, San Francisco, CA, USA, August 18-21, 2016, pages 583–590. IEEE Computer Society, 2016. doi: 10.1109/ASONAM.2016.7752295. URL https://doi.org/10.1109/ASONAM.2016.7752295.
  171. Active learning by acquiring contrastive examples. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 650–663, 2021.
  172. CamemBERT: a tasty French language model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7203–7219, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.645. URL https://aclanthology.org/2020.acl-main.645.
  174. When is multitask learning effective? semantic sequence prediction under varying data conditions. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 44–53, Valencia, Spain, 2017. Association for Computational Linguistics. URL https://aclanthology.org/E17-1005.
  175. Efficient machine translation domain adaptation. In Proceedings of the 1st Workshop on Semiparametric Methods in NLP: Decoupling Logic from Knowledge, pages 23–29, Dublin, Ireland and Online, 2022a. Association for Computational Linguistics. doi: 10.18653/v1/2022.spanlp-1.3. URL https://aclanthology.org/2022.spanlp-1.3.
  176. Chunk-based nearest neighbor machine translation. ArXiv preprint, abs/2205.12230, 2022b. URL https://arxiv.org/abs/2205.12230.
  177. The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730, 2018.
  178. UMAP: Uniform manifold approximation and projection. The Journal of Open Source Software, 3(29):861, 2018. URL https://www.theoj.org/joss-papers/joss.00861/10.21105.joss.00861.pdf.
  179. Improved de-identification of physician notes through integrative modeling of both public and private medical text. BMC medical informatics and decision making, 13(1):112, 2013.
  180. Quinn McNemar. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2):153–157, 1947.
  181. Stephane M Meystre. De-identification of unstructured clinical data for patient privacy protection. In Medical Data Privacy Handbook, pages 697–716. Springer, 2015.
  182. Automatic de-identification of textual documents in the electronic health record: a review of recent research. BMC medical research methodology, 10(1):70, 2010.
  183. Silo language models: Isolating legal risk in a nonparametric datastore. ArXiv preprint, abs/2308.04430, 2023a. URL https://arxiv.org/abs/2308.04430.
  184. Nonparametric masked language modeling. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2097–2118, Toronto, Canada, July 2023b. Association for Computational Linguistics. URL https://aclanthology.org/2023.findings-acl.132.
  185. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 1003–1011, Suntec, Singapore, 2009. Association for Computational Linguistics. URL https://aclanthology.org/P09-1113.
  186. Crosslingual generalization through multitask finetuning. arXiv preprint arXiv:2211.01786, 2022.
  187. Ethical considerations in ai-based recruitment. In 2019 IEEE International Symposium on Technology and Society (ISTAS), pages 1–7. IEEE, 2019.
  188. Hiroki Nakayama. seqeval: A python framework for sequence labeling evaluation, 2018. Software available from https://github.com/chakki-works/seqeval.
  189. doccano: Text annotation tool for human, 2018. Software available from https://github.com/doccano/doccano.
  190. Automated de-identification of free-text medical records. BMC medical informatics and decision making, 8(1):32, 2008.
  191. BERTweet: A pre-trained language model for English tweets. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 9–14, Online, October 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-demos.2. URL https://aclanthology.org/2020.emnlp-demos.2.
  193. Rethinking skill extraction in the job market domain using large language models. In Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024), pages 27–42, St. Julian’s, Malta, March 2024. Association for Computational Linguistics. URL https://aclanthology.org/2024.nlp4hr-1.3.
  194. Universal Dependencies. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts, Valencia, Spain, 2017. Association for Computational Linguistics. URL https://aclanthology.org/E17-5001.
  195. Fine-grained entity typing for domain independent entity linking. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8576–8583, 2020.
  196. OpenAI. ChatGPT (March version), 2023. URL https://chat.openai.com/chat.
  197. Mining people analytics from stackoverflow job advertisements. In 2017 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pages 108–115. IEEE, 2017.
  198. Approaches of anonymisation of an sms corpus. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 77–88. Springer, 2013.
  199. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar, 2014. Association for Computational Linguistics. doi: 10.3115/v1/D14-1162. URL https://aclanthology.org/D14-1162.
  200. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana, 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-1202. URL https://aclanthology.org/N18-1202.
  201. Knowledge enhanced contextual word representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 43–54, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1005. URL https://aclanthology.org/D19-1005.
  202. Wilhelm H. Peterßen. Kleines Methoden-Lexikon [Small lexicon of methods]. 2nd updated edition, 2001.
  203. KILT: a benchmark for knowledge intensive language tasks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2523–2544, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.200. URL https://aclanthology.org/2021.naacl-main.200.
  204. Multilingual part-of-speech tagging with bidirectional long short-term memory models and auxiliary loss. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 412–418, Berlin, Germany, 2016. Association for Computational Linguistics. doi: 10.18653/v1/P16-2067. URL https://aclanthology.org/P16-2067.
  205. Improving language understanding by generative pre-training. 2018.
  206. Mitigating bias in algorithmic hiring: Evaluating claims and practices. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 469–481, 2020.
  207. Deeptype: Multilingual entity linking by neural type system evolution. In Sheila A. McIlraith and Kilian Q. Weinberger, editors, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 5406–5413. AAAI Press, 2018. URL https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17148.
  208. Few-shot question answering by pretraining span selection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3066–3079, Online, 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.acl-long.239. URL https://aclanthology.org/2021.acl-long.239.
  209. Domain divergences: A survey and empirical analysis. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1830–1849, Online, 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.147. URL https://aclanthology.org/2021.naacl-main.147.
  210. Neural unsupervised domain adaptation in NLP—A survey. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6838–6855, Barcelona, Spain (Online), 2020. International Committee on Computational Linguistics. doi: 10.18653/v1/2020.coling-main.603. URL https://aclanthology.org/2020.coling-main.603.
  211. Text chunking using transformation-based learning. In Third Workshop on Very Large Corpora, 1995. URL https://aclanthology.org/W95-0107.
  212. Handbook of technical and vocational education and training research, volume 49. Springer, 2008.
  213. Why comparing single performance scores does not allow to draw conclusions about machine learning approaches. ArXiv preprint, abs/1803.09578, 2018. URL https://arxiv.org/abs/1803.09578.
  214. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1410. URL https://aclanthology.org/D19-1410.
  215. To transfer or not to transfer. In NIPS 2005 workshop on transfer learning, volume 898, pages 1–4, 2005.
  216. Sebastian Ruder. An overview of multi-task learning in deep neural networks. ArXiv preprint, abs/1706.05098, 2017. URL https://arxiv.org/abs/1706.05098.
  217. What does it mean to 'solve' the problem of discrimination in hiring? Social, technical and legal perspectives from the UK on automated hiring systems. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 458–468, 2020.
  218. Multitask prompted training enables zero-shot task generalization. arXiv preprint arXiv:2110.08207, 2021.
  219. Danielle Saunders. Domain adaptation and multi-domain adaptation for neural machine translation: A survey. Journal of Artificial Intelligence Research, 75:351–424, 2022.
  220. Learning representations for soft skill matching. In International Conference on Analysis of Images, Social Networks and Texts, pages 141–152, 2018.
  221. Klaus Schwab. The fourth industrial revolution. Currency, 2017.
  222. Active learning for convolutional neural networks: A core-set approach. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018. URL https://openreview.net/forum?id=H1aIuk-RW.
  223. Deep learning-based computational job market analysis: A survey on skill extraction and classification from job postings. In Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024), pages 1–15, St. Julian’s, Malta, March 2024. Association for Computational Linguistics. URL https://aclanthology.org/2024.nlp4hr-1.1.
  224. Burr Settles. Active learning literature survey. 2009.
  225. Burr Settles. Active learning. Synthesis lectures on artificial intelligence and machine learning, 6(1):1–114, 2012.
  226. Claude E Shannon. A mathematical theory of communication. The Bell system technical journal, 27(3):379–423, 1948.
  227. Deep active learning for named entity recognition. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018. URL https://openreview.net/forum?id=ry018WZAZ.
  228. Salience and market-aware skill extraction for job targeting. In Rajesh Gupta, Yan Liu, Jiliang Tang, and B. Aditya Prakash, editors, KDD ’20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, pages 2871–2879. ACM, 2020. URL https://dl.acm.org/doi/10.1145/3394486.3403338.
  229. Nearest neighbor zero-shot inference. ArXiv preprint, abs/2205.13792, 2022. URL https://arxiv.org/abs/2205.13792.
  230. Ontology-guided job market demand analysis: a cross-sectional study for the data science field. In Proceedings of the 13th International Conference on Semantic Systems, pages 25–32, 2017.
  231. Deep Bayesian active learning for natural language processing: Results of a large-scale empirical study. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2904–2909, Brussels, Belgium, 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1318. URL https://aclanthology.org/D18-1318.
  232. Large language models encode clinical knowledge. Nature, pages 1–9, 2023.
  233. Syntax-based skill extractor for job advertisements. In 2019 6th Swiss Conference on Data Science (SDS), pages 80–81. IEEE, 2019.
  234. Skill extraction for domain-specific text retrieval in a job-matching platform. In International Conference of the Cross-Language Evaluation Forum for European Languages, pages 116–128. Springer, 2021.
  235. Portuguese named entity recognition using BERT-CRF. ArXiv preprint, abs/1909.10649, 2019. URL https://arxiv.org/abs/1909.10649.
  236. The 4th industrial revolution–its impact on vocational skills. Journal of Education and Work, 34(1):29–52, 2021.
  237. WeaNF: Weak supervision with normalizing flows. In Proceedings of the 7th Workshop on Representation Learning for NLP, pages 269–279, Dublin, Ireland, May 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.repl4nlp-1.27. URL https://aclanthology.org/2022.repl4nlp-1.27.
  238. Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus. Journal of Biomedical Informatics, 58:S20–S29, 2015. ISSN 1532-0464. doi: 10.1016/j.jbi.2015.07.020. URL http://www.sciencedirect.com/science/article/pii/S1532046415001823.
  240. Whitening sentence representations for better semantics and faster retrieval. ArXiv preprint, abs/2103.15316, 2021. URL https://arxiv.org/abs/2103.15316.
  241. Dataset cartography: Mapping and diagnosing datasets with training dynamics. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9275–9293, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.746. URL https://aclanthology.org/2020.emnlp-main.746.
  243. State-of-the-art anonymization of medical records using an iterative machine learning framework. Journal of the American Medical Informatics Association, 14(5):574–580, 2007.
  244. Code and named entity recognition in StackOverflow. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4913–4926, Online, 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.443. URL https://aclanthology.org/2020.acl-main.443.
  245. Dataops for societal intelligence: a data pipeline for labor market skills extraction and matching. In 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI), pages 391–394. IEEE, 2020.
  246. Erik F. Tjong Kim Sang. Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. In COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002), 2002. URL https://aclanthology.org/W02-2024.
  248. Erik F. Tjong Kim Sang and Fien De Meulder. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pages 142–147, 2003. URL https://aclanthology.org/W03-0419.
  249. Regularized training of nearest neighbor language models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, pages 25–30, Hybrid: Seattle, Washington + Online, 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.naacl-srw.4. URL https://aclanthology.org/2022.naacl-srw.4.
  250. Collection of a corpus of Dutch SMS. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pages 2268–2273, Istanbul, Turkey, 2012. European Language Resources Association (ELRA). URL http://www.lrec-conf.org/proceedings/lrec2012/pdf/537_Paper.pdf.
  251. Comparing rule-based, feature-based and deep neural methods for de-identification of Dutch medical records. In Proceedings of the ACM WSDM 2020 Health Search and Data Mining Workshop, co-located with the 13th ACM International WSDM Conference (WSDM 2020), Houston, Texas, USA, February 3, 2020, pages 3–11. CEUR, 2020.
  252. Dennis Ulmer. deep-significance: Easy and Better Significance Testing for Deep Neural Networks, 2021. URL https://doi.org/10.5281/zenodo.4638709. https://github.com/Kaleidophon/deep-significance.
  253. Experimental standards for deep learning in natural language processing research. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2673–2692, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics. URL https://aclanthology.org/2022.findings-emnlp.196.
  254. From masked language modeling to translation: Non-English auxiliary tasks improve zero-shot spoken language understanding. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2479–2497, Online, 2021a. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.197. URL https://aclanthology.org/2021.naacl-main.197.
  255. Massive choice, ample tasks (MaChAmp): A toolkit for multi-task learning in NLP. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 176–197, Online, April 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.eacl-demos.22. URL https://aclanthology.org/2021.eacl-demos.22.
  257. Improving fairness assessments with synthetic data: a practical use case with a recommender system for human resources. 2022.
  258. C Van Rijsbergen. Information retrieval: theory and practice. In Proceedings of the Joint IBM/University of Newcastle upon Tyne Seminar on Data Base Systems, volume 79, 1979.
  259. Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998–6008, 2017. URL https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
  260. Efficient cluster-based k-nearest-neighbor machine translation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2175–2187, Dublin, Ireland, 2022a. Association for Computational Linguistics. doi: 10.18653/v1/2022.acl-long.154. URL https://aclanthology.org/2022.acl-long.154.
  261. K-Adapter: Infusing knowledge into pre-trained models with adapters. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1405–1418, Online, 2021a. Association for Computational Linguistics. doi: 10.18653/v1/2021.findings-acl.121. URL https://aclanthology.org/2021.findings-acl.121.
  262. KNN-NER: Named entity recognition with nearest neighbor search. ArXiv preprint, abs/2203.17103, 2022b. URL https://arxiv.org/abs/2203.17103.
  263. KEPLER: A unified model for knowledge embedding and pre-trained language representation. Transactions of the Association for Computational Linguistics, 9:176–194, 2021b. doi: 10.1162/tacl_a_00360. URL https://aclanthology.org/2021.tacl-1.11.
  264. Finetuned language models are zero-shot learners. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=gEZrGCozdqR.
  265. Building and auditing fair algorithms: A case study in candidate screening. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 666–677, 2021.
  266. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online, October 2020a. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-demos.6. URL https://aclanthology.org/2020.emnlp-demos.6.
  267. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online, 2020b. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-demos.6. URL https://aclanthology.org/2020.emnlp-demos.6.
  268. Scalable zero-shot entity linking with dense entity retrieval. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6397–6407, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.519. URL https://aclanthology.org/2020.emnlp-main.519.
  269. BloombergGPT: A large language model for finance. ArXiv preprint, abs/2303.17564, 2023. URL https://arxiv.org/abs/2303.17564.
  270. Google’s neural machine translation system: Bridging the gap between human and machine translation. ArXiv preprint, abs/1609.08144, 2016. URL https://arxiv.org/abs/1609.08144.
  271. Why do nearest neighbor language models work? ArXiv preprint, abs/2301.02828, 2023. URL https://arxiv.org/abs/2301.02828.
  272. ConSERT: A contrastive framework for self-supervised sentence representation transfer. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5065–5075, Online, 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.acl-long.393. URL https://aclanthology.org/2021.acl-long.393.
  273. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, San Diego, California, 2016. Association for Computational Linguistics. doi: 10.18653/v1/N16-1174. URL https://aclanthology.org/N16-1174.
  274. LinkBERT: Pretraining language models with document links. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8003–8016, Dublin, Ireland, May 2022a. Association for Computational Linguistics. doi: 10.18653/v1/2022.acl-long.551. URL https://aclanthology.org/2022.acl-long.551.
  275. LinkBERT: Pretraining language models with document links. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8003–8016, 2022b.
  276. Efficient nearest neighbor emotion classification with BERT-whitening. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 4738–4745, Abu Dhabi, United Arab Emirates, 2022. Association for Computational Linguistics. URL https://aclanthology.org/2022.emnlp-main.312.
  277. Adaptive semiparametric language models. Transactions of the Association for Computational Linguistics, 9:362–373, 2021. doi: 10.1162/tacl_a_00371. URL https://aclanthology.org/2021.tacl-1.22.
  278. Dict-BERT: Enhancing language model pre-training with dictionary. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1907–1918, Dublin, Ireland, May 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.findings-acl.150. URL https://aclanthology.org/2022.findings-acl.150.
  279. TempEL: Linking dynamically evolving and newly emerging entities. Advances in Neural Information Processing Systems, 35:1850–1866, 2022.
  280. Cartography active learning. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 395–406, Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.findings-emnlp.36. URL https://aclanthology.org/2021.findings-emnlp.36.
  281. SkillSpan: Hard and soft skill extraction from English job postings. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4962–4984, Seattle, United States, July 2022a. Association for Computational Linguistics. doi: 10.18653/v1/2022.naacl-main.366. URL https://aclanthology.org/2022.naacl-main.366.
  282. Kompetencer: Fine-grained skill classification in Danish job postings via distant supervision and transfer learning. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 436–447, Marseille, France, June 2022b. European Language Resources Association. URL https://aclanthology.org/2022.lrec-1.46.
  283. Kompetencer: Fine-grained skill classification in Danish job postings via distant supervision and transfer learning. In Proceedings of the Language Resources and Evaluation Conference, pages 436–447, Marseille, France, 2022c. European Language Resources Association. URL https://aclanthology.org/2022.lrec-1.46.
  284. Skill extraction from job postings using weak supervision. In Proceedings of RecSysHR’22. RecSysHR’22, 2022d. URL https://ceur-ws.org/Vol-3218/RecSysHR2022-paper_10.pdf.
  285. ESCOXLM-R: Multilingual taxonomy-driven pre-training for the job market domain. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11871–11890, Toronto, Canada, 2023a. Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.662. URL https://aclanthology.org/2023.acl-long.662.
  286. ESCOXLM-R: Multilingual taxonomy-driven pre-training for the job market domain. ArXiv preprint, abs/2305.12092, 2023b. URL https://arxiv.org/abs/2305.12092.
  287. NNOSE: Nearest neighbor occupational skill extraction. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 589–608, St. Julian’s, Malta, March 2024a. Association for Computational Linguistics. URL https://aclanthology.org/2024.eacl-long.35.
  288. Entity linking in the job market domain. In Findings of the Association for Computational Linguistics: EACL 2024, pages 410–419, St. Julian’s, Malta, March 2024b. Association for Computational Linguistics. URL https://aclanthology.org/2024.findings-eacl.28.
  289. Character-level convolutional networks for text classification. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 649–657, 2015. URL https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html.
  290. ERNIE: Enhanced language representation with informative entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1441–1451, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1139. URL https://aclanthology.org/P19-1139.
  291. SKILL: A system for skill identification and normalization. In Blai Bonet and Sven Koenig, editors, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pages 4012–4018. AAAI Press, 2015. URL http://www.aaai.org/ocs/index.php/IAAI/IAAI15/paper/view/9363.
  292. Areas of vocational education research. Springer, 2014.
  293. Fedor Zhdanov. Diverse mini-batch active learning. ArXiv preprint, abs/1901.05954, 2019. URL https://arxiv.org/abs/1901.05954.
  294. Adaptive nearest neighbor machine translation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 368–374, Online, 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.acl-short.47. URL https://aclanthology.org/2021.acl-short.47.
  295. Weaker than you think: A critical look at weakly supervised learning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14229–14253, Toronto, Canada, July 2023a. Association for Computational Linguistics. URL https://aclanthology.org/2023.acl-long.796.
  296. Learn to not link: Exploring NIL prediction in entity linking. In Findings of the Association for Computational Linguistics: ACL 2023, pages 10846–10860, Toronto, Canada, July 2023b. Association for Computational Linguistics. URL https://aclanthology.org/2023.findings-acl.690.
  297. What knowledge is needed? Towards explainable memory for kNN-MT domain adaptation. ArXiv preprint, abs/2211.04052, 2022. URL https://arxiv.org/abs/2211.04052.
  298. kNN-BOX: A unified framework for nearest neighbor generation. ArXiv preprint, abs/2302.13574, 2023c. URL https://arxiv.org/abs/2302.13574.
  299. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 19–27. IEEE Computer Society, 2015. doi: 10.1109/ICCV.2015.11. URL https://doi.org/10.1109/ICCV.2015.11.
Authors (1)
  1. Mike Zhang (33 papers)