ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution (2401.11356v3)
Abstract: Lexical Substitution discovers appropriate substitutes for a given target word in a context sentence. However, the task fails to consider substitutes that are of equal or higher proficiency than the target, an aspect that could be beneficial for language learners looking to improve their writing. To bridge this gap, we propose a new task, language proficiency-oriented lexical substitution. We also introduce ProLex, a novel benchmark designed to assess systems' ability to generate not only appropriate substitutes but also substitutes that demonstrate better language proficiency. Besides the benchmark, we propose models that can automatically perform the new task. We show that our best model, a Llama2-13B model fine-tuned with task-specific synthetic data, outperforms ChatGPT by an average of 3.2% in F-score and achieves comparable results with GPT-4 on ProLex.
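To make the task definition concrete, the following minimal sketch filters candidate substitutes so that only those of equal or higher proficiency than the target word remain, then scores the result against a gold set. Several things here are assumptions rather than details from the abstract: proficiency is represented with CEFR labels, the names `CEFR_ORDER`, `proficiency_oriented_substitutes`, and `f_score` are hypothetical, the word levels in the usage example are invented for illustration, and the plain set-based F-score may differ from the metric actually used on ProLex.

```python
# Assumption: proficiency is expressed as CEFR labels, lowest to highest.
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]


def proficiency_oriented_substitutes(target_level, candidates):
    """Keep candidates whose level is equal to or higher than the target's.

    `candidates` is a list of (word, level) pairs that are already judged
    appropriate in context; this function only applies the proficiency filter.
    """
    threshold = CEFR_ORDER.index(target_level)
    return [word for word, level in candidates
            if CEFR_ORDER.index(level) >= threshold]


def f_score(predicted, gold):
    """Set-based F-score (harmonic mean of precision and recall)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative levels only; they are not taken from any real word list.
candidates = [("ok", "A1"), ("fine", "B1"), ("superb", "C1")]
kept = proficiency_oriented_substitutes("B1", candidates)
score = f_score(kept, ["superb", "great"])
```

In this sketch the lower-level candidate is discarded before scoring, which mirrors the abstract's requirement that a substitute be both appropriate and at least as proficient as the target.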