
ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution (2401.11356v3)

Published 21 Jan 2024 in cs.CL

Abstract: Lexical Substitution discovers appropriate substitutes for a given target word in a context sentence. However, the task fails to consider substitutes that are of equal or higher proficiency than the target, an aspect that could be beneficial for language learners looking to improve their writing. To bridge this gap, we propose a new task, language proficiency-oriented lexical substitution. We also introduce ProLex, a novel benchmark designed to assess systems' ability to generate not only appropriate substitutes but also substitutes that demonstrate better language proficiency. Besides the benchmark, we propose models that can automatically perform the new task. We show that our best model, a Llama2-13B model fine-tuned with task-specific synthetic data, outperforms ChatGPT by an average of 3.2% in F-score and achieves comparable results with GPT-4 on ProLex.


Summary

  • The paper introduces ProLex, a benchmark that evaluates proficiency-oriented lexical substitution to enhance vocabulary diversity among L2 learners.
  • It leverages a human-annotated dataset from TOEFL-11 essays and candidate substitutes generated by GPT-4 to ensure contextual and grammatical accuracy.
  • Models like Llama2-13B, fine-tuned with synthetic data, outperformed larger LLMs, demonstrating effective proficiency-based lexical substitutions.

Introduction

Among automatic English learning tools, grammar correction systems have received considerable attention, but helping learners diversify their vocabulary through apt lexical choices is equally important. Researchers have long observed that English second-language (L2) learners tend to rely on a limited vocabulary set, which hampers their expressive writing. Existing lexical substitution systems help learners identify appropriate word alternatives within a given context, promoting vocabulary expansion, but prior work largely disregards the proficiency level of the substitutes relative to the target word.
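The proficiency constraint can be illustrated with a minimal sketch: given contextually appropriate candidate substitutes, keep only those at an equal or higher proficiency level than the target word. The CEFR labels and word list below are hypothetical examples for illustration, not data from ProLex itself.

```python
# Illustrative sketch of the proficiency constraint in
# proficiency-oriented lexical substitution. The CEFR labels below
# are hypothetical, not taken from the ProLex benchmark.

CEFR_ORDER = {"A1": 0, "A2": 1, "B1": 2, "B2": 3, "C1": 4, "C2": 5}

# Hypothetical CEFR proficiency labels for a handful of words.
CEFR_LEVELS = {
    "big": "A1",
    "large": "A1",
    "substantial": "C1",
    "considerable": "B2",
}

def proficiency_filter(target: str, candidates: list[str]) -> list[str]:
    """Keep only substitutes at an equal or higher CEFR level than the target."""
    target_rank = CEFR_ORDER[CEFR_LEVELS[target]]
    return [c for c in candidates if CEFR_ORDER[CEFR_LEVELS[c]] >= target_rank]

# Candidates are assumed to already be contextually and grammatically
# appropriate; the filter only enforces the proficiency constraint.
result = proficiency_filter("considerable", ["large", "substantial"])
print(result)  # → ['substantial']
```

In practice the appropriateness check (semantics, collocation, grammar) is the hard part; the proficiency filter is the additional requirement this task introduces on top of it.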

ProLex Benchmark

To address this gap, the paper presents ProLex, a benchmark for evaluating language proficiency-oriented lexical substitution, advancing beyond the current paradigm that prioritizes contextual suitability alone. ProLex draws its target words from the TOEFL-11 essay corpus, which reflects typical L2 English learner usage patterns, ensuring that the benchmark aligns with the lexicon of learners at lower proficiency levels. A salient feature of ProLex is its human-annotated dataset: human experts vet candidate substitutes generated by GPT-4, following a comprehensive annotation scheme covering semantic integrity, collocation accuracy, lexical variation, and grammatical correctness.
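Systems evaluated on ProLex are scored with an F-score over predicted versus gold substitute sets, as reported in the abstract. The sketch below shows plain set-based precision/recall/F1; the exact weighting ProLex applies may differ, and the word sets are illustrative.

```python
# Minimal sketch of set-based F-score evaluation for lexical
# substitution, in the spirit of the F-score reported on ProLex.
# The benchmark's exact scoring details may differ.

def f_score(predicted: set[str], gold: set[str]) -> float:
    """Harmonic mean of precision and recall over substitute sets."""
    if not predicted or not gold:
        return 0.0
    hits = len(predicted & gold)          # substitutes the system got right
    precision = hits / len(predicted)
    recall = hits / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold and predicted substitutes for one target word.
gold = {"substantial", "considerable", "significant"}
predicted = {"substantial", "significant", "huge"}
score = f_score(predicted, gold)
print(round(score, 3))  # → 0.667
```

Averaging this per-target score over the benchmark yields the aggregate figures quoted in the abstract (e.g., the fine-tuned Llama2-13B beating ChatGPT by 3.2% on average).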

Methodology and Model Performance

To facilitate automated assessment of this task, the authors develop models and benchmark them against ProLex. The strongest is a Llama2-13B model fine-tuned with task-specific synthetic data, which outperformed larger contemporary LLMs on the benchmark's metrics. GPT-4's strong performance in zero-shot and in-context learning settings further illustrates the feasibility of LLMs for semantically demanding tasks such as proficiency-oriented lexical substitution.
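A zero-shot setup of the kind evaluated here reduces to constructing an instruction that states both requirements (contextual fit and higher proficiency) and parsing the model's reply. The prompt wording below is illustrative only; it is not the paper's actual template.

```python
# Hedged sketch of a zero-shot prompt for proficiency-oriented lexical
# substitution. The wording is illustrative, not the template used in
# the paper's experiments.

def build_prompt(sentence: str, target: str) -> str:
    """Assemble an instruction asking an LLM for proficiency-aware substitutes."""
    return (
        "Suggest substitutes for the target word in the sentence below. "
        "Each substitute must fit the context grammatically and demonstrate "
        "equal or higher language proficiency than the target word.\n"
        f"Sentence: {sentence}\n"
        f"Target word: {target}\n"
        "Substitutes:"
    )

prompt = build_prompt("The project had a big impact on the community.", "big")
print(prompt)
```

The same instruction format can carry in-context examples (prepended demonstration pairs) or serve as the input side of the synthetic fine-tuning data for a model like Llama2-13B.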

Conclusions and Prospects

In summary, ProLex paves the way for advances in computational English language learning, particularly in broadening vocabulary and improving writing quality among L2 learners. The benchmark enables systems to recommend substitutes that are both lexically diverse and proficiency-appropriate, supporting educational progress. Moving forward, the authors plan to expand the corpus, refining its representativeness and fostering further system development in L2 instructional technology.
