
CEFR-Annotated WordNet Resource

Updated 15 November 2025
  • CEFR-Annotated WordNet is a lexical resource that systematically aligns WordNet’s synset inventory with CEFR proficiency levels to ease vocabulary selection for L2 learners.
  • It employs LLM-based semantic similarity and thresholding to transfer proficiency tags from authoritative CEFR-annotated dictionaries onto WordNet senses.
  • The resource facilitates adaptive vocabulary training, sense disambiguation, and curriculum design, with validation from lexical classification and sense grouping tests.

A CEFR-Annotated WordNet is a lexical-semantic resource that systematically aligns WordNet’s synset-level sense inventory with communicative proficiency levels defined by the Common European Framework of Reference for Languages (CEFR: A1–C2). The main objective is to bridge the gap between NLP and computer-assisted language learning (CALL) by enabling WordNet’s sense distinctions to support proficiency-aware vocabulary selection, sense disambiguation, and curriculum design. Recent advances implement this linkage at scale using LLMs to semantically align English lexical resources, transferring CEFR-level information onto WordNet, and validating the resulting resource both intrinsically and via downstream lexical classification.

1. Motivation and Theoretical Foundations

WordNet, as a semantically structured lexical database, comprises over 155,000 English lemmas and approximately 207,000 senses, interlinked by relations such as synonymy and hypernymy. While this fine granularity benefits linguistic research and robust NLP sense inventories, it imposes a substantial cognitive load on L2 learners, who must distinguish among many near-synonymous glosses. Conversely, the CEFR is the de facto international standard for categorizing L2 proficiency and provides explicit gradation (A1, A2, B1, B2, C1, C2) for lexical knowledge. WordNet's lack of proficiency-level metadata impedes its use in adaptive CALL, proficiency-aware dictionaries, and fine-grained learning analytics.

Annotating WordNet senses with CEFR levels is motivated by two complementary goals: (i) restricting the presentation of senses to those appropriate for a learner’s proficiency, thus reducing lexical overload; (ii) equipping language learning systems to adaptively scaffold, highlight, or recommend materials according to individual lexical mastery. This approach also aligns with contemporary research highlighting the need for sense inventories that support both computational tasks and language education (Kikuchi et al., 21 Oct 2025, Kikuchi et al., 10 Sep 2024).

2. LLM-Based Semantic Alignment Methodology

Both (Kikuchi et al., 21 Oct 2025) and (Kikuchi et al., 10 Sep 2024) employ LLMs to automate the mapping of WordNet senses to CEFR proficiency levels via semantic similarity, using authoritative external vocabulary profiles.

Gloss Pairing and Similarity Assessment:

For each (lemma, part-of-speech) pair, target gloss sets are constructed:

  • Reference glosses g₁,…,gₘ from the CEFR-tagged English Vocabulary Profile (EVP) or Cambridge dictionaries, each with an associated level ℓ∈{A1,…,C2}
  • WordNet glosses: g′₁,…,g′ₙ

A prompt-based LLM (e.g., GPT-4.0 in (Kikuchi et al., 21 Oct 2025); ChatGPT gpt-4o in (Kikuchi et al., 10 Sep 2024)), at temperature zero for deterministic output, scores semantic similarity of every gloss pair (gᵢ, g′ⱼ) on a 1–7 integer scale:

  • 1 = exactly the same meaning
  • 2 = almost the same meaning
  • ...
  • 7 = completely different meaning

Formally, S(gᵢ, g′ⱼ) ∈ {1,…,7}.
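The scoring step can be sketched as follows. This is a minimal illustration, not the authors' exact setup: the prompt wording, the `build_prompt`/`score_similarity` names, and the `llm` callable (any function mapping a prompt string to a reply string, called at temperature zero) are assumptions for exposition.

```python
# Hypothetical sketch of the 1-7 gloss-similarity rating step.

def build_prompt(gloss_a: str, gloss_b: str) -> str:
    """Construct a deterministic rating prompt for one gloss pair."""
    return (
        "Rate the semantic similarity of the two definitions on a 1-7 scale, "
        "where 1 = exactly the same meaning and 7 = completely different.\n"
        f"Definition A: {gloss_a}\n"
        f"Definition B: {gloss_b}\n"
        "Answer with a single integer."
    )

def score_similarity(gloss_a: str, gloss_b: str, llm) -> int:
    """Query the model (temperature 0 assumed) and parse the integer rating."""
    reply = llm(build_prompt(gloss_a, gloss_b))
    score = int(reply.strip())
    if not 1 <= score <= 7:
        raise ValueError(f"rating out of range: {score}")
    return score
```

Keeping the model behind a plain callable makes the thresholding logic below testable without any API access.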

CEFR Level Transfer by Thresholding:

A level is assigned to a WordNet sense whenever similarity is high (i.e., S(gᵢ, g′ⱼ) ≤ 2). The following pseudocode describes the process:

for g_i in EVP:
    for g_j in WordNet:
        s = LLM.similarity_rating(g_i, g_j)   # integer in 1..7
        if s <= 2:                            # near-identical glosses
            annotate_sense(g_j, CEFR_level_of(g_i))

This procedure, which may be termed semantic alignment by thresholded LLM similarity, enables fully automated annotation without supervised learning at this stage.

For sense grouping (as in (Kikuchi et al., 10 Sep 2024)), each Cambridge sense cᵢ acts as a centroid for a group of WordNet senses matched via s≤2, ensuring that coarse-grained sense clusters all inherit the same CEFR tag from cᵢ.
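The centroid-style grouping can be sketched as below. The `rate` callable stands in for the LLM similarity rating and the function/field names are illustrative assumptions; each Cambridge sense collects the WordNet senses rated near-identical (s ≤ 2) and passes on its CEFR level.

```python
# Hypothetical sketch of coarse sense grouping around Cambridge centroids.

def group_senses(cambridge_senses, wordnet_senses, rate, threshold=2):
    """Return {cambridge_gloss: {"CEFR": level, "members": [...]}} clusters.

    cambridge_senses: iterable of (gloss, CEFR_level) centroid pairs
    wordnet_senses:   iterable of WordNet glosses for the same lemma/PoS
    rate:             callable (gloss_a, gloss_b) -> int similarity in 1..7
    """
    groups = {}
    for c_gloss, level in cambridge_senses:
        members = [w for w in wordnet_senses if rate(c_gloss, w) <= threshold]
        if members:                       # drop centroids with no matches
            groups[c_gloss] = {"CEFR": level, "members": members}
    return groups
```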

3. Corpus Construction and Data Statistics

The resulting CEFR-Annotated WordNet is constructed from canonical lexical resources:

  • WordNet: 155,000 lemmas, 207,000 senses
  • EVP (American-English, single-word): 10,394 sense-level entries

Resource size after alignment (Kikuchi et al., 21 Oct 2025):

  • Lemmas: 5,645
  • Distinct WordNet senses annotated: 10,644
  • (Sense, CEFR) annotations: 10,995 (some senses receive multiple levels)

Distribution by Part-of-Speech and CEFR Level:

PoS   #senses  share (%)
noun  4,888    44.46
verb  3,163    28.77
adj   2,327    21.16
adv   617      5.61
Level  #annotations  share (%)
A1     667           6.07
A2     1,183         10.76
B1     2,284         20.77
B2     3,221         29.30
C1     1,610         14.64
C2     2,030         18.46

In the grouping approach (Kikuchi et al., 10 Sep 2024), 3,222 coarse sense groups are produced for 15,885 lemmas from the Cambridge Learner’s Dictionary (CLD), and 9,457 groups using the Cambridge English Dictionary (CED). Each group is a mapping: (lemma, cambridge_sense_id, CEFR_level, WordNet_sense_keys).

4. Evaluation via Lexical-Level Classification and Cohesiveness

No gold-standard sense-level CEFR annotation exists for WordNet, necessitating indirect validation.

Classification Evaluation (Kikuchi et al., 21 Oct 2025):

Using the annotated resource, contextual lexical classifiers are trained to predict CEFR level ℓ∈{A1, ..., C2} for tokens in context, leveraging:

  • SemCor-CEFR: SemCor 3.0 re-annotated using mapped CEFR levels (226,040 sense-tagged tokens)
  • EVP-derived contexts: 31,562 tokens

Modeling approaches:

  • ME6.Contextual: BERT + SVC classifier
  • Zero-/few-shot LLMs: GPT-5 prompts (0/6/18-shot)
  • Fine-tuned LLMs (FT): GPT-4.1-mini fine-tuned on EVP, SemCor-CEFR, or a mixture
  • Hybrid (FT+KB): Rule-based KB for unambiguous lexemes, otherwise FT classifier
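The FT+KB decision rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the knowledge base is modeled as a dict mapping lemmas to their recorded CEFR levels, and `classify_hybrid`/`ft_classifier` are hypothetical names, not the authors' interface.

```python
# Hypothetical sketch of the hybrid FT+KB classification rule.

def classify_hybrid(token, kb, ft_classifier):
    """Return a CEFR level for `token`, preferring deterministic KB lookups.

    kb:            dict lemma -> set of CEFR levels observed for that lemma
    ft_classifier: callable token -> CEFR level (fine-tuned model fallback)
    """
    levels = kb.get(token.lower(), set())
    if len(levels) == 1:            # unambiguous lexeme: rule-based answer
        return next(iter(levels))
    return ft_classifier(token)     # ambiguous or out-of-KB: model decides
```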
Model           Macro-F1 (Mixture)
ME6.Contextual  0.61
Zero-shot       0.42
6-shot          0.47
18-shot         0.48
FT              0.73
FT+KB           0.81

The FT+KB approach achieves a Macro-F1 of 0.81, indicating high predictive accuracy. Fine-tuning on SemCor-CEFR alone yields Macro-F1 of 0.67, comparable to EVP-only (0.65), despite no test set overlap.
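Macro-F1, the metric reported above, averages per-class F1 scores with equal weight, so rare CEFR levels count as much as common ones. A plain-Python sketch (not the evaluation code used in the papers):

```python
# Macro-averaged F1: unweighted mean of per-class F1 scores.

def macro_f1(y_true, y_pred, labels):
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```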

Spearman ρ correlation with CompLex 2.0 complexity ratings (7,662 tokens) for FT+KB on SemCor-CEFR is ≈0.54, demonstrating generalizability beyond dictionary-style contexts.

Group Cohesion and Separability (Kikuchi et al., 10 Sep 2024):

To test the semantic coherence of coarse sense groups, prompt-based tests with ChatGPT measure:

  • Intra-group confusability: Ratio_yes = 0.675 (CLD-based), much higher than CSI baseline (0.388)
  • Inter-group exclusivity: Ratio_no = 0.820 (CLD-based), similar to baseline (0.834)

This suggests LLM-based grouping yields highly internally cohesive and mutually exclusive sense clusters, supporting the soundness of the grouping strategy.
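The two statistics can be sketched as below. `judge` stands in for the ChatGPT interchangeability prompt (returning "yes" or "no") and is an assumption here: Ratio_yes is the fraction of same-group sense pairs judged interchangeable, Ratio_no the fraction of cross-group pairs judged distinct.

```python
# Hypothetical sketch of the group-cohesion statistics.

def cohesion_ratios(intra_pairs, inter_pairs, judge):
    """Return (Ratio_yes, Ratio_no) for intra- and inter-group gloss pairs."""
    ratio_yes = sum(judge(a, b) == "yes" for a, b in intra_pairs) / len(intra_pairs)
    ratio_no = sum(judge(a, b) == "no" for a, b in inter_pairs) / len(inter_pairs)
    return ratio_yes, ratio_no
```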

5. Data Structures, Formats, and Release

The CEFR-annotated resources are distributed in machine-friendly data schemas:

  • WordNet annotation file (JSON):
    • key: WordNet synset/sense key
    • value: list of CEFR levels (some senses may receive more than one)
  • Coarse sense grouping (JSON Lines):

{
  "lemma": "say",
  "cambridge_sense": "to speak words",
  "CEFR": "A1",
  "wordnet_sense_keys": ["say%2:32:15::", "say%2:32:00::"]
}
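A consumer of the grouping file might index it by WordNet sense key so that a sense lookup returns its CEFR level. The field names follow the record above; the function name and file path are illustrative.

```python
import json

def index_by_sense_key(jsonl_path):
    """Map each WordNet sense key to its (lemma, CEFR) group metadata."""
    index = {}
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            rec = json.loads(line)
            for key in rec["wordnet_sense_keys"]:
                index[key] = {"lemma": rec["lemma"], "CEFR": rec["CEFR"]}
    return index
```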

  • Corpora:
    • CoNLL-style TSV for token-level annotation
  • Classifier resources:
    • Python scripts for FT+KB classifier inference
  • Availability:

6. Practical Applications and Significance

CEFR-Annotated WordNet resources enable a range of applications across NLP and language education:

  • Adaptive vocabulary training: CALL and e-learning platforms can selectively surface senses at or below a user’s proficiency, preventing cognitive overload.
  • Automated text analysis: Tools can flag out-of-level words and provide dynamic glossing or scaffolding during reading.
  • Curriculum design: Educators can select reading passages or construct exercises aligned to target CEFR bands with sense-level precision.
  • Research on lexical complexity: Fine-grained CEFR annotation at the sense level opens new avenues for empirical studies on lexical acquisition, difficulty, and proficiency modeling.
  • Bridging NLP and SLA: Resources support experimental pipelines for sense-aware, proficiency-driven algorithms in downstream NLP or psycholinguistic research.
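The first application above, surfacing only senses at or below a learner's level, can be sketched directly against the annotation mapping. The ordering follows the CEFR scale; the function and variable names are illustrative assumptions, not a published API.

```python
# Hypothetical sketch of proficiency-aware sense filtering for a CALL front end.

CEFR_ORDER = {lvl: i for i, lvl in enumerate(["A1", "A2", "B1", "B2", "C1", "C2"])}

def senses_for_learner(sense_levels, learner_level):
    """Filter {sense_key: [CEFR levels]} down to senses suitable for the learner.

    A sense with multiple levels is kept if its easiest level is within reach.
    """
    cutoff = CEFR_ORDER[learner_level]
    return {
        key: levels
        for key, levels in sense_levels.items()
        if min(CEFR_ORDER[lvl] for lvl in levels) <= cutoff
    }
```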

A plausible implication is enhanced explainability and pedagogical value in CALL systems, as sense annotations reflect communicative appropriateness rather than only frequency or dictionary order.

7. Comparative Approaches and Future Directions

Both (Kikuchi et al., 21 Oct 2025) and (Kikuchi et al., 10 Sep 2024) use LLMs for semantic matching, but differ in granularity and information propagation:

  • (Kikuchi et al., 21 Oct 2025) provides direct synset-level CEFR tagging for WordNet, enabling sense-level disambiguation.
  • (Kikuchi et al., 10 Sep 2024) constructs sense groupings for each lemma based on Cambridge dictionary senses, propagating CEFR tags in a coarse-grained fashion and validating group integrity via LLM prompt tests.

Alternative approaches, such as embedding-based matching or string-overlap metrics, have been shown to produce less cohesive groupings, as reflected in lower intra-group confusability metrics. The LLM thresholding strategy thus establishes a scalable and effective pipeline for mapping proficiency metadata onto large lexical-semantic networks.

Future research may explore the propagation of CEFR information to multiword expressions, expansion to additional languages, and integration with pedagogical frameworks or adaptive assessment in educational technology.

8. Limitations and Considerations

  • LLM-based similarity is deterministic under temperature zero but may still rely on model biases and limitations of prompt design.
  • Multiple CEFR tags per sense, arising from divergent glosses or cross-level alignment, may complicate downstream usage.
  • The reliance on external CEFR-annotated dictionaries (EVP, CLD, CED) restricts the annotation’s lexical and sense coverage to what is available in those resources.
  • No human-annotated gold standard exists for sense-level CEFR mapping in WordNet, so evaluation relies on classification and group-cohesion proxies.
  • The annotation process, while fully automated and reproducible, is dependent on the stability and availability of specific LLM checkpoints.

These points define both the advances and the open challenges present in constructing, evaluating, and applying CEFR-Annotated WordNet resources in NLP and language education contexts.
