GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge
The paper "GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge" presents a novel approach to Word Sense Disambiguation (WSD) by integrating gloss knowledge into a BERT-based framework. This research addresses the fundamental challenge in NLP of determining the precise meaning of an ambiguous word given its context. Traditional methods for WSD often involve knowledge-based strategies leveraging lexical resources such as WordNet, or supervised techniques utilizing feature-based classifiers—referred to as word experts—for every target lemma. However, these methods face limitations in scalability and flexibility, especially in comprehensive, all-words WSD tasks.
This paper introduces a methodology that reformulates WSD as a sentence-pair classification problem: for each target word, context-gloss pairs are constructed from the glosses of all of its candidate senses in WordNet. The approach builds on BERT, a pre-trained language model known for its effectiveness on sentence-pair tasks such as question answering and natural language inference. By fine-tuning BERT on the SemCor 3.0 training corpus, the authors report that their approach surpasses existing state-of-the-art systems on several evaluation benchmarks.
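To make the pair construction concrete, the sketch below shows how context-gloss pairs could be assembled from WordNet via NLTK. The function name, example sentence, and labeling convention are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch: pair a context sentence with the gloss of every candidate
# WordNet sense of the target word; the gold sense (when known) is labeled 1.
from nltk.corpus import wordnet as wn

def build_context_gloss_pairs(context, target_lemma, gold_synset=None):
    pairs = []
    for synset in wn.synsets(target_lemma):
        label = 1 if gold_synset is not None and synset == gold_synset else 0
        # Each (context, gloss, label) triple becomes one binary classification example.
        pairs.append((context, synset.definition(), label))
    return pairs

# Example: the candidate senses of "bank" yield one pair per sense, with only
# the river-bank sense labeled positive during training.
examples = build_context_gloss_pairs(
    "He sat on the bank of the river and watched the water",
    "bank",
    gold_synset=wn.synset("bank.n.01"),
)
```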
Methodological Advancements:
- Context-Gloss Pair Construction: The authors pair each context sentence containing an ambiguous target word with the WordNet gloss of every candidate sense, turning the WSD task into a binary sentence-pair classification problem. This mirrors how BERT is applied to other sentence-pair tasks such as natural language inference, where both sequences are encoded jointly.
- BERT-Based Models: The primary contribution comprises three BERT-based models, GlossBERT(Token-CLS), GlossBERT(Sent-CLS), and GlossBERT(Sent-CLS-WS), which differ in how they classify a context-gloss pair. GlossBERT(Token-CLS) classifies using the hidden states of the ambiguous word's tokens, GlossBERT(Sent-CLS) uses the sentence-level [CLS] representation, and GlossBERT(Sent-CLS-WS) adds weak supervision that highlights the target word in the context and prepends it to the gloss (see the input-formatting sketch after this list).
- Experimental Evaluation: The authors conduct extensive evaluations on multiple English all-words WSD benchmark datasets (Senseval and SemEval), consistently showing superior performance compared to existing knowledge-based, word-expert supervised, and neural-based systems.
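To illustrate how a single context-gloss pair is presented to the model, the sketch below uses the Hugging Face transformers API in the style of the Sent-CLS-WS variant. The model checkpoint, the quoting convention shown, and the scoring code are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of scoring one context-gloss pair in the style of
# GlossBERT(Sent-CLS-WS); illustrative only, not the authors' exact code.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # binary label: does this gloss match the context?
)
model.eval()

# Weak supervision signals: the target word is quoted in the context and
# prepended to the gloss so the model knows which word is being disambiguated.
context = 'He sat on the " bank " of the river and watched the water'
gloss = "bank: sloping land beside a body of water"

# BERT sentence-pair input: [CLS] context [SEP] gloss [SEP]
inputs = tokenizer(context, gloss, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
match_probability = torch.softmax(logits, dim=-1)[0, 1].item()
```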
Key Findings:
The experimental results indicate that the GlossBERT models, particularly GlossBERT(Sent-CLS-WS), outperform previous methods, with notable gains on the SE07 dataset and on verbs, both of which are especially challenging because of their higher degree of ambiguity. The models also simplify training relative to traditional word-expert approaches by eliminating the need to train a separate classifier for each lemma; a sketch of the resulting inference rule follows.
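Read as pseudocode, this simplification amounts to one selection rule applied to every lemma. The `score_pair` callable below is a hypothetical stand-in for the fine-tuned context-gloss classifier.

```python
# Illustrative inference rule: one shared model scores every candidate gloss
# for the target lemma, and the highest-scoring sense wins. `score_pair` is a
# hypothetical stand-in for the fine-tuned GlossBERT classifier.
from nltk.corpus import wordnet as wn

def disambiguate(context, target_lemma, score_pair):
    """Return the WordNet synset whose gloss best matches the context."""
    candidates = wn.synsets(target_lemma)
    # No per-lemma classifier is trained; the same scorer handles all lemmas.
    return max(candidates, key=lambda s: score_pair(context, s.definition()))
```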
Implications and Future Directions:
The introduction of GlossBERT underscores the potential of leveraging gloss information through pre-trained models like BERT in WSD tasks, providing a template that could extend to other NLP tasks that depend on fine-grained contextual understanding. Practically, the ability to fine-tune a single, robust model that covers all lemmas and datasets suggests significant utility in multilingual and domain-specific applications, where lexical variation complicates WSD.
Future exploration may endeavor to:
- Extend this framework to multilingual WSD by integrating multilingual pre-trained language models.
- Investigate the benefits of larger and more varied gloss corpora, or adapt the methodology to domain-specific lexical resources.
- Explore the potential for combining features from other linguistic resources, such as hierarchical semantic relationships in lexical databases.
In conclusion, the integration of gloss knowledge with BERT for WSD, as proposed in GlossBERT, offers a scalable and effective solution for discerning word meanings in complex textual contexts.