
Knowledge Enhanced Contextual Word Representations (1909.04164v2)

Published 9 Sep 2019 in cs.CL

Abstract: Contextual word representations, typically trained on unstructured, unlabeled text, do not contain any explicit grounding to real world entities and are often unable to remember facts about those entities. We propose a general method to embed multiple knowledge bases (KBs) into large scale models, and thereby enhance their representations with structured, human-curated knowledge. For each KB, we first use an integrated entity linker to retrieve relevant entity embeddings, then update contextual word representations via a form of word-to-entity attention. In contrast to previous approaches, the entity linkers and self-supervised language modeling objective are jointly trained end-to-end in a multitask setting that combines a small amount of entity linking supervision with a large amount of raw text. After integrating WordNet and a subset of Wikipedia into BERT, the knowledge enhanced BERT (KnowBert) demonstrates improved perplexity, ability to recall facts as measured in a probing task and downstream performance on relationship extraction, entity typing, and word sense disambiguation. KnowBert's runtime is comparable to BERT's and it scales to large KBs.

Analysis of "Knowledge Enhanced Contextual Word Representations"

The paper "Knowledge Enhanced Contextual Word Representations" presents a novel methodology for integrating structured knowledge bases (KBs) into transformer-based LLMs, thereby enhancing their ability to recall and utilize factual information. The proposed approach leverages BERT as a foundational model, augmenting it with external, human-curated knowledge from databases like WordNet and Wikipedia through a mechanism termed Knowledge Attention and Recontextualization (KAR).

Methodology Overview

The research addresses a key limitation of pretrained language models: they typically lack explicit grounding in real-world entities. The method embeds multiple KBs into a pretrained model via an integrated entity linker, which retrieves pertinent entity embeddings and updates the contextual word representations with an attention mechanism focused on those entities. Unlike prior techniques that incorporate external knowledge for a specific task, this method is scalable and produces general-purpose, entity-aware representations suitable for a variety of downstream tasks.
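To make the layered integration concrete, the sketch below (an illustrative assumption, not the authors' released implementation) shows how per-KB knowledge modules could be interleaved with the layers of a pretrained transformer; the class names, layer indices, and call signatures are hypothetical.

```python
# Minimal sketch of interleaving per-KB knowledge modules with pretrained
# transformer layers. Assumes each layer maps (batch, seq, dim) -> (batch, seq, dim)
# and that mention spans / candidate entities are precomputed.
import torch
import torch.nn as nn

class KnowledgeEnhancedEncoder(nn.Module):
    def __init__(self, layers: nn.ModuleList, kar_modules: dict):
        """
        layers:      the pretrained transformer layers (e.g. BERT's 12 layers)
        kar_modules: maps a layer index to a knowledge module for one KB,
                     e.g. {9: wordnet_kar, 10: wikipedia_kar} (indices illustrative)
        """
        super().__init__()
        self.layers = layers
        self.kar_modules = nn.ModuleDict({str(i): m for i, m in kar_modules.items()})

    def forward(self, hidden, mention_spans, candidate_entities):
        for i, layer in enumerate(self.layers):
            hidden = layer(hidden)
            key = str(i)
            if key in self.kar_modules:
                # Inject KB knowledge after this layer; the remaining layers
                # then re-process the knowledge-enhanced representations.
                # (In practice each KB would have its own candidate set;
                # a single shared set is a simplification here.)
                hidden = self.kar_modules[key](hidden, mention_spans,
                                               candidate_entities)
        return hidden
```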

The KAR operates by first identifying candidate entity mention spans in the text. For each span, a candidate selector retrieves a set of possible entities, and a learned entity linker, which combines prior probabilities with contextual information, weights the candidate entity embeddings to produce knowledge-enhanced span representations. The contextual word representations are then recontextualized via word-to-entity-span attention, allowing long-range interactions between words and the entities they relate to.
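The full pipeline can be sketched as follows. This is a simplified, assumption-laden illustration rather than the paper's exact architecture: the mean-pooling of spans, the two-feature linking scorer, and the off-the-shelf multi-head attention module are stand-ins for the paper's components, and the tensor shapes are chosen only for clarity.

```python
# Illustrative KAR-style module (a sketch, not the authors' code):
# span pooling -> entity linking (prior + context) -> weighted entity embeddings
# -> word-to-entity-span attention to recontextualize the word representations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KARSketch(nn.Module):
    def __init__(self, hidden_dim: int, entity_dim: int, num_heads: int = 4):
        super().__init__()
        self.span_proj = nn.Linear(hidden_dim, entity_dim)
        self.link_scorer = nn.Linear(2, 1)  # combines context score and prior
        self.word_to_span_attn = nn.MultiheadAttention(entity_dim, num_heads,
                                                       batch_first=True)
        self.out_proj = nn.Linear(entity_dim, hidden_dim)

    def forward(self, hidden, span_bounds, cand_embs, cand_priors):
        """
        hidden:      (batch, seq_len, hidden_dim) contextual word representations
        span_bounds: (batch, n_spans, 2) start/end token indices of mentions
        cand_embs:   (batch, n_spans, n_cands, entity_dim) candidate embeddings
        cand_priors: (batch, n_spans, n_cands) prior scores P(entity | mention)
        """
        # 1. Pool each mention span into one vector (mean over its tokens).
        pooled = torch.stack([
            torch.stack([hidden[b, s:e + 1].mean(dim=0)
                         for s, e in span_bounds[b].tolist()])
            for b in range(hidden.size(0))
        ])                                                       # (b, n_spans, hidden_dim)
        spans = self.span_proj(pooled)                           # (b, n_spans, entity_dim)

        # 2. Entity linking: score candidates from context similarity + prior.
        context_score = torch.einsum("bsd,bscd->bsc", spans, cand_embs)
        link_logits = self.link_scorer(
            torch.stack([context_score, cand_priors], dim=-1)).squeeze(-1)
        link_probs = F.softmax(link_logits, dim=-1)              # (b, n_spans, n_cands)

        # 3. Enhance each span with its expected entity embedding.
        entity_vecs = torch.einsum("bsc,bscd->bsd", link_probs, cand_embs)
        enhanced_spans = spans + entity_vecs

        # 4. Recontextualize: every word attends to the enhanced entity spans.
        words = self.span_proj(hidden)                           # (b, seq_len, entity_dim)
        attended, _ = self.word_to_span_attn(words, enhanced_spans, enhanced_spans)
        return hidden + self.out_proj(attended)                  # residual update
```

In the paper, the entity linker and the masked language modeling objective are trained jointly end-to-end, combining a small amount of entity linking supervision with large amounts of raw text, so the linking weights are learned during pretraining rather than separately for each downstream task.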

Numerical Performance

The experimental section demonstrates the efficacy of the approach on several metrics. The knowledge enhanced BERT, KnowBert, improves masked language model perplexity and factual recall, the latter measured with a probing task over Wikidata facts. KnowBert outperforms the unmodified BERT-base and BERT-large models on several intrinsic and extrinsic benchmarks, including relationship extraction, entity typing, and word sense disambiguation. The added structured knowledge yields gains on tasks such as TACRED and SemEval 2010 Task 8, along with noticeable improvements in factual recall accuracy.

Key Findings and Claims

The paper substantiates several claims:

  1. Improved Perplexity and Recall: Integrating WordNet and Wikipedia lowers masked language model perplexity and enhances factual recall relative to the BERT baselines.
  2. Runtime Efficiency: Despite the added machinery for handling KBs, KnowBert's runtime remains comparable to BERT's, preserving much of BERT's efficiency because entity embeddings are applied sparsely, only at candidate mention spans.
  3. Effective Knowledge Transfer: The KAR mechanism facilitates knowledge transfer from the KBs to the language model, as evidenced by improved performance on tasks that rely heavily on factual knowledge.

Implications and Future Directions

The implications of this research are significant for both practical applications and theoretical development. Practically, enhancing language models with curated factual knowledge could yield systems that are more interpretable and better at entity-centric reasoning. Theoretically, the paper points to a promising direction for combining distributional semantics with structured knowledge, potentially informing new NLP model architectures.

Future work could explore integrating domain-specific KBs, which might offer specialized enhancements for industry-specific applications. Expanding the scalability and flexibility of these methods to accommodate larger and more complex KBs without compromising efficiency remains another promising avenue. The principles established herein set a foundation for embedding richer, more nuanced entity representations in even more sophisticated AI systems.

Authors (7)
  1. Matthew E. Peters (27 papers)
  2. Mark Neumann (13 papers)
  3. Robert L. Logan IV (13 papers)
  4. Roy Schwartz (74 papers)
  5. Vidur Joshi (2 papers)
  6. Sameer Singh (96 papers)
  7. Noah A. Smith (224 papers)
Citations (639)