CoLAKE: Contextualized Language and Knowledge Embedding
The paper "CoLAKE: Contextualized Language and Knowledge Embedding" outlines an innovative approach to enhancing pre-trained LLMs by integrating structured knowledge into their architecture. The key proposition of the paper is CoLAKE, a model that jointly learns contextualized representations for both language and knowledge by extending the Masked LLM (MLM) objective to include factual knowledge extracted from large-scale knowledge bases. Unlike existing models that rely on static, separately pre-trained entity embeddings, CoLAKE dynamically incorporates context from a knowledge graph, promising significant improvements in performance across various tasks requiring knowledge understanding.
Model Overview
CoLAKE differentiates itself from previous approaches such as ERNIE and KnowBERT by moving beyond static entity embeddings. Instead, it builds a unified data structure, the word-knowledge graph (WK graph), which merges language context and knowledge context. The WK graph is fed to a modified Transformer encoder that handles the heterogeneity between word, entity, and relation nodes, so representations of both language and knowledge are learned jointly and in context.
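One way to picture how the heterogeneity can be handled is the sketch below: node-type embeddings separate word, entity, and relation nodes, and the WK graph's adjacency matrix becomes a self-attention mask so that each node attends only to its graph neighbors. Hidden sizes, the shared vocabulary, and module names are assumptions for illustration, not the paper's configuration.

```python
# A sketch of a WK-graph-aware Transformer layer: token, node-type, and
# soft-position embeddings are summed, and the graph adjacency is converted
# into an attention mask restricting each node to its neighbors.
import torch
import torch.nn as nn

NODE_TYPES = {"word": 0, "entity": 1, "relation": 2}


class WKGraphEncoderLayer(nn.Module):
    def __init__(self, vocab_size=1000, hidden=256, heads=4, max_pos=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.type_emb = nn.Embedding(len(NODE_TYPES), hidden)
        self.pos_emb = nn.Embedding(max_pos, hidden)
        self.layer = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)

    def forward(self, node_ids, type_ids, pos_ids, adjacency):
        # adjacency: (batch, n, n) bool, True where an edge (or self-loop) exists.
        x = self.token_emb(node_ids) + self.type_emb(type_ids) + self.pos_emb(pos_ids)
        blocked = ~adjacency                                  # True = may not attend
        heads = self.layer.self_attn.num_heads
        attn_mask = blocked.repeat_interleave(heads, dim=0)   # (batch*heads, n, n)
        return self.layer(x, src_mask=attn_mask)
```

The single shared vocabulary is purely a simplification to keep the sketch short; separating word, entity, and relation vocabularies is an equally valid design choice.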
Graph Construction
The WK graph is constructed by fully connecting the tokens of a sentence into a word graph and then grafting on context from the knowledge graph. Entities linked to mentions in the text serve as anchor nodes, around which sub-graphs of relations and neighboring entities are extracted and attached. This design lets CoLAKE adapt an entity's representation to the sentence it appears in, rather than relying on a single static embedding.
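The construction can be pictured with the small sketch below, which fully connects the words of a sentence, swaps a linked mention for its entity anchor node, and attaches a few (relation, entity) neighbors per anchor. The toy triple store, the entity ids, single-token mentions, and the `networkx` representation are all assumptions made for illustration, not the paper's data pipeline.

```python
# A toy sketch of WK-graph construction.
from itertools import combinations

import networkx as nx


def build_wk_graph(tokens, mentions, kb, max_neighbors=2):
    """tokens: sentence tokens; mentions: {token index: entity id};
    kb: iterable of (head, relation, tail) triples."""
    g = nx.Graph()

    # 1. Fully connect the sentence tokens into a word graph.
    word_nodes = [f"word:{i}:{tok}" for i, tok in enumerate(tokens)]
    g.add_nodes_from(word_nodes, type="word")
    g.add_edges_from(combinations(word_nodes, 2))

    for idx, entity in mentions.items():
        # 2. Replace the linked mention with its entity node (the anchor),
        #    keeping the mention's connections to the other words.
        anchor = f"entity:{entity}"
        g = nx.relabel_nodes(g, {word_nodes[idx]: anchor})
        g.nodes[anchor]["type"] = "entity"

        # 3. Attach a small sub-graph of relations and neighboring entities.
        for _, rel, tail in [t for t in kb if t[0] == entity][:max_neighbors]:
            rel_node = f"relation:{rel}:{entity}->{tail}"
            g.add_node(rel_node, type="relation")
            g.add_node(f"entity:{tail}", type="entity")
            g.add_edge(anchor, rel_node)
            g.add_edge(rel_node, f"entity:{tail}")
    return g


# Toy usage: "Mozart was born in Salzburg", with "Mozart" linked to entity Q254.
kb = [("Q254", "place_of_birth", "Q34713"), ("Q254", "occupation", "Q36834")]
wk = build_wk_graph(["Mozart", "was", "born", "in", "Salzburg"], {0: "Q254"}, kb)
print(wk.number_of_nodes(), wk.number_of_edges())
```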
Experimental Evaluation
The efficacy of CoLAKE was assessed on knowledge-driven tasks, knowledge probes, and general language understanding benchmarks. The model performed strongly on entity typing and relation extraction, as measured on datasets such as Open Entity and FewRel, and showed notable gains on factual knowledge probing, outperforming baselines on LAMA and LAMA-UHN. Its scores on the GLUE language understanding tasks were only marginally below those of the RoBERTa baseline, indicating that the added knowledge capture does not come at a meaningful cost to general language ability.
Word-Knowledge Graph Completion Task
A distinctive feature of CoLAKE is its inherent structure awareness, akin to a pre-trained graph neural network (GNN), which enables inductive reasoning about unseen entities in a task termed word-knowledge graph completion: an entity or relation node of a triple is masked and must be predicted from the surrounding word and knowledge context. In both transductive and inductive evaluation settings, CoLAKE substantially outperformed traditional knowledge graph embedding methods, showing its strength at integrating structural and semantic information.
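Concretely, completion can be framed as masking one node of a held-out triple inside the WK graph and ranking candidates from the masked node's contextual representation, as in the hedged sketch below. `encoder`, `entity_embeddings`, and `MASK_ID` follow the earlier sketches and remain illustrative assumptions rather than the released implementation.

```python
# A sketch of scoring candidates for a masked object node of a triple.
import torch

MASK_ID = 3  # hypothetical [MASK] id


def rank_object_candidates(encoder, node_ids, type_ids, pos_ids, adjacency,
                           object_position, entity_embeddings):
    """Mask the object entity node of a triple and rank all candidate entities."""
    corrupted = node_ids.clone()
    corrupted[:, object_position] = MASK_ID
    hidden = encoder(corrupted, type_ids, pos_ids, adjacency)
    masked_repr = hidden[:, object_position]                  # (batch, hidden)
    # Dot-product scores against every candidate entity embedding.
    scores = masked_repr @ entity_embeddings.weight.T         # (batch, num_entities)
    return scores.argsort(dim=-1, descending=True)            # ranked candidate ids
```

In the inductive setting, candidates can include entities never seen during pre-training, since the masked node's representation is built from the surrounding words and graph neighbors rather than looked up from a stored embedding table.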
Implications and Future Directions
The integration of contextualized language and knowledge representation embodied by CoLAKE has both theoretical and practical implications. Theoretically, it challenges the paradigm of language-only pre-training by demonstrating the benefits of injecting knowledge during pre-training. Practically, it sets a precedent for models that better understand and use entities and relations in text, for tasks such as relation extraction and entity linking. Looking ahead, CoLAKE could be applied to denoising data in knowledge extraction and to evaluating graph-to-text templates, further bridging the NLP and knowledge graph domains.
In conclusion, CoLAKE presents a viable pathway to enriching pre-trained language models with knowledge representation, promising advances in the ability of NLP systems to handle knowledge-intensive tasks.