E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT (1911.03681v2)

Published 9 Nov 2019 in cs.CL

Abstract: We present a novel way of injecting factual knowledge about entities into the pretrained BERT model (Devlin et al., 2019): We align Wikipedia2Vec entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to ERNIE (Zhang et al., 2019) and KnowBert (Peters et al., 2019), but it requires no expensive further pretraining of the BERT encoder. We evaluate E-BERT on unsupervised question answering (QA), supervised relation classification (RC) and entity linking (EL). On all three tasks, E-BERT outperforms BERT and other baselines. We also show quantitatively that the original BERT model is overly reliant on the surface form of entity names (e.g., guessing that someone with an Italian-sounding name speaks Italian), and that E-BERT mitigates this problem.

Summary of "E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT"

The paper introduces an entity-enhanced version of BERT, termed E-BERT, which integrates factual knowledge about entities into the pretrained BERT model by utilizing Wikipedia2Vec entity vectors. This approach is distinct from preceding methods such as ERNIE and KnowBert because it does not require additional pretraining of the BERT encoder. By aligning Wikipedia2Vec entity vectors with BERT's native wordpiece vector space, the authors propose a method to efficiently incorporate entity embeddings into BERT without altering the encoder architecture or incurring additional computational costs for pretraining.
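
As a concrete illustration of the alignment step, the sketch below fits a linear map from Wikipedia2Vec space into BERT's wordpiece embedding space by least squares over tokens shared between the two vocabularies, then projects an entity vector through it. The variable names and the random placeholder matrices are illustrative assumptions, not the authors' released code, and the exact alignment objective in the paper may differ in detail.

```python
# Minimal sketch of the alignment idea (illustrative, not the paper's code):
# fit a linear map W that carries Wikipedia2Vec vectors into BERT's wordpiece
# embedding space using tokens present in both vocabularies, then apply W to
# entity vectors so they can stand in for wordpiece vectors.
import numpy as np

def fit_alignment(wiki_vecs: np.ndarray, bert_vecs: np.ndarray) -> np.ndarray:
    """Solve min_W ||wiki_vecs @ W - bert_vecs||_F^2 over row-aligned shared tokens."""
    W, *_ = np.linalg.lstsq(wiki_vecs, bert_vecs, rcond=None)
    return W  # shape: (d_wiki, d_bert)

# Placeholders for the real paired vectors of shared tokens:
# X[i] is the Wikipedia2Vec word vector and Y[i] the BERT wordpiece embedding
# of the same token.
X = np.random.randn(5000, 300)
Y = np.random.randn(5000, 768)
W = fit_alignment(X, Y)

# A Wikipedia2Vec entity vector can now be projected into BERT's space and
# used in place of a wordpiece embedding.
entity_vec = np.random.randn(300)
aligned_entity_vec = entity_vec @ W   # shape: (768,)
```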

Key Contributions

  1. Methodological Innovation: The paper aligns Wikipedia2Vec entity vectors with BERT's wordpiece vector space, allowing these entity vectors to be used as if they were native wordpiece vectors. This method avoids the need for expensive further pretraining of BERT's encoder, marking a departure from techniques employed by other entity-enhanced models like ERNIE and KnowBert.
  2. Empirical Validation: E-BERT is evaluated on several NLP tasks: unsupervised question answering (QA) on the LAMA benchmark, supervised relation classification, and entity linking. E-BERT outperforms BERT and other baselines across these tasks, suggesting that injecting factual knowledge improves model capability without a significant increase in computational cost (a sketch of the cloze-style probing used for LAMA appears after this list).
  3. Addressing BERT's Limitations: A common critique of BERT is that it leans on the surface form of entity names, which can lead to biased guesses (e.g., inferring that someone with an Italian-sounding name speaks Italian). E-BERT mitigates this issue and is less dependent on such superficial cues, as shown by experiments on LAMA-UHN, a filtered variant of LAMA that removes questions whose entity names already hint at the answer.
  4. Release Plans: The paper announces the intention to release E-BERT and LAMA-UHN, facilitating future research and application in the field of entity-aware language processing.
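
The following sketch shows how an aligned entity vector could be fed to BERT in a cloze-style, LAMA-like query: build the wordpiece input embeddings for a template, overwrite the entity slot with the aligned vector, and let BERT fill the [MASK]. The template, the [ENTITY] placeholder token, and the random stand-in for the aligned vector are illustrative assumptions, not the paper's evaluation code.

```python
# Hedged sketch of cloze-style probing with an aligned entity vector.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

# Stand-in for a Wikipedia2Vec entity vector already projected into BERT's
# 768-dimensional wordpiece space (see the alignment sketch above).
aligned_entity_vec = torch.randn(768)

# Cloze template; [ENTITY] marks the slot that will receive the entity vector.
tokens = ["[CLS]", "[ENTITY]", "is", "a", "native", "speaker", "of",
          "[MASK]", ".", "[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # Look up BERT's own wordpiece embeddings, then replace the entity slot
    # with the aligned entity vector before running the encoder.
    embeds = model.get_input_embeddings()(input_ids).clone()
    embeds[0, tokens.index("[ENTITY]")] = aligned_entity_vec
    out = model(inputs_embeds=embeds,
                attention_mask=torch.ones(1, len(tokens), dtype=torch.long))

# Top prediction for the masked position.
mask_pos = tokens.index("[MASK]")
pred_id = out.logits[0, mask_pos].argmax().item()
print(tokenizer.convert_ids_to_tokens([pred_id]))
```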

Experimental Results

  • Unsupervised QA: E-BERT sets a new state of the art on the LAMA benchmark, surpassing BERT, ERNIE, and KnowBert. Notably, E-BERT remains robust on LAMA-UHN, the filtered benchmark that removes questions with suggestive entity names, confirming that it relies less on surface-level name cues (a sketch of such a name-based filter follows this list).
  • Relation Classification and Entity Linking: On these entity-centric tasks, E-BERT's approach of using aligned entity vectors yields competitive or superior results compared to BERT and other enhanced models that require additional pretraining.
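
To make the LAMA-UHN idea concrete, here is a minimal sketch of one plausible string-match heuristic in the spirit of its filtering: drop a cloze query when the gold answer already appears inside the subject's surface name, since the name alone would then give the answer away. The data layout (dicts with "subject"/"answer") and the example queries are illustrative, not the benchmark's actual format or filtering code.

```python
# Minimal sketch of a name-based filter in the spirit of LAMA-UHN (illustrative).
def is_easy_by_name(subject: str, answer: str) -> bool:
    """True if the answer string is contained (case-insensitively) in the subject's name."""
    return answer.lower() in subject.lower()

queries = [
    {"subject": "Apple Watch", "template": "[X] is developed by [MASK].", "answer": "Apple"},
    {"subject": "Pierre Curie", "template": "[X] works in the field of [MASK].", "answer": "physics"},
]

# Keep only queries whose answer cannot be read off the entity name.
filtered = [q for q in queries if not is_easy_by_name(q["subject"], q["answer"])]
print(len(filtered))  # 1: the "Apple Watch" query is filtered out
```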

Implications and Future Directions

The introduction of E-BERT underscores the potential to enhance pretrained language models (PLMs) with external knowledge without substantial computational overhead. This has significant implications both for practical applications where computational efficiency is paramount and for theoretical explorations of alternative ways to integrate factual knowledge into PLMs. Furthermore, E-BERT's results on LAMA-UHN point toward models that are better calibrated to rely on entity knowledge rather than name-based guessing.

This paper widens the scope for exploring entity-enhanced methodologies across diverse NLP tasks, potentially influencing future approaches to the deployment of knowledge-augmented systems in real-world applications. Additionally, the methodology presents a template for others to follow in aligning and integrating auxiliary embeddings into various PLMs, opening a pathway to more robust, efficient, and entity-aware AI systems.

Authors (3)
  1. Nina Poerner (9 papers)
  2. Ulli Waltinger (7 papers)
  3. Hinrich Schütze (250 papers)
Citations (150)