
Knowledge-Aware Language Model Pretraining (2007.00655v2)

Published 29 Jun 2020 in cs.CL, cs.LG, and stat.ML

Abstract: How much knowledge do pretrained language models hold? Recent research observed that pretrained transformers are adept at modeling semantics, but it is unclear to what degree they grasp human knowledge, or how to ensure they do so. In this paper we incorporate knowledge-awareness in language model pretraining without changing the transformer architecture, inserting explicit knowledge layers, or adding external storage of semantic information. Rather, we simply signal the existence of entities to the input of the transformer in pretraining, with an entity-extended tokenizer; and at the output, with an additional entity prediction task. Our experiments show that solely by adding these entity signals in pretraining, significantly more knowledge is packed into the transformer parameters: we observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing. We also show that our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models, significantly improving downstream tasks like zero-shot question answering with no task-related training.
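
The abstract describes adding an entity prediction task at the output alongside the usual next-token objective. Below is a minimal sketch of that dual-objective idea in PyTorch: a standard LM head plus an auxiliary entity-prediction head over the same transformer hidden states, with their losses combined. The class name `KalmStyleHead`, the loss weighting, and the toy vocabulary sizes are illustrative assumptions, not the paper's released implementation; the entity-extended tokenizer on the input side is likewise only gestured at in the comments.

```python
import torch
import torch.nn as nn


class KalmStyleHead(nn.Module):
    """Sketch of a KALM-style dual objective (assumed, not the paper's code):
    a next-token LM head plus an auxiliary entity-prediction head that share
    the hidden states of a GPT-2-like decoder. On the input side, the paper
    signals entities via an entity-extended tokenizer; here we only model
    the output-side entity prediction task."""

    def __init__(self, hidden_size: int, vocab_size: int, entity_vocab_size: int):
        super().__init__()
        self.token_head = nn.Linear(hidden_size, vocab_size)
        self.entity_head = nn.Linear(hidden_size, entity_vocab_size)

    def forward(self, hidden_states, token_labels, entity_labels, entity_weight=1.0):
        # hidden_states: (batch, seq_len, hidden_size) from any decoder-only transformer.
        token_logits = self.token_head(hidden_states)
        entity_logits = self.entity_head(hidden_states)

        ce = nn.CrossEntropyLoss(ignore_index=-100)
        # Standard language modeling loss over the word-piece vocabulary.
        lm_loss = ce(token_logits.view(-1, token_logits.size(-1)), token_labels.view(-1))
        # Auxiliary loss: predict the entity id (or a "no entity" id) at each position.
        entity_loss = ce(entity_logits.view(-1, entity_logits.size(-1)), entity_labels.view(-1))
        return lm_loss + entity_weight * entity_loss


if __name__ == "__main__":
    # Toy usage with random tensors standing in for transformer outputs and labels.
    batch, seq_len, hidden = 2, 8, 32
    vocab, entity_vocab = 100, 20
    head = KalmStyleHead(hidden, vocab, entity_vocab)
    h = torch.randn(batch, seq_len, hidden)
    token_labels = torch.randint(0, vocab, (batch, seq_len))
    entity_labels = torch.randint(0, entity_vocab, (batch, seq_len))
    print(head(h, token_labels, entity_labels))
```

The key design point the abstract emphasizes is that nothing about the transformer itself changes: the entity signal enters only through the tokenizer and this extra prediction head, so a model pretrained this way can stand in for GPT-2 at inference time.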

Authors (6)
  1. Corby Rosset (21 papers)
  2. Chenyan Xiong (95 papers)
  3. Minh Phan (3 papers)
  4. Xia Song (38 papers)
  5. Paul Bennett (17 papers)
  6. Saurabh Tiwary (15 papers)
Citations (70)