TRELM: Towards Robust and Efficient Pre-training for Knowledge-Enhanced Language Models (2403.11203v1)

Published 17 Mar 2024 in cs.CL

Abstract: KEPLMs are pre-trained models that utilize external knowledge to enhance language understanding. Previous KEPLMs facilitated knowledge acquisition by incorporating knowledge-related pre-training tasks learned from relation triples in knowledge graphs. However, these models do not prioritize learning embeddings for entity-related tokens. Moreover, updating the entire set of parameters in KEPLMs is computationally demanding. This paper introduces TRELM, a Robust and Efficient Pre-training framework for Knowledge-Enhanced LLMs. We observe that entities in text corpora usually follow a long-tail distribution, where the representations of some entities are suboptimally optimized and hinder the pre-training process for KEPLMs. To tackle this, we employ a robust approach to inject knowledge triples and a knowledge-augmented memory bank to capture valuable information. Furthermore, updating a small subset of neurons in the feed-forward networks (FFNs) that store factual knowledge is both sufficient and efficient. Specifically, we utilize dynamic knowledge routing to identify knowledge paths in FFNs and selectively update parameters during pre-training. Experimental results show that TRELM reduces pre-training time by at least 50% and outperforms other KEPLMs in knowledge probing tasks and multiple knowledge-aware language understanding tasks.
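The core efficiency idea in the abstract, updating only a small subset of FFN neurons identified by dynamic knowledge routing, can be illustrated with a minimal sketch. This is not the authors' code: the routing scores, the top-k selection rule, and the toy weight matrix below are all simplified stand-ins for TRELM's actual routing mechanism.

```python
# Hedged sketch: selective parameter updates for "knowledge neurons" in an FFN.
# Routing scores and top-k selection are illustrative assumptions, not TRELM's
# actual dynamic knowledge routing.

def select_knowledge_neurons(routing_scores, top_k):
    """Pick the indices of the top_k neurons with the highest routing scores."""
    ranked = sorted(range(len(routing_scores)),
                    key=lambda i: routing_scores[i], reverse=True)
    return set(ranked[:top_k])

def masked_update(weights, grads, selected, lr=0.1):
    """Apply a gradient step only to the selected neuron rows; freeze the rest."""
    return [
        [w - lr * g for w, g in zip(row, grad_row)] if i in selected else row
        for i, (row, grad_row) in enumerate(zip(weights, grads))
    ]

# Toy FFN with 4 neurons, 2 weights each.
weights = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
grads   = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
scores  = [0.1, 0.9, 0.2, 0.8]   # hypothetical routing scores per neuron

selected = select_knowledge_neurons(scores, top_k=2)
new_weights = masked_update(weights, grads, selected)
# Only the two highest-scoring neurons are updated; the others keep their weights,
# which is how restricting updates to knowledge paths saves pre-training compute.
```

In the full framework, the selected subset would be recomputed as knowledge paths shift during pre-training, so the frozen set is dynamic rather than fixed.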

Authors (8)
  1. Junbing Yan
  2. Chengyu Wang
  3. Taolin Zhang
  4. Xiaofeng He
  5. Jun Huang
  6. Longtao Huang
  7. Hui Xue
  8. Wei Zhang