
MLRIP: Pre-training a military language representation model with informative factual knowledge and professional knowledge base (2207.13929v1)

Published 28 Jul 2022 in cs.CL

Abstract: Incorporating prior knowledge into pre-trained LLMs has proven effective for knowledge-driven NLP tasks such as entity typing and relation extraction. Current pre-training procedures usually inject external knowledge into models through knowledge masking, knowledge fusion, and knowledge replacement. However, the factual information contained in the input sentences has not been fully mined, and the external knowledge to be injected has not been strictly checked. As a result, context information cannot be fully exploited, and either extra noise is introduced or the amount of knowledge injected is limited. To address these issues, we propose MLRIP, which modifies the knowledge masking strategies proposed by ERNIE-Baidu and introduces a two-stage entity replacement strategy. Extensive experiments with comprehensive analyses illustrate the superiority of MLRIP over BERT-based models in military knowledge-driven NLP tasks.
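The abstract does not spell out the two replacement stages, so the sketch below is only an illustration of the general idea: entity-level knowledge masking (masking whole entity spans, as in ERNIE-Baidu) combined with a hypothetical two-stage entity replacement in which an entity is first swapped for a knowledge-base-related entity and otherwise for a random one. Function names, probabilities, and the military entity examples are assumptions, not the authors' implementation.

```python
import random

MASK = "[MASK]"

def entity_level_mask(tokens, entity_spans, mask_prob=0.15):
    """Mask entire entity spans (rather than single sub-words), in the spirit of
    ERNIE-Baidu's entity-level knowledge masking."""
    out = list(tokens)
    for start, end in entity_spans:
        if random.random() < mask_prob:
            out[start:end] = [MASK] * (end - start)
    return out

def two_stage_entity_replacement(tokens, entity_spans, related_entities, entity_pool,
                                 replace_prob=0.3, related_prob=0.7):
    """Hypothetical two-stage replacement: a chosen entity is swapped for a
    knowledge-base-related entity (stage 1, a hard negative) or, failing that,
    for a random entity from the pool (stage 2). Returns the corrupted tokens
    plus per-span labels a model could be trained to predict (1 = replaced)."""
    out, labels, prev_end = [], [], 0
    for start, end in entity_spans:
        out.extend(tokens[prev_end:start])
        surface = " ".join(tokens[start:end])
        if random.random() < replace_prob:
            candidates = related_entities.get(surface)
            if candidates and random.random() < related_prob:
                replacement = random.choice(candidates)   # stage 1: related entity
            else:
                replacement = random.choice(entity_pool)  # stage 2: random entity
            out.extend(replacement.split())
            labels.append(1)
        else:
            out.extend(tokens[start:end])
            labels.append(0)
        prev_end = end
    out.extend(tokens[prev_end:])
    return out, labels

if __name__ == "__main__":
    sent = "The F-16 fighter was deployed to the base".split()
    spans = [(1, 2)]                                   # "F-16" marked as an entity
    related = {"F-16": ["F-35", "MiG-29"]}             # toy knowledge base
    pool = ["F-35", "MiG-29", "Patriot"]
    print(entity_level_mask(sent, spans, mask_prob=1.0))
    print(two_stage_entity_replacement(sent, spans, related, pool, replace_prob=1.0))
```

In this reading, replacing entities with knowledge-base-related candidates rather than purely random ones would give the model harder negatives and tie the corruption signal to verified external knowledge, which matches the abstract's concern about unchecked knowledge introducing noise.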

Authors (6)
  1. Hui Li (1004 papers)
  2. Xuekang Yang (2 papers)
  3. Xin Zhao (160 papers)
  4. Lin Yu (2 papers)
  5. Jiping Zheng (4 papers)
  6. Wei Sun (373 papers)