Language Model Pre-Training with Sparse Latent Typing (2210.12582v2)

Published 23 Oct 2022 in cs.CL and cs.AI

Abstract: Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most LM pre-training objectives focus only on text reconstruction and have not sought to learn latent-level interpretable representations of sentences. In this paper, we manage to push the language models to obtain a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge. Furthermore, the language model pre-trained with such an objective also significantly improves Information Extraction related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT.
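
The sketch below is a hedged illustration of the idea described in the abstract, not the authors' implementation (their code is at the GitHub link above). One plausible way to realize sparse latent typing is to score each token representation against a small learned codebook of latent type embeddings, pick a type with a differentiable Gumbel-softmax, and reserve one codebook slot as a "non-keyword" type that a sparsity penalty steers most tokens toward. The names `SparseLatentTyper`, `num_types`, and `tau` are hypothetical.

```python
# Hypothetical sketch of a sparse latent typing layer; the authors' actual
# implementation lives at https://github.com/renll/SparseLT.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseLatentTyper(nn.Module):
    """Assigns each token a latent type from a small learned codebook.

    Type index 0 is reserved as a "non-keyword" type; a sparsity penalty
    pushes most tokens toward it so that only a few keywords get typed.
    """

    def __init__(self, hidden_size: int, num_types: int = 16, tau: float = 1.0):
        super().__init__()
        self.type_embeddings = nn.Embedding(num_types, hidden_size)
        self.scorer = nn.Linear(hidden_size, num_types)
        self.tau = tau

    def forward(self, token_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden_size) encoder outputs
        logits = self.scorer(token_states)                        # (B, T, num_types)
        # Differentiable discrete type selection (straight-through Gumbel-softmax)
        type_probs = F.gumbel_softmax(logits, tau=self.tau, hard=True)
        # Represent each token by the embedding of its selected latent type
        typed_repr = type_probs @ self.type_embeddings.weight     # (B, T, hidden_size)
        # Sparsity: penalize mass on any type other than the reserved
        # "non-keyword" type 0, so only keyword tokens receive real types
        sparsity_loss = type_probs[..., 1:].sum(dim=-1).mean()
        return typed_repr, type_probs.argmax(dim=-1), sparsity_loss
```

In a pre-training setup along these lines, the sparsity term would be weighted and added to the usual reconstruction objective, so that only a handful of keyword tokens per sentence end up with non-trivial latent types.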

Authors (6)
  1. Liliang Ren (18 papers)
  2. Zixuan Zhang (38 papers)
  3. Han Wang (420 papers)
  4. Clare R. Voss (14 papers)
  5. Heng Ji (266 papers)
  6. ChengXiang Zhai (64 papers)
Citations (3)