Language Model Pre-Training with Sparse Latent Typing (2210.12582v2)
Abstract: Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most LM pre-training objectives focus only on text reconstruction and have not sought to learn latent-level interpretable representations of sentences. In this paper, we push language models to obtain a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge. In addition, the language model pre-trained with this objective significantly improves Information Extraction related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT.
- Liliang Ren
- Zixuan Zhang
- Han Wang
- Clare R. Voss
- Heng Ji
- ChengXiang Zhai
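The abstract describes the objective only at a high level: sparsely select keyword tokens from a sentence and assign each a latent type, learned without external supervision. Below is a minimal, hypothetical PyTorch sketch of one way such a head could be attached to a sentence encoder. The class name `SparseLatentTyper`, the sigmoid keep-gate, the Gumbel-softmax type assignment, and the mean-gate sparsity penalty are all illustrative assumptions for exposition, not the authors' implementation; see the linked repository for the actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseLatentTyper(nn.Module):
    """Illustrative head: picks a sparse subset of tokens as keywords and
    assigns each token a distribution over `num_types` latent type categories."""

    def __init__(self, hidden_dim: int, num_types: int = 16, tau: float = 1.0):
        super().__init__()
        self.keep_logit = nn.Linear(hidden_dim, 1)           # keyword vs. non-keyword gate
        self.type_logits = nn.Linear(hidden_dim, num_types)  # latent type classifier
        self.tau = tau                                        # temperature for soft decisions

    def forward(self, token_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden_dim) contextual states from the encoder
        gate_logits = self.keep_logit(token_states).squeeze(-1)          # (B, T)
        keep_prob = torch.sigmoid(gate_logits / self.tau)                # soft keep probability per token
        # Differentiable latent-type assignment per token via Gumbel-softmax
        type_dist = F.gumbel_softmax(
            self.type_logits(token_states), tau=self.tau, hard=False)    # (B, T, K)
        # Sparsity penalty: encourage only a few tokens to be kept as keywords
        sparsity_loss = keep_prob.mean()
        return keep_prob, type_dist, sparsity_loss
```

In this sketch, `sparsity_loss` would be added with a small weight to the usual text-reconstruction loss during pre-training, and the `type_dist` of highly gated tokens would serve as the interpretable latent type categories referred to in the abstract.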