Contrastive Document Representation Learning with Graph Attention Networks (2110.10778v1)
Abstract: Recent progress in pretrained Transformer-based language models has shown great success in learning contextual representations of text. However, due to the quadratic complexity of self-attention, most pretrained Transformer models can only handle relatively short text, and modeling very long documents remains a challenge. In this work, we propose to use a graph attention network on top of an available pretrained Transformer model to learn document embeddings. This graph attention network allows us to leverage the high-level semantic structure of the document. In addition, based on our graph document model, we design a simple contrastive learning strategy to pretrain our models on a large unlabeled corpus. Empirically, we demonstrate the effectiveness of our approach on document classification and document retrieval tasks.
- Peng Xu (357 papers)
- Xinchi Chen (15 papers)
- Xiaofei Ma (31 papers)
- Zhiheng Huang (33 papers)
- Bing Xiang (74 papers)
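
Since the abstract describes the architecture only at a high level, the following is a minimal sketch of the general idea, not the authors' implementation: per-sentence embeddings from a pretrained Transformer are treated as graph nodes, refined with a single graph-attention layer, pooled into a document embedding, and trained with an InfoNCE-style contrastive loss. The fully connected sentence graph, single attention head, mean pooling, dimensions, and all function names are illustrative assumptions.

```python
# Hedged sketch (assumptions noted above); PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Single-head graph attention over sentence-node embeddings."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)
        self.attn = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, nodes, adj):
        # nodes: (n, dim) sentence embeddings; adj: (n, n) edge mask (1 = edge).
        h = self.proj(nodes)
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)],
            dim=-1,
        )
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1))
        scores = scores.masked_fill(adj == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return F.elu(weights @ h)


def document_embedding(sentence_embeddings, gat):
    """Refine sentence embeddings with graph attention, then mean-pool."""
    n = sentence_embeddings.size(0)
    adj = torch.ones(n, n)  # assumption: fully connected sentence graph
    return gat(sentence_embeddings, adj).mean(dim=0)


def info_nce(doc_a, doc_b, temperature=0.1):
    """Contrastive loss: matched (doc_a[i], doc_b[i]) pairs are positives,
    all other in-batch pairs serve as negatives."""
    a = F.normalize(doc_a, dim=-1)
    b = F.normalize(doc_b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0))
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    torch.manual_seed(0)
    dim, batch = 768, 4
    gat = GraphAttentionLayer(dim)
    # Random stand-ins for per-sentence embeddings produced by a pretrained
    # Transformer encoder (two augmented "views" of each document).
    view1 = [torch.randn(torch.randint(3, 8, (1,)).item(), dim) for _ in range(batch)]
    view2 = [torch.randn(torch.randint(3, 8, (1,)).item(), dim) for _ in range(batch)]
    emb1 = torch.stack([document_embedding(d, gat) for d in view1])
    emb2 = torch.stack([document_embedding(d, gat) for d in view2])
    print("contrastive loss:", info_nce(emb1, emb2).item())
```

Because only sentence-level embeddings pass through the graph layer, document length is limited by the number of sentences rather than the Transformer's token window, which is what lets this kind of design sidestep the quadratic self-attention cost mentioned in the abstract.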