ConcEPT: Concept-Enhanced Pre-Training for Language Models (2401.05669v1)
Abstract: Pre-trained language models (PLMs) have become prevalent in state-of-the-art methods for natural language processing, and knowledge-enhanced PLMs have been proposed to further improve performance on knowledge-intensive tasks. However, conceptual knowledge, an essential kind of knowledge for human cognition, remains understudied in this line of research. This limits PLMs' performance in scenarios requiring human-like cognition, such as understanding long-tail entities via their concepts. In this paper, we propose ConcEPT, which stands for Concept-Enhanced Pre-Training for language models, to infuse conceptual knowledge into PLMs. ConcEPT exploits external taxonomies through entity concept prediction, a novel pre-training objective that predicts the concepts of entities mentioned in the pre-training contexts. Unlike previous concept-enhanced methods, ConcEPT can be readily adapted to various downstream applications without entity linking or concept mapping. Extensive experiments on four tasks, including entity typing, demonstrate the effectiveness of ConcEPT and validate that concept-enhanced pre-training equips the model with improved conceptual knowledge.
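The entity concept prediction objective described in the abstract can be pictured as an auxiliary classification head over entity mention representations, trained jointly with masked language modeling. The sketch below is an illustrative reconstruction only, not the released ConcEPT code: the class and argument names (`ConceptEnhancedEncoder`, `mention_spans`, `concept_labels`) are hypothetical, and it assumes that entity mentions in the pre-training text are already aligned to concepts from an external taxonomy.

```python
# Minimal sketch (not the authors' implementation) of an entity concept
# prediction (ECP) auxiliary objective on top of masked language modeling.
# Assumes each entity mention span comes with multi-hot gold concept labels
# taken from an external taxonomy.
import torch
import torch.nn as nn
from transformers import BertModel


class ConceptEnhancedEncoder(nn.Module):
    def __init__(self, num_concepts, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Multi-label head over the concept vocabulary: an entity may belong
        # to several concepts at once (e.g. "person" and "scientist").
        self.concept_head = nn.Linear(hidden, num_concepts)

    def forward(self, input_ids, attention_mask, mention_spans, concept_labels):
        # mention_spans: list of (batch_idx, start, end) token spans of entity mentions
        # concept_labels: (num_mentions, num_concepts) multi-hot gold concepts
        hidden_states = self.encoder(
            input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Mean-pool the token representations of each mention span.
        mention_reprs = torch.stack(
            [hidden_states[b, s:e].mean(dim=0) for b, s, e in mention_spans]
        )
        logits = self.concept_head(mention_reprs)
        # Binary cross-entropy for multi-label concept prediction; during
        # pre-training this loss would be added to the usual MLM loss.
        ecp_loss = nn.functional.binary_cross_entropy_with_logits(
            logits, concept_labels.float()
        )
        return ecp_loss
```

In such a setup, the ECP loss would simply be summed with the masked language modeling loss at each pre-training step, so the encoder learns concept information without requiring entity linking or concept mapping at downstream fine-tuning time.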
Authors: Xintao Wang, Zhouhong Gu, Jiaqing Liang, Dakuan Lu, Yanghua Xiao, Wei Wang