Language Models as Hierarchy Encoders (2401.11374v4)
Abstract: Interpreting hierarchical structures latent in language is a key limitation of current language models (LMs). While previous research has implicitly leveraged these hierarchies to enhance LMs, approaches for their explicit encoding are yet to be explored. To address this, we introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs), harnessing the expansive nature of hyperbolic space. Our method situates the output embedding space of pre-trained LMs within a Poincaré ball whose curvature adapts to the embedding dimension, followed by training on hyperbolic clustering and centripetal losses. These losses are designed to effectively cluster related entities (input as texts) and organise them hierarchically. We evaluate HiTs against pre-trained LMs, standard fine-tuned LMs, and several hyperbolic embedding baselines, focusing on their capabilities in simulating transitive inference, predicting subsumptions, and transferring knowledge across hierarchies. The results demonstrate that HiTs consistently outperform all baselines in these tasks, underscoring the effectiveness and transferability of our re-trained hierarchy encoders.
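To make the re-training recipe concrete, the following is a minimal sketch of how the two objectives could be set up with PyTorch and the geoopt Riemannian optimisation library. The dimension-dependent curvature, the margins, and the exact loss formulations are illustrative assumptions inferred from the abstract alone, not the paper's actual definitions.

```python
# Hedged sketch: map encoder outputs into a Poincaré ball and train with a
# clustering loss (pull child-parent pairs together, push unrelated pairs
# apart) plus a centripetal loss (parents should sit nearer the origin than
# their children). All specifics below are illustrative assumptions.
import torch
import geoopt  # Riemannian optimisation in PyTorch

dim = 768                                # encoder output dimension
ball = geoopt.PoincareBall(c=1.0 / dim)  # assumption: curvature scaled with the dimension

def to_ball(encoder_output: torch.Tensor) -> torch.Tensor:
    """Project Euclidean sentence embeddings onto the Poincaré ball."""
    return ball.expmap0(encoder_output)

def clustering_loss(child, parent, negative, margin=1.0):
    """Triplet-style hinge on hyperbolic distances (illustrative form)."""
    pos = ball.dist(child, parent)
    neg = ball.dist(child, negative)
    return torch.relu(pos - neg + margin).mean()

def centripetal_loss(child, parent, margin=0.1):
    """Encourage parents to lie closer to the origin than their children.
    Euclidean norms are compared; inside the ball they increase
    monotonically with hyperbolic distance to the origin."""
    return torch.relu(parent.norm(dim=-1) - child.norm(dim=-1) + margin).mean()

# Usage sketch: child/parent/negative stand in for encoder outputs of entity
# names (e.g. "beagle", "dog", "car"), here replaced by small random vectors.
child, parent, negative = (to_ball(torch.randn(4, dim) * 0.01) for _ in range(3))
loss = clustering_loss(child, parent, negative) + centripetal_loss(child, parent)
```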