Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion (2401.12997v2)

Published 19 Jan 2024 in cs.CL

Abstract: In recent years, knowledge graph completion (KGC) models based on pre-trained language models (PLMs) have shown promising results. However, the large number of parameters and high computational cost of PLMs pose challenges for their application in downstream tasks. This paper proposes a progressive distillation method based on masked generation features for the KGC task, aiming to significantly reduce the complexity of pre-trained models. Specifically, we perform pre-distillation on the PLM to obtain high-quality teacher models, and compress the PLM network to obtain multi-grade student models. However, traditional feature distillation is limited by the single representation of information in the teacher model. To solve this problem, we propose masked generation of teacher-student features, which contain richer representational information. Furthermore, there is a significant gap in representation ability between teacher and student. Therefore, we design a progressive distillation method that distills student models at each grade level, enabling efficient knowledge transfer from teachers to students. The experimental results demonstrate that the model in the pre-distillation stage surpasses existing state-of-the-art methods. Furthermore, in the progressive distillation stage, the model significantly reduces its parameter count while maintaining a certain level of performance. Specifically, the parameters of the lower-grade student model are reduced by 56.7% compared to the baseline.
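The abstract describes two ideas only in general terms: a masked-generation feature loss between teacher and student, and a progressive (multi-grade) distillation schedule. The paper's actual formulation is not reproduced on this page, so the following is only a minimal PyTorch-style sketch of what a masked-generation feature distillation loss could look like; the class name MaskedFeatureDistiller, the mask_ratio parameter, and the projection/generator layers are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch: masked-generation feature distillation between a large
# teacher encoder and a smaller student encoder. Names and layer choices
# are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedFeatureDistiller(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int, mask_ratio: float = 0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Project student features to the teacher's hidden width if they differ.
        self.proj = nn.Linear(student_dim, teacher_dim)
        # A small generator reconstructs the teacher's features from the masked
        # student features, forcing the student to encode context-rich information.
        self.generator = nn.Sequential(
            nn.Linear(teacher_dim, teacher_dim),
            nn.GELU(),
            nn.Linear(teacher_dim, teacher_dim),
        )

    def forward(self, student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
        # student_feats: (batch, seq_len, student_dim)
        # teacher_feats: (batch, seq_len, teacher_dim); the teacher is not updated.
        s = self.proj(student_feats)
        # Randomly zero out a fraction of token positions in the student features.
        keep = (torch.rand(s.shape[0], s.shape[1], 1, device=s.device) > self.mask_ratio).float()
        recon = self.generator(s * keep)
        # Penalise the gap between reconstructed and teacher features at all positions.
        return F.mse_loss(recon, teacher_feats.detach())
```

Under the progressive scheme the abstract outlines, such a loss would be applied grade by grade: the pre-distilled teacher distills into the highest-grade student, which then acts as the teacher for the next, smaller grade, and so on, combined with the task loss for KGC at each stage.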

Authors (6)
  1. Cunhang Fan (35 papers)
  2. Yujie Chen (46 papers)
  3. Jun Xue (19 papers)
  4. Yonghui Kong (2 papers)
  5. Jianhua Tao (139 papers)
  6. Zhao Lv (22 papers)
Citations (1)
