An Overview of TechGPT-2.0: LLM for Knowledge Graph Construction
Introduction
TechGPT-2.0 represents a significant stride in merging LLMs with Knowledge Graph (KG) construction, a focal point of contemporary NLP research. With TechGPT-2.0, the authors aim to improve the performance of LLMs on tasks integral to KG creation, specifically Named Entity Recognition (NER) and Relationship Triple Extraction (RTE). The project introduces two 7B models and a QLoRA weight model designed for processing lengthy texts, markedly enhancing processing capability across varied domains such as medicine, law, and geography. Trained on Huawei Ascend servers, TechGPT-2.0 is released to the Chinese open-source model community, promising advances in automatic KG construction.
Technical Contributions and Methodology
The authors address the still largely untapped potential of LLMs and KGs to complement and integrate with each other, focusing on several key technical aspects:
- Model Selection and Adaptation: The project builds on the LLAMA2 architecture, with careful attention to adaptation and performance on Huawei's Ascend servers under the MindSpore framework. The authors release two 7B-parameter models plus a QLoRA weight model aimed at improving long-text processing.
- Data Curation and Processing: A meticulous collection process produced a dataset of approximately 4 million instruction fine-tuning instances, divided between general tasks and KG-specific subtasks. This includes fine-tuning data explicitly tailored to domain-specific tasks in medicine and law, among others.
- Server Utilization Insights: The authors openly document the integration and debugging of the Ascend servers, which were critical to training models at this scale. Their comparison of NVIDIA GPUs with Ascend servers offers salient insights that may guide other researchers working on similar platforms.
- Long-Text Problem Solving: By employing position interpolation, which rescales position indices so that longer inputs fall within the position range seen during training, the authors extend the model's usable context length without compromising output quality, broadening the model's practical deployment spectrum.
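The QLoRA weights mentioned in the list above depend on 4-bit quantization of the frozen base model. The sketch below illustrates the general idea with blockwise absmax quantization; real QLoRA uses NF4 quantile levels plus double quantization, and nothing here reflects the authors' actual implementation.

```python
import numpy as np

def quantize_4bit(w, block_size=64):
    """Blockwise absmax 4-bit quantization (simplified sketch).

    Each block of `block_size` weights is scaled by its own absolute
    maximum, then rounded to the signed 4-bit range [-7, 7]. Assumes
    the weight count is divisible by block_size.
    """
    w = w.reshape(-1, block_size)
    # One scale per block; guard against all-zero blocks.
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8)
    q = np.round(w / scale * 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from 4-bit codes and scales."""
    return (q.astype(np.float32) / 7) * scale
```

Because each block stores only 4-bit codes plus one scale, memory drops roughly fourfold versus fp16, which is what makes fine-tuning 7B models feasible on constrained hardware.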
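Instruction fine-tuning instances like those described above are commonly stored as JSON Lines records in an instruction/input/output layout. The records below are hypothetical NER and RTE examples; the field names and label formats are assumptions for illustration, not the authors' actual schema.

```python
import json

# Hypothetical NER instance (schema and labels are assumed, not the paper's).
ner_example = {
    "instruction": "Extract all named entities from the text and label their types.",
    "input": "TechGPT-2.0 was trained on Huawei Ascend servers.",
    "output": "TechGPT-2.0: Product; Huawei: Organization",
}

# Hypothetical RTE instance producing (subject, relation, object) triples.
rte_example = {
    "instruction": "Extract (subject, relation, object) triples from the text.",
    "input": "TechGPT-2.0 is based on the LLAMA2 architecture.",
    "output": "(TechGPT-2.0, based_on, LLAMA2)",
}

def to_jsonl(records):
    """Serialize records to JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Keeping NER and RTE in one shared format lets general-task and KG-subtask data be mixed freely in a single fine-tuning corpus.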
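Position interpolation, used for the long-text problem above, rescales position indices so that sequences longer than the training context still map into the position range the model saw during training. A minimal sketch with rotary-embedding angles (function names and dimensions are illustrative, not the authors' code):

```python
import numpy as np

def interpolated_positions(seq_len, train_len):
    """Rescale positions so a sequence longer than the trained context
    is compressed back into the trained position range [0, train_len)."""
    scale = min(1.0, train_len / seq_len)
    return np.arange(seq_len) * scale

def rope_angles(positions, dim, base=10000.0):
    """Rotary-embedding angles for each (position, frequency) pair."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape: (len(positions), dim // 2)
```

For example, an 8192-token input against a 4096-token training context is scaled by 0.5, so every position stays below 4096; inputs shorter than the trained context are left unchanged.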
Implications and Future Directions
The TechGPT-2.0 project not only signifies a noteworthy contribution to the Chinese open-source model community but also underlines critical areas for further progress within LLM-KG synergy:
- Application of TechGPT-2.0 in Diverse Domains: The refined performance of TechGPT-2.0 across a multitude of specialized domains underscores its potential utility in automatic legal case sorting, medical consultation, and more. This adaptability could significantly streamline operations that demand domain-specific KG construction and large-scale data processing.
- Further Research in Tool Integration and Multi-modal Systems: While providing a promising foundation, the work points to further exploration needed in retrieval-augmented generation (RAG), tool calling, and multi-modality, which could boost interactivity and comprehension in varied real-world applications.
- Significance in Open-source Access and Collaboration: By offering models in open-source repositories, TechGPT-2.0 facilitates broader collaborative efforts within the NLP community, extending its utility while serving as a blueprint for addressing similar computational and methodological challenges.
In conclusion, TechGPT-2.0 stands as a pivotal step towards strengthening the integration of LLMs and KGs, promising substantial advancements in automatic knowledge graph construction and beyond. The authors’ detailed exposition on model training, data handling, and server interactions provides invaluable guidance for navigating the complexities associated with LLM deployment in specialized domains. The breadth of work undertaken points toward an accelerated growth trajectory for this field, underscoring the evolving interplay between LLM and KG methodologies.