Overview of N-LTP: An Open-source Neural Language Technology Platform for Chinese
The paper introduces N-LTP, an advanced open-source toolkit designed for Chinese NLP. Developed by researchers at the Harbin Institute of Technology, N-LTP addresses six fundamental NLP tasks: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, semantic dependency parsing, and semantic role labeling. This multi-task framework leverages shared pre-trained models to enhance efficiency and performance.
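The shared-encoder design can be sketched in plain Python. All names below (`SharedEncoder`, `TaskHead`, `Pipeline`) are illustrative stand-ins, not N-LTP's actual API: the point is only that the sentence is encoded once and every task head reuses that encoding.

```python
class SharedEncoder:
    """Stand-in for a shared pre-trained encoder: one pass per sentence."""
    def encode(self, sentence):
        # A real system returns contextual embeddings; here we just
        # return the characters so the heads have something to consume.
        return list(sentence)

class TaskHead:
    """Stand-in for a task-specific decoder (segmentation, POS, NER, ...)."""
    def __init__(self, name):
        self.name = name
    def decode(self, hidden):
        return f"{self.name}: {len(hidden)} units"

class Pipeline:
    def __init__(self, tasks):
        self.encoder = SharedEncoder()          # shared across all tasks
        self.heads = [TaskHead(t) for t in tasks]
    def __call__(self, sentence):
        hidden = self.encoder.encode(sentence)  # encoded once, reused by every head
        return {h.name: h.decode(hidden) for h in self.heads}

nlp = Pipeline(["cws", "pos", "ner", "dep", "sdp", "srl"])
print(nlp("他叫汤姆去拿外衣"))
```

Because the encoder dominates the parameter count, sharing it across six heads is what makes the multi-task setup cheaper than six independent models.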
Key Contributions
N-LTP distinguishes itself from existing toolkits by integrating multi-task learning to harness shared knowledge across tasks, removing the need for separate per-task models. Knowledge distillation is applied with the single-task models acting as teachers for the multi-task model, enabling the latter to match or even outperform its single-task counterparts. The toolkit also includes user-friendly APIs and a visualization tool to facilitate interaction with processing results.
Comparative Analysis
The paper conducts a comparison of N-LTP against prominent NLP systems such as Stanza, UDPipe, and FLAIR. N-LTP demonstrates several advantages, including:
- Comprehensive Task Support: Unlike other toolkits, N-LTP supports a broad spectrum of tasks integral to Chinese NLP.
- Efficient Multi-task Learning: By employing multi-task learning, N-LTP reduces memory usage and improves processing speed, which is crucial for deployment on resource-constrained devices.
- Extensible Framework: The toolkit's modular design allows for easy integration of new models and tasks, supported by a robust configuration system.
- State-of-the-art Performance: Across evaluated tasks, N-LTP exhibits competitive or superior performance metrics compared to other state-of-the-art models.
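The extensibility point can be illustrated with a common registry-plus-configuration pattern. This is a hypothetical sketch of how a modular toolkit can let users plug in new task modules; nothing here mirrors N-LTP's real configuration format.

```python
# Hypothetical task registry: new modules register themselves under a name,
# and a configuration dict selects which tasks to instantiate.
TASK_REGISTRY = {}

def register_task(name):
    def wrap(cls):
        TASK_REGISTRY[name] = cls
        return cls
    return wrap

@register_task("pos")
class PosTagger:
    def run(self, tokens):
        return [(tok, "N") for tok in tokens]  # dummy tags for illustration

def build_pipeline(config):
    # config lists which registered tasks to build, in order
    return [TASK_REGISTRY[name]() for name in config["tasks"]]

pipeline = build_pipeline({"tasks": ["pos"]})
```

Under this pattern, adding a task means writing one decorated class and naming it in the configuration, leaving existing modules untouched.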
Empirical Results and Implications
Experimental results show that N-LTP outperforms Stanza on word segmentation, POS tagging, NER, and dependency parsing. The joint training approach with distillation achieves notable speedups and memory savings relative to running separate single-task models. The multi-task model's improved performance is attributable to the shared encoder architecture, which captures inter-task dependencies effectively.
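The memory claim follows from simple parameter arithmetic. The counts below are illustrative assumptions, not measurements from the paper: a 100M-parameter encoder and a 2M-parameter head per task.

```python
# Back-of-the-envelope arithmetic for why a shared encoder saves memory.
ENCODER_PARAMS = 100_000_000  # assumed encoder size, roughly BERT-base scale
HEAD_PARAMS = 2_000_000       # assumed size of one task-specific head
NUM_TASKS = 6

separate = NUM_TASKS * (ENCODER_PARAMS + HEAD_PARAMS)   # six full models
shared = ENCODER_PARAMS + NUM_TASKS * HEAD_PARAMS       # one encoder, six heads

print(f"separate: {separate / 1e6:.0f}M, shared: {shared / 1e6:.0f}M")
# separate = 612M vs shared = 112M: roughly a 5.5x reduction under these assumptions
```

Inference time benefits analogously: the expensive encoder forward pass runs once per sentence instead of once per task.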
Theoretical and Practical Implications
The use of multi-task learning in N-LTP illustrates the value of shared representations in NLP. Practically, the toolkit provides a foundation for scalable Chinese NLP solutions, catering to the growing demand for language processing capabilities across a variety of applications.
Future Prospects
The introduction of N-LTP suggests potential future developments in the integration of multilingual and polyglot models that further exploit shared linguistic characteristics. Future research may focus on refining distillation techniques to bolster multi-task frameworks and expand toolkit capabilities to accommodate diverse linguistic phenomena.
In conclusion, N-LTP represents a significant advancement in toolkits for Chinese NLP, emphasizing efficiency, extensibility, and robust performance. The open-source nature invites continuous improvement and adoption within the research community, potentially fostering advancements in multilingual NLP processing and cross-linguistic applications.