FLAT: Chinese NER Using Flat-Lattice Transformer
The paper presents FLAT (Flat-Lattice Transformer), a novel approach to Chinese Named Entity Recognition (NER) designed to overcome the limitations of existing lattice-based systems. Chinese NER is uniquely challenging because sentences lack explicit word boundaries, so models must either operate at the character level or incorporate lexicon information through a word-character lattice. Lattice-based systems, however, typically hit performance bottlenecks: the lattice structure is complex and dynamic, which limits their ability to exploit parallel GPU computation.
Overview and Methodology
FLAT reworks the lattice model by flattening it into a sequence of spans. Each span represents a character or a lexicon-matched word, together with the head and tail indices that mark its position in the original lattice, allowing FLAT to integrate character- and word-level information efficiently. This transformation preserves the lattice's word-boundary information while letting the model exploit the Transformer architecture's parallel processing capabilities.
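To make the span construction concrete, here is a minimal sketch of building a flat lattice from a character sequence and a lexicon. The example sentence, the toy lexicon, and the helper name build_flat_lattice are illustrative assumptions, not the authors' implementation.

```python
def build_flat_lattice(chars, lexicon):
    """Return a flat list of (token, head, tail) spans.

    Every character becomes a span whose head and tail are its own index;
    every lexicon word matched in the sentence becomes a span covering the
    indices of its first and last character.
    """
    spans = [(ch, i, i) for i, ch in enumerate(chars)]  # character spans
    for start in range(len(chars)):
        for end in range(start + 1, len(chars)):
            word = "".join(chars[start:end + 1])
            if word in lexicon:
                spans.append((word, start, end))        # word spans
    return spans

# Example: "重庆人和药店" with a toy lexicon.
chars = list("重庆人和药店")
lexicon = {"重庆", "重庆人", "人和药店", "药店"}
for token, head, tail in build_flat_lattice(chars, lexicon):
    print(token, head, tail)
```

Characters and matched words end up in a single flat sequence, so the Transformer can attend over all of them in parallel while the head/tail indices still identify where each token sat in the lattice.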
Key to FLAT's design is a relative positional encoding that allows the lattice to be reconstructed from the flat structure. For any pair of spans, the relative distances between their head and tail indices are encoded and injected into self-attention, giving the model the boundary and semantic cues essential for accurate NER.
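The following is a hedged sketch of this relative-position scheme: for spans i and j, four signed distances between their head and tail indices are formed, each is mapped through a fixed sinusoidal table, and the concatenation is projected and passed through a ReLU. Variable names such as heads, tails, and W_r, as well as the exact layer shapes, are assumptions for illustration rather than the paper's exact code.

```python
import numpy as np

def sinusoidal(d, dim):
    """Sinusoidal embedding of a signed integer distance d."""
    k = np.arange(dim // 2)
    angles = d / np.power(10000.0, 2 * k / dim)
    return np.concatenate([np.sin(angles), np.cos(angles)])

def relative_position_encoding(heads, tails, dim, W_r):
    """Return R[i, j]: fused encoding of the four head/tail distances."""
    n = len(heads)
    R = np.zeros((n, n, dim))
    for i in range(n):
        for j in range(n):
            d_hh = heads[i] - heads[j]
            d_ht = heads[i] - tails[j]
            d_th = tails[i] - heads[j]
            d_tt = tails[i] - tails[j]
            p = np.concatenate([sinusoidal(d, dim) for d in (d_hh, d_ht, d_th, d_tt)])
            R[i, j] = np.maximum(0.0, W_r @ p)  # ReLU over the projected distances
    return R

# Usage with the spans produced by build_flat_lattice:
# heads = [h for _, h, _ in spans]; tails = [t for _, _, t in spans]
# W_r = np.random.randn(32, 4 * 32) * 0.02
# R = relative_position_encoding(heads, tails, dim=32, W_r=W_r)
```

Because two spans can overlap, contain one another, or be disjoint, the four distances together are enough to recover their relationship, which is why the flat sequence loses no boundary information relative to the original lattice.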
Experimental Results
The paper demonstrates FLAT's superiority in both performance and computational efficiency on four Chinese NER benchmarks: OntoNotes, MSRA, Resume, and Weibo. FLAT consistently outperformed previous lexicon-based models, including Lattice LSTM, LR-CNN, and graph-based systems such as LGN and CGN, achieving higher F1 scores and faster inference.
Notably, against Transformer-based baselines without lattice integration, FLAT achieved F1-score improvements of up to 1.72 points. The gains were attributed to sharper entity boundary detection and the ability to exploit long-distance dependencies.
Implications and Future Work
FLAT not only advances the state of the art in Chinese NER but also suggests that similar techniques could benefit other sequence labeling tasks where explicit boundary information is scarce. Its structure could potentially be adapted to other languages or domains facing similar challenges, broadening its utility.
Furthermore, integration with pre-trained models such as BERT illustrates FLAT's adaptability to larger NER pipelines, paving the way for even higher accuracy, particularly in resource-rich scenarios.
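A minimal sketch of one common way to combine BERT with a flat-lattice input follows, assuming BERT's contextual vectors replace (or augment) static character embeddings while lexicon-word spans keep their pretrained word embeddings; the checkpoint name and the combination strategy are assumptions, not the paper's exact setup.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

sentence = "重庆人和药店"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    # Drop the [CLS] and [SEP] vectors; bert-base-chinese tokenizes
    # Chinese text one character per token, so this yields one
    # contextual vector per character.
    char_reprs = bert(**inputs).last_hidden_state[0, 1:-1]

print(char_reprs.shape)  # (number of characters, hidden_size)
```

These character vectors would feed the character spans of the flat lattice, giving the model contextualized inputs before the lattice-aware self-attention is applied.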
Future research could explore extending FLAT’s framework to more complex lattice or graph structures, potentially optimizing its application for multilingual and cross-domain NER tasks. The modular design and efficient parallelization properties of FLAT will likely influence upcoming developments in NER research and related NLP applications.