
An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer (2203.16954v1)

Published 31 Mar 2022 in cs.CL, cs.SD, and eess.AS

Abstract: Text normalization, defined as the procedure of transforming non-standard words into spoken-form words, is crucial to the intelligibility of synthesized speech in text-to-speech systems. Rule-based methods that ignore context cannot resolve ambiguity, whereas sequence-to-sequence neural network methods suffer from unexpected and uninterpretable errors. Recently proposed hybrid systems treat the rule-based model and the neural model as two cascaded sub-modules, whose limited interaction prevents the neural network from fully utilizing the expert knowledge contained in the rules. Inspired by the Flat-LAttice Transformer (FLAT), we propose an end-to-end Chinese text normalization model that accepts Chinese characters as direct input and integrates the expert knowledge contained in rules into the neural network; both contribute to the superior performance of the proposed model on the text normalization task. We also release the first publicly accessible large-scale dataset for Chinese text normalization, on which our proposed model achieves excellent results.

Authors (7)
  1. Wenlin Dai (14 papers)
  2. Changhe Song (17 papers)
  3. Xiang Li (1003 papers)
  4. Zhiyong Wu (171 papers)
  5. Huashan Pan (1 paper)
  6. Xiulin Li (5 papers)
  7. Helen Meng (204 papers)
Citations (2)

Summary

Overview of Chinese Text Normalization with Rule-Guided Flat-Lattice Transformer

The paper "An end-to-end Chinese text normalization model based on Rule-guided Flat-Lattice Transformer" introduces a novel model for text normalization (TN) in Chinese language processing, catering specifically to the requirements of text-to-speech (TTS) systems. Text normalization involves transforming non-standard words (NSWs) like numerals and symbols into standard spoken-form words. The paper addresses the limitations of traditional rule-based and sequence-to-sequence (seq2seq) neural network approaches, presenting instead an integration of the Flat-Lattice Transformer (FLAT) architecture with expert rule-driven insights, which enhances the model's performance in dealing with contextual ambiguities.

Methodological Innovation

The proposed method improves upon two established approaches in text normalization: rule-based strategies, which are often rigid and context-blind, and seq2seq models, which may introduce unexpected errors due to data or model biases. The traditional hybrid systems, which sequentially deploy rule-based and neural network models, are critiqued for their inefficiencies in leveraging interactions between these components.

This research introduces an end-to-end solution using the Flat-Lattice Transformer, adapted from its original application in Chinese named entity recognition. The Flat-Lattice Transformer incorporates word information directly into the neural network without requiring a separate word segmentation module. It maps characters and matched words into a lattice structure that is then processed as a flat sequence of spans, which simplifies representing lexical rules as tokens and allows expert knowledge to be injected seamlessly into the model.
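A minimal sketch of this idea follows, under simplifying assumptions: the LatticeToken type, the build_flat_lattice helper, and the "&lt;YEAR&gt;" tag are illustrative, not the authors' implementation. The key point is that characters and rule-matched spans live in one flat sequence, each carrying head/tail character positions that relative position encodings in the Transformer can exploit.

```python
from dataclasses import dataclass

@dataclass
class LatticeToken:
    text: str   # a single character, or a matched word / rule tag
    head: int   # index of the first character it covers
    tail: int   # index of the last character it covers

def build_flat_lattice(chars, rule_matches):
    """chars: list of characters; rule_matches: list of (start, end, tag) spans."""
    # Every character becomes a length-1 span; rule matches are appended as
    # extra tokens covering multiple characters, yielding one flat sequence.
    lattice = [LatticeToken(c, i, i) for i, c in enumerate(chars)]
    for start, end, tag in rule_matches:
        lattice.append(LatticeToken(tag, start, end))
    return lattice

# Example: a rule tags the substring "2022" (character positions 3-6) as a YEAR candidate.
sentence = list("会议在2022年举行")
lattice = build_flat_lattice(sentence, [(3, 6, "<YEAR>")])
for tok in lattice:
    print(tok.text, tok.head, tok.tail)
```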

Model Evaluation and Performance

The model's effectiveness was tested on a newly released large-scale Chinese text normalization dataset—a resource that enhances replicability and sets a new benchmark for Chinese TN tasks. This dataset is characterized by a comprehensive taxonomy of NSW categories, enabling detailed performance analysis across different text forms.

In terms of numerical results, the model demonstrated superior accuracy and F1-score compared to baseline methods, including rule-based systems, BERT-MLP, and BERT-LSTM configurations. The Flat-Lattice Transformer model achieved an impressive 99.1% accuracy in NSW classification, showcasing the efficacy of its architecture in leveraging both contextual and rule-based information.
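For context on how such metrics are computed, the sketch below shows a plain accuracy and macro-F1 calculation over NSW category labels; the labels and predictions are invented for illustration and are not taken from the paper's dataset or evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of NSW instances assigned the correct category."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 scores."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for label in labels:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical NSW category labels for four test instances.
y_true = ["DATE", "CARDINAL", "DATE", "TELEPHONE"]
y_pred = ["DATE", "CARDINAL", "CARDINAL", "TELEPHONE"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```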

Broader Implications and Future Prospects

This paper provides a significant step forward in the development of models capable of handling complex language tasks, particularly for logographic languages without explicit word boundaries, such as Chinese. The integration of rule-based heuristics with deep learning architectures potentially serves as a template for similar applications across other languages and tasks within NLP.

In terms of implications, this research could prompt further exploration into rule-guided neural network combinations, enhancing the interpretability and controllability of automated NLP systems. Future developments might build on this foundation by incorporating more complex rule systems and evaluating their interactions with different neural architectures to further optimize performance in diverse linguistic settings.

The open-source dataset accompanying this research provides a valuable resource for continuing advancements in TN for Chinese and potentially beyond, making such integrative models not only more efficient but also more broadly applicable. As the field of AI continues to evolve, merging traditional linguistic insights with modern computational power holds great promise for more sophisticated and accurate speech synthesis and other domain-specific tasks.
