An Effective Incorporating Heterogeneous Knowledge Curriculum Learning for Sequence Labeling (2402.13534v2)
Abstract: Sequence labeling models often benefit from incorporating external knowledge. However, this practice introduces data heterogeneity and complicates the model with additional modules, which raises the cost of training a high-performing model. To address this challenge, we propose a two-stage curriculum learning (TCL) framework designed specifically for sequence labeling tasks. The TCL framework introduces training instances gradually, from easy to hard, with the aim of improving both performance and training speed. We also explore different metrics for assessing the difficulty of instances in sequence labeling tasks. Through extensive experiments on six Chinese word segmentation (CWS) and part-of-speech (POS) tagging datasets, we demonstrate that our approach improves the performance of sequence labeling models. Our analysis further indicates that TCL accelerates training and alleviates the slow-training problem associated with complex models.
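To make the easy-to-hard idea in the abstract concrete, here is a minimal sketch of a curriculum data pool. It is not the authors' TCL implementation: it assumes sentence length as a stand-in difficulty metric and a linear pacing schedule, and the names `linear_pacing` and `curriculum_pool` are hypothetical.

```python
from typing import Callable, List, Sequence


def linear_pacing(step: int, total_steps: int, initial_fraction: float = 0.2) -> float:
    """Fraction of the difficulty-sorted data that is available at `step`.

    Assumption: a linear schedule from `initial_fraction` up to 1.0.
    """
    if total_steps <= 0 or step >= total_steps:
        return 1.0
    progress = max(0, step) / total_steps
    return initial_fraction + (1.0 - initial_fraction) * progress


def curriculum_pool(
    examples: Sequence[str],
    difficulty: Callable[[str], float],
    step: int,
    total_steps: int,
    initial_fraction: float = 0.2,
) -> List[str]:
    """Return the easiest examples that training may sample from at `step`."""
    # Rank from easy to hard under the chosen (assumed) difficulty metric.
    ranked = sorted(examples, key=difficulty)
    fraction = linear_pacing(step, total_steps, initial_fraction)
    # Always expose at least one example so early batches are not empty.
    k = max(1, int(fraction * len(ranked)))
    return ranked[:k]


if __name__ == "__main__":
    # Toy unsegmented sentences; length is only a placeholder difficulty score.
    sentences = ["我爱北京", "好", "今天天气很好我们去公园", "你好吗", "他在学校学习中文"]
    for step in (0, 5, 10):
        pool = curriculum_pool(sentences, len, step, total_steps=10)
        print(step, len(pool), pool)
```

The pool grows as training proceeds, so early updates see only the easiest instances and later updates see the full dataset. Any other difficulty score, such as a model-based uncertainty estimate, could be passed in place of `len`.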