Supplementary Features of BiLSTM for Enhanced Sequence Labeling (2305.19928v4)

Published 31 May 2023 in cs.CL

Abstract: Sequence labeling tasks require the computation of sentence representations for each word within a given sentence. A prevalent method incorporates a Bi-directional Long Short-Term Memory (BiLSTM) layer to enhance the sequence structure information. However, empirical evidence (Li et al., 2020) suggests that the capacity of BiLSTM to produce sentence representations for sequence labeling tasks is inherently limited. This limitation primarily results from the integration of fragments of past and future sentence representations to formulate a complete sentence representation. In this study, we observed that the entire sentence representations, found in the first and last cells of BiLSTM, can supplement the individual sentence representation of each cell. Accordingly, we devised a global context mechanism that integrates the entire future and past sentence representations into each cell's sentence representation within the BiLSTM framework. Incorporating the BERT model within BiLSTM as a demonstration, we conducted exhaustive experiments on nine datasets for sequence labeling tasks, including named entity recognition (NER), part-of-speech (POS) tagging, and end-to-end aspect-based sentiment analysis (E2E-ABSA), and observed significant improvements in F1 scores and accuracy across all examined datasets.
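
The global context mechanism described in the abstract lends itself to a short illustration. The sketch below (PyTorch; the class name, the gated-fusion design, and all layer names are assumptions for illustration, not the authors' exact formulation) takes the forward state at the last position and the backward state at the first position as the "entire past" and "entire future" sentence representations, and mixes them into every cell's output through a learned gate.

```python
import torch
import torch.nn as nn

class GlobalContextBiLSTM(nn.Module):
    """Minimal sketch: supplement each BiLSTM cell's output with the
    full-sentence states found in the first and last cells. The gated
    fusion below is an assumed design, not the paper's exact layer."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        # Gate deciding, per cell, how much global context to mix in.
        self.gate = nn.Linear(4 * hidden_dim, 2 * hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.bilstm(x)            # (batch, seq_len, 2*hidden)
        hidden = h.size(-1) // 2
        # Entire past sentence: forward direction at the last position.
        # Entire future sentence: backward direction at the first position.
        g = torch.cat([h[:, -1, :hidden], h[:, 0, hidden:]], dim=-1)
        g = g.unsqueeze(1).expand_as(h)  # broadcast over time steps
        # Gated fusion of each cell's local view with the global view.
        gate = torch.sigmoid(self.gate(torch.cat([h, g], dim=-1)))
        return gate * h + (1 - gate) * g

# Usage with BERT-sized token embeddings: (batch, seq_len, 768) -> (batch, seq_len, 512).
emb = torch.randn(2, 7, 768)
out = GlobalContextBiLSTM(768, 256)(emb)
```

A sequence-labeling head (e.g., a linear classifier or CRF over the fused outputs) would sit on top of such a module, in the spirit of the BERT-within-BiLSTM setup the abstract describes.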

References (35)
  1. P.-H. Li, T.-J. Fu, and W.-Y. Ma, “Why attention? Analyze BiLSTM deficiency and its remedies in the case of NER,” vol. 34, 2020, pp. 8236–8244.
  2. M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
  3. A. Ghaddar and P. Langlais, “Robust lexical features for improved neural network named-entity recognition,” arXiv preprint arXiv:1806.03489, 2018.
  4. X. Ma and E. Hovy, “End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF,” arXiv preprint arXiv:1603.01354, 2016.
  5. B. Plank, A. Søgaard, and Y. Goldberg, “Multilingual part-of-speech tagging with bidirectional long short-term memory models and auxiliary loss,” arXiv preprint arXiv:1604.05529, 2016.
  6. L. Chen, W. Ruan, X. Liu, and J. Lu, “SeqVAT: Virtual adversarial training for semi-supervised sequence labeling,” 2020, pp. 8801–8811.
  7. L. Xu, Z. Jie, W. Lu, and L. Bing, “Better feature integration for named entity recognition,” arXiv preprint arXiv:2104.05316, 2021.
  8. X. Li, L. Bing, W. Zhang, and W. Lam, “Exploiting BERT for end-to-end aspect-based sentiment analysis,” arXiv preprint arXiv:1910.00883, 2019.
  9. H. Lin, S. Zhang, Q. Li, Y. Li, J. Li, and Y. Yang, “A new method for heart rate prediction based on LSTM-BiLSTM-Att,” Measurement, vol. 207, p. 112384, 2023.
  10. M. Méndez, M. G. Merayo, and M. Núñez, “Long-term traffic flow forecasting using a hybrid CNN-BiLSTM model,” Engineering Applications of Artificial Intelligence, vol. 121, p. 106041, 2023.
  11. R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to construct deep recurrent neural networks,” arXiv preprint arXiv:1312.6026, 2013.
  12. F. Meng and J. Zhang, “DTMT: A novel deep transition architecture for neural machine translation,” vol. 33, 2019, pp. 224–231.
  13. Y. Liu, F. Meng, J. Zhang, J. Xu, Y. Chen, and J. Zhou, “GCDT: A global context enhanced deep transition architecture for sequence labeling,” arXiv preprint arXiv:1906.02437, 2019.
  14. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
  15. Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized autoregressive pretraining for language understanding,” Advances in Neural Information Processing Systems, vol. 32, 2019.
  16. J. Sarzynska-Wawer, A. Wawer, A. Pawlak, J. Szymanowska, I. Stefaniak, M. Jarkiewicz, and L. Okruszek, “Detecting formal thought disorder by deep contextualized word representations,” Psychiatry Research, vol. 304, p. 114135, 2021.
  17. Z. Jie and W. Lu, “Dependency-guided LSTM-CRF for named entity recognition,” arXiv preprint arXiv:1909.10148, 2019.
  18. Y. Labrak and R. Dufour, “ANTILLES: An open French linguistically enriched part-of-speech corpus,” Springer, 2022, pp. 28–38.
  19. A. Akbik, T. Bergmann, D. Blythe, K. Rasul, S. Schweter, and R. Vollgraf, “FLAIR: An easy-to-use framework for state-of-the-art NLP,” 2019, pp. 54–59.
  20. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, pp. 1735–1780, 1997.
  21. H. Chen, Z. Lin, G. Ding, J. Lou, Y. Zhang, and B. Karlsson, “GRN: Gated relation network to enhance convolutional neural network for named entity recognition,” vol. 33, 2019, pp. 6236–6243.
  22. D. Zeng, Y. Dai, F. Li, J. Wang, and A. K. Sangaiah, “Aspect based sentiment analysis by a linguistically regularized CNN with gated mechanism,” Journal of Intelligent & Fuzzy Systems, vol. 36, pp. 3971–3980, 2019.
  23. J. Yuan, H.-C. Xiong, Y. Xiao, W. Guan, M. Wang, R. Hong, and Z.-Y. Li, “Gated CNN: Integrating multi-scale feature layers for object detection,” Pattern Recognition, vol. 105, p. 107131, 2020.
  24. X. Zeng, W. Ouyang, B. Yang, J. Yan, and X. Wang, “Gated bi-directional CNN for object detection,” Springer, 2016, pp. 354–369.
  25. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
  26. J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” 2014, pp. 1532–1543.
  27. T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
  28. M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, and S. Manandhar, “SemEval-2014 task 4: Aspect based sentiment analysis,” in Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). Dublin, Ireland: Association for Computational Linguistics, Aug. 2014, pp. 27–35. [Online]. Available: https://aclanthology.org/S14-2004
  29. M. Pontiki, D. Galanis, H. Papageorgiou, S. Manandhar, and I. Androutsopoulos, “SemEval-2015 task 12: Aspect based sentiment analysis,” 2015, pp. 486–495.
  30. M. Pontiki, D. Galanis, H. Papageorgiou, I. Androutsopoulos, S. Manandhar, M. AL-Smadi, M. Al-Ayyoub, Y. Zhao, B. Qin, and O. D. Clercq, “SemEval-2016 task 5: Aspect based sentiment analysis,” Association for Computational Linguistics, 2016, pp. 19–30.
  31. X. Li, L. Bing, P. Li, and W. Lam, “A unified model for opinion target extraction and target sentiment prediction,” vol. 33, 2019, pp. 6714–6721.
  32. E. F. Sang and F. D. Meulder, “Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition,” arXiv preprint cs/0306050, 2003.
  33. L. Derczynski, E. Nichols, M. V. Erp, and N. Limsopatham, “Results of the WNUT2017 shared task on novel and emerging entity recognition,” 2017, pp. 140–147.
  34. N. Peng and M. Dredze, “Named entity recognition for Chinese social media with jointly trained embeddings,” 2015, pp. 548–554.
  35. N. Silveira, T. Dozat, M.-C. D. Marneffe, S. R. Bowman, M. Connor, J. Bauer, and C. D. Manning, “A gold standard dependency corpus for English,” Citeseer, 2014, pp. 2897–2904.
Authors (3)
  1. Conglei Xu (1 paper)
  2. Kun Shen (1 paper)
  3. Hongguang Sun (21 papers)
Citations (3)
