A Review of Hybrid and Ensemble in Deep Learning for Natural Language Processing (2312.05589v2)

Published 9 Dec 2023 in cs.AI

Abstract: This review presents a comprehensive exploration of hybrid and ensemble deep learning models within NLP, shedding light on their transformative potential across diverse tasks such as Sentiment Analysis, Named Entity Recognition, Machine Translation, Question Answering, Text Classification, Text Generation, Speech Recognition, Summarization, and Language Modeling. The paper systematically introduces each task, delineates key architectures from Recurrent Neural Networks (RNNs) to Transformer-based models like BERT, and evaluates their performance, challenges, and computational demands. The adaptability of ensemble techniques is emphasized, highlighting their capacity to enhance various NLP applications. Challenges in implementation, including computational overhead, overfitting, and model interpretation complexities, are addressed alongside the trade-off between interpretability and performance. Serving as a concise guide, this review synthesizes insights into tasks, architectures, and challenges, offering a holistic perspective for researchers and practitioners aiming to advance language-driven applications through ensemble deep learning in NLP.
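As a concrete illustration of the ensemble idea the review surveys, below is a minimal soft-voting sketch in Python. It is not code from the paper: the soft_vote helper, the three stand-in sentiment classifiers (an LSTM, a CNN, and a BERT model), and all probability values are hypothetical, chosen only to show how per-model class probabilities can be averaged into a single ensemble prediction.

```python
import numpy as np

def soft_vote(prob_matrices, weights=None):
    """Soft-voting ensemble over class-probability outputs.

    prob_matrices: list of (n_samples, n_classes) arrays, one per model.
    weights: optional per-model weights; uniform averaging if None.
    Returns the ensemble's predicted class index for each sample.
    """
    # Stack per-model probabilities: (n_models, n_samples, n_classes).
    stacked = np.stack(prob_matrices)
    # (Optionally weighted) average across the model axis.
    avg = np.average(stacked, axis=0, weights=weights)
    # Ensemble prediction = most probable class per sample.
    return avg.argmax(axis=1)

# Hypothetical class probabilities (columns: negative, positive) from three
# stand-in sentiment models scoring the same two sentences.
lstm_probs = np.array([[0.70, 0.30], [0.40, 0.60]])
cnn_probs  = np.array([[0.55, 0.45], [0.35, 0.65]])
bert_probs = np.array([[0.20, 0.80], [0.10, 0.90]])

print(soft_vote([lstm_probs, cnn_probs, bert_probs]))                     # uniform vote
print(soft_vote([lstm_probs, cnn_probs, bert_probs], weights=[1, 1, 2]))  # weight BERT higher
```

Soft voting is only the simplest combiner; the review also covers bagging, boosting, and stacking, which differ in how base models are trained and how their outputs are merged.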

Authors (3)
  1. Jianguo Jia (5 papers)
  2. Wen Liang (13 papers)
  3. Youzhi Liang (12 papers)
Citations (13)