A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks (2306.07303v1)

Published 11 Jun 2023 in cs.LG and cs.CL

Abstract: Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data. Unlike conventional neural networks or updated versions of Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM), transformer models excel in handling long dependencies between input sequence elements and enable parallel processing. As a result, transformer-based models have attracted substantial interest among researchers in the field of artificial intelligence. This can be attributed to their immense potential and remarkable achievements, not only in NLP tasks but also in a wide range of domains, including computer vision, audio and speech processing, healthcare, and the Internet of Things (IoT). Although several survey papers have been published highlighting the transformer's contributions in specific fields, architectural differences, or performance evaluations, there is still a significant absence of a comprehensive survey paper encompassing its major applications across various domains. Therefore, we undertook the task of filling this gap by conducting an extensive survey of proposed transformer models from 2017 to 2022. Our survey encompasses the identification of the top five application domains for transformer-based models, namely: NLP, Computer Vision, Multi-Modality, Audio and Speech Processing, and Signal Processing. We analyze the impact of highly influential transformer-based models in these domains and subsequently classify them based on their respective tasks using a proposed taxonomy. Our aim is to shed light on the existing potential and future possibilities of transformers for enthusiastic researchers, thus contributing to the broader understanding of this groundbreaking technology.

An Analysis of the Transformational Impact of Transformers Across Deep Learning Domains

The paper "A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks" offers an extensive examination of the pivotal role transformers have played in multiple deep learning contexts since their inception. Initially developed for NLP, transformers have leveraged their capacity for handling long-term dependencies and parallel processing to establish a significant presence in various fields, including computer vision, audio and speech processing, and beyond. This paper engages in a systematic exploration of transformers' contributions, categorizing them into five primary application domains: NLP, computer vision, multi-modality, audio and speech, and signal processing.

Natural Language Processing: Expanded Boundaries

The survey recognizes NLP as the initial frontier that transformers revolutionized, with models like BERT and GPT becoming staples for tasks ranging from language translation to sentiment analysis. In particular, it highlights how transformers have enabled significant advances in text generation and question answering. Models such as PEGASUS for abstractive summarization and T5 for multi-task, text-to-text learning underscore transformers' versatility in handling complex linguistic challenges.
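
As a usage sketch of how such pre-trained models are typically applied in practice (assuming the Hugging Face transformers library and the public google/pegasus-xsum and t5-small checkpoints; this is not code from the survey):

```python
from transformers import pipeline

# Abstractive summarization with PEGASUS
summarizer = pipeline("summarization", model="google/pegasus-xsum")
text = ("Transformers process all tokens of an input sequence in parallel "
        "through self-attention, avoiding the step-by-step recurrence of "
        "LSTMs and making long contexts far more tractable.")
print(summarizer(text, max_length=30, min_length=5)[0]["summary_text"])

# Multi-task text-to-text inference with T5: the task is stated as a text prefix
t5 = pipeline("text2text-generation", model="t5-small")
print(t5("translate English to German: Attention is all you need.")[0]["generated_text"])
```

The text-prefix interface illustrates T5's core idea: every NLP task, from translation to summarization, is cast as mapping one text string to another.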

Computer Vision: Redefining Image Analysis

In computer vision, transformers have provided a compelling alternative to convolutional neural networks (CNNs), proving adept at tasks such as image recognition and segmentation. The Vision Transformer (ViT) and its variants are credited with a paradigm shift: treating an image as a sequence of patch tokens so that classification can be handled much like an NLP task. The review also covers models that advance medical image understanding, exemplified by applications that segment and classify complex radiological images, underscoring the impact across both natural and medical image domains.
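
ViT's key step is to flatten an image into a sequence of fixed-size patches and embed each patch the way a language model embeds a word. A minimal PyTorch sketch of that patch-embedding stage (dimensions follow the standard ViT-Base defaults; an illustration, not the paper's code):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into 16x16 patches and project each to a token vector."""
    def __init__(self, img_size=224, patch_size=16, in_ch=3, d_model=768):
        super().__init__()
        # A strided convolution covers exactly one non-overlapping patch per step.
        self.proj = nn.Conv2d(in_ch, d_model, kernel_size=patch_size, stride=patch_size)
        n_patches = (img_size // patch_size) ** 2            # 14 * 14 = 196
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, d_model))

    def forward(self, x):                                    # x: (batch, 3, 224, 224)
        tokens = self.proj(x).flatten(2).transpose(1, 2)     # (batch, 196, 768)
        cls = self.cls_token.expand(x.shape[0], -1, -1)      # prepend a [CLS] token
        return torch.cat([cls, tokens], dim=1) + self.pos_embed

seq = PatchEmbedding()(torch.randn(2, 3, 224, 224))          # (2, 197, 768)
```

The resulting tensor is a token sequence like any other, so a standard transformer encoder can consume it unchanged; this is the sense in which ViT treats image classification much like an NLP task.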

Multi-Modality: Bridging Modal Barriers

The survey explores multi-modal tasks where transformers integrate text with other data types, such as images and video. Here, models like VisualBERT and CLIP leverage multi-head attention mechanisms to foster deeper cross-modal understanding, enabling sophisticated tasks like visual question answering and image captioning. This broadens the scope of AI applications in more integrative contexts, confirming transformers' potential as unifying architectures across modalities.
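
A brief sketch shows how CLIP-style cross-modal scoring is commonly used for zero-shot classification (assuming the Hugging Face transformers library and the openai/clip-vit-base-patch32 checkpoint; "example.jpg" is a placeholder image):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                   # any local image
captions = ["a photo of a cat", "a photo of a dog", "a diagram"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds scaled image-text similarities; softmax ranks the captions
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(captions, probs[0].tolist())))
```

Because the image and text encoders were trained contrastively to agree in a shared embedding space, no task-specific fine-tuning is needed to rank the captions against the image.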

Audio and Speech Processing: Enhancing Recognition and Clarity

The paper also discusses audio and speech tasks, where transformers have addressed challenges in speech recognition and speech separation. Conformer and wav2vec 2.0, noted for extracting features directly from audio inputs without the overhead of recurrent layers, have markedly improved speech processing frameworks. These models demonstrate transformers' fit with current demands for speed and accuracy in real-time audio processing.
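
A minimal usage sketch of wav2vec 2.0 for speech recognition (assuming the Hugging Face transformers library and the facebook/wav2vec2-base-960h checkpoint; "speech.wav" is a placeholder for a 16 kHz mono recording):

```python
from transformers import pipeline

# wav2vec 2.0 maps raw waveforms to text with no hand-crafted features
# and no recurrent layers, which keeps inference highly parallelizable.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
print(asr("speech.wav")["text"])
```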

Signal Processing: Pioneering New Solutions

Central to the paper's analysis is the nascent yet promising application of transformers to signal processing tasks, particularly in wireless network communication and cloud computing. Here, models are adapted to the structure of each signal type, demonstrating transformative potential for improving efficiency and precision in dynamic, data-rich environments.
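
To make this concrete, the following is a toy sketch of the pattern such work follows: a small transformer encoder classifying fixed-length 1-D signal segments, as in automatic modulation classification. All dimensions and the class count are hypothetical choices for illustration, not taken from the paper:

```python
import torch
import torch.nn as nn

class SignalTransformer(nn.Module):
    """Toy transformer classifier for I/Q radio signal segments."""
    def __init__(self, seq_len=128, in_ch=2, d_model=64, n_classes=11):
        super().__init__()
        self.embed = nn.Linear(in_ch, d_model)              # per-timestep projection
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                                   # x: (batch, 128, 2)
        h = self.encoder(self.embed(x) + self.pos)          # contextualize the sequence
        return self.head(h.mean(dim=1))                     # pool over time, classify

logits = SignalTransformer()(torch.randn(8, 128, 2))         # (8, 11) class scores
```

Self-attention here plays the role that hand-designed filters play in classical pipelines: every sample in the window can weigh evidence from every other sample.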

Future Growth and Directions

The survey not only captures the breadth of transformers' cross-domain influence but also sets the stage for future research. Prospective areas include cloud-native architectures, challenges unique to 5G/6G networks, and generative tasks, which remain ripe for exploration. At the same time, the computational demands of large transformer models, their data requirements, and the difficulty of interpreting them call for continued research and innovation to optimize and extend transformer applications further.

In conclusion, this comprehensive survey underscores transformers' integral role in advancing AI across multifaceted deep learning tasks. By delineating current accomplishments and future challenges, it invites ongoing exploration into these architectures' full potential. As transformers continue to evolve, refinements and novel applications promise to further embed these models at the core of AI development, setting the stage for continued advancements in increasingly complex and intertwined domains of artificial intelligence.

Authors (7)
  1. Saidul Islam
  2. Hanae Elmekki
  3. Ahmed Elsebai
  4. Jamal Bentahar
  5. Najat Drawel
  6. Gaith Rjoub
  7. Witold Pedrycz