Survey: Transformer-based Models in Data Modality Conversion (2408.04723v1)

Published 8 Aug 2024 in eess.IV, cs.AI, cs.CL, and eess.SP

Abstract: Transformers have made significant strides across various artificial intelligence domains, including natural language processing, computer vision, and audio processing. This success has naturally garnered considerable interest from both academic and industry researchers. Consequently, numerous Transformer variants (often referred to as X-formers) have been developed for these fields. However, a thorough and systematic review of these modality-specific conversions remains lacking. Modality Conversion involves the transformation of data from one form of representation to another, mimicking the way humans integrate and interpret sensory information. This paper provides a comprehensive review of transformer-based models applied to the primary modalities of text, vision, and speech, discussing their architectures, conversion methodologies, and applications. By synthesizing the literature on modality conversion, this survey aims to underline the versatility and scalability of transformers in advancing AI-driven content generation and understanding.

Authors (4)
  1. Elyas Rashno
  2. Amir Eskandari
  3. Aman Anand
  4. Farhana Zulkernine