
Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion (2006.08159v1)

Published 15 Jun 2020 in cs.CV

Abstract: With the development of web technology, multi-modal or multi-view data has surged as a major stream of big data, where each modality/view encodes an individual property of data objects. Often, different modalities are complementary to each other, a fact that has motivated considerable research attention on fusing multi-modal feature spaces to comprehensively characterize data objects. Most existing state-of-the-art methods focus on how to fuse the energy or information from multi-modal spaces to deliver superior performance over their single-modal counterparts. Recently, deep neural networks have proven to be a powerful architecture for capturing the nonlinear distribution of high-dimensional multimedia data, and naturally so for multi-modal data. Substantial empirical studies demonstrate the advantages of deep multi-modal methods, which can essentially deepen the fusion of multi-modal deep feature spaces. In this paper, we provide a substantial overview of the existing state of the art in the field of multi-modal data analytics, from shallow to deep spaces. Throughout this survey, we further indicate that the critical components of this field are collaboration, adversarial competition, and fusion over multi-modal spaces. Finally, we share our viewpoints regarding some future directions for this field.

Deep Multi-modal Data Analytics: Insights and Implications

The paper "Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion" by Yang Wang offers a comprehensive survey on multi-modal data analytics, focusing on the intersection of multi-modal data and deep learning methodologies. This survey provides an in-depth analysis of the current state of research in the fusion of different modalities to enhance data characterization, presenting both theoretical frameworks and practical applications.

Overview of Deep Multi-modal Techniques

Multi-modal data, characterized by different types of data representations or modalities, is central to high-dimensional data analytics. The fusion of these modalities aims to leverage complementary information, addressing challenges that arise with single-modal approaches. This paper underscores the significance of deep neural networks in capturing non-linear distributions inherent in multi-modal datasets. Particularly, it emphasizes collaboration and adversarial competition as key components for enhancing data fusion processes.
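The feature-level fusion idea described above can be sketched in a few lines: each modality is passed through its own encoder into a shared-dimensional space, and the resulting representations are concatenated into one joint feature. This is a minimal illustrative sketch, not any specific model from the survey; the feature sizes, random linear "encoders", and modality names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features for one object in two modalities
# (e.g. an image embedding and a text embedding of different sizes).
x_image = rng.standard_normal(128)
x_text = rng.standard_normal(64)

def encode(x, out_dim, rng):
    """Map a modality-specific feature into a shared space with a
    randomly initialised linear layer plus a tanh nonlinearity,
    standing in for a trained deep encoder."""
    W = rng.standard_normal((out_dim, x.shape[0])) / np.sqrt(x.shape[0])
    return np.tanh(W @ x)

# Encode each modality separately, then fuse by concatenation.
z_image = encode(x_image, 32, rng)
z_text = encode(x_text, 32, rng)
z_fused = np.concatenate([z_image, z_text])  # joint 64-dim representation

print(z_fused.shape)  # (64,)
```

In practice the encoders are trained jointly so that the fused space captures the complementary information of both modalities rather than merely stacking them.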

Deep Learning Approaches

The paper details the transition from shallow to deep learning architectures in multi-modal data analytics. Core focus areas include deep multi-modal methods for clustering and classification. The paper highlights deep architectures that exploit complex data relationships and enhance feature representations, such as Latent Multi-view Subspace Clustering (LMSC) and the Multi-view Spectral Clustering Network (MvSCN). For classification, methods such as Multi-view Metric Learning (MvML) and the Multi-view Deep Network (MvDN) are discussed, showcasing their improved performance in applications such as object and face recognition.
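A complementary strategy to the feature-level fusion above is decision-level fusion, where each view has its own subnetwork producing class scores and the per-view posteriors are combined. The sketch below is a generic illustration of that idea, not the actual MvDN architecture; the two sets of logits are made-up numbers.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical class logits produced by two view-specific subnetworks
# for the same object over 3 classes (values are illustrative only).
logits_view1 = np.array([2.0, 0.5, -1.0])   # view 1 favours class 0
logits_view2 = np.array([1.5, 1.8, -0.5])   # view 2 slightly favours class 1

# Decision-level fusion: average the per-view class posteriors.
p = (softmax(logits_view1) + softmax(logits_view2)) / 2
prediction = int(np.argmax(p))
print(prediction)  # 0: view 1's confident vote dominates the average
```

Averaging posteriors lets a confident view outweigh an ambivalent one, which is one reason multi-view fusion can beat each single view alone.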

Applications and Impact

Multi-view learning methodologies are applied extensively in multimedia analytics, including image retrieval and representation. Significant emphasis is placed on deep networks' capabilities to generate discriminative features for high-performance outcomes. Models like Deep Multi-modal Hashing and multi-view generative networks reveal profound implications for real-world applications in 3D object recognition and retrieval.
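Deep multi-modal hashing, mentioned above, compresses a fused real-valued embedding into a short binary code so that retrieval reduces to cheap Hamming-distance comparisons. The sketch below shows only that final binarisation-and-lookup step under made-up data; real systems learn the embeddings so that semantically similar objects receive nearby codes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fused multi-modal embeddings: a small database of
# 5 objects plus one query that is a slightly perturbed copy of item 2.
database = rng.standard_normal((5, 16))
query = database[2] + 0.05 * rng.standard_normal(16)

def to_hash(x):
    """Binarise a real-valued embedding into a +/-1 code, the final
    step of most deep hashing pipelines."""
    return np.sign(x).astype(np.int8)

db_codes = to_hash(database)
q_code = to_hash(query)

# Hamming distance = number of differing bits; retrieve the closest code.
hamming = (db_codes != q_code).sum(axis=1)
print(int(hamming.argmin()))  # index of the retrieved nearest neighbour
```

Because the query is a near-duplicate of item 2, its hash code differs in at most a few bits, while unrelated items differ in roughly half the bits on average.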

The discussion extends to Generative Adversarial Networks (GANs), showcasing their application in unsupervised and semi-supervised learning paradigms. GANs are highlighted for their pivotal role in generating and enhancing realistic data samples, further enriching multi-modal data collaborations through adversarial processes.
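The adversarial process behind GANs can be made concrete by writing down the two competing objectives: the discriminator tries to score real samples as 1 and generated samples as 0, while the generator tries to make its samples score as 1. The sketch below computes both losses for hypothetical discriminator outputs (the score values are made up); it shows the objectives only, not a full training loop.

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy between predicted probabilities p and label y."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

# Hypothetical discriminator outputs: probability that each sample is real.
d_real = np.array([0.9, 0.8, 0.95])   # scores on real data
d_fake = np.array([0.1, 0.3, 0.2])    # scores on generated data

# Discriminator objective: push real scores toward 1 and fake scores toward 0.
d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)

# Generator objective (non-saturating form): make fakes score as real.
g_loss = bce(d_fake, 1.0)

print(d_loss < g_loss)  # True: this discriminator is currently winning
```

Training alternates gradient steps on these two losses; at equilibrium the generated samples become indistinguishable from real ones, which is what makes GANs useful for enriching multi-modal data.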

Progress and Challenges

While significant advancements have been made, the paper identifies ongoing challenges, particularly in the integration and reinforcement of multi-modal learning frameworks. A proposed future direction is enhancing the collaboration between modalities to address specific challenges intrinsic to large-scale and complex datasets, emphasizing spatial-temporal dynamics.

Speculative Future Developments

The exploration of spatial-temporal multi-modal collaboration represents a forward-looking dimension in this field. Potential developments include crafting robust network architectures that dynamically adjust to changing dataset characteristics, aiming for optimal collaboration timing between modalities. Such research is essential for advancing the utility and efficacy of multi-modal applications in varied domains, from autonomous systems to multimedia information retrieval.

This survey serves as a valuable resource for researchers seeking to explore the confluence of multi-modal data analytics and deep learning, presenting a thorough landscape of existing methodologies, practical applications, and potential research avenues. Its comprehensive coverage offers foundational insights that could shape future exploration in the ongoing evolution of multi-modal data analytics.

Authors: Yang Wang (672 papers)
Citations: 177