A Survey of Multi-View Representation Learning
- The paper presents an extensive comparison of alignment methods (e.g., CCA, Deep CCA) and fusion techniques (e.g., multi-view CNNs, multi-modal RNNs) for integrating multi-view data.
- The paper demonstrates practical applications in cross-media retrieval and NLP, showing how multi-view models improve semantic consistency and accuracy.
- The paper outlines future research directions aimed at improving scalability and robustness, and at combining methodologies for richer data representations.
The paper "A Survey of Multi-View Representation Learning" provides an extensive overview of the methodologies and theories underlying multi-view representation learning (MVRL). This area has garnered significant attention due to the increasing availability of multi-modal data across varied applications. The survey categorizes the approaches into two main types: multi-view representation alignment and multi-view representation fusion.
Multi-View Representation Alignment
This category focuses on aligning feature representations learned from different views so that their relationships can be captured, typically by mapping corresponding samples close together in a shared space. Alignment methods are further divided into:
- Correlation-based Alignment: Canonical Correlation Analysis (CCA) and its variants are explored due to their capacity to maximize correlation between different data views. Kernel CCA enhances this by incorporating nonlinear data relationships, while Deep CCA leverages deep neural networks to learn complex correlations effectively.
- Distance and Similarity-based Alignment: Techniques such as Partial Least Squares (PLS) and cross-modal ranking align features based on distance or similarity measures. These methods are effective at preserving intra-view and inter-view consistency, which is critical in tasks such as cross-modal retrieval.
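To make the correlation-based family concrete, here is a minimal numpy sketch of classical linear CCA, computed via an SVD of the whitened cross-covariance matrix. This is a textbook formulation, not an implementation from the survey; the small ridge term `reg` is an assumption added for numerical stability.

```python
import numpy as np

def cca(X, Y, k=2, reg=1e-6):
    """Classical CCA: find projections of two views that maximize correlation.

    X: (n, dx) and Y: (n, dy) hold n paired observations of the two views.
    Returns projection matrices Wx, Wy and the top-k canonical correlations.
    `reg` is a small ridge term for numerical stability (an assumption,
    not part of the textbook derivation).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]      # projections for view 1
    Wy = Ky @ Vt.T[:, :k]   # projections for view 2
    return Wx, Wy, s[:k]
```

Kernel CCA follows the same objective but replaces the raw features with kernel evaluations, and Deep CCA replaces the linear projections `Wx`, `Wy` with neural network encoders trained to maximize the same correlation criterion.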
Multi-View Representation Fusion
Fusion approaches aim to integrate multiple data views into a single coherent representation. This category includes:
- Graphical Model-based Fusion: Techniques such as multi-modal Latent Dirichlet Allocation (LDA) and multi-view sparse coding provide probabilistic frameworks to model shared latent spaces across views.
- Neural Network-based Fusion: These methods capitalize on the strengths of neural networks. Multi-view convolutional neural networks (CNNs) and multi-modal recurrent neural networks (RNNs) illustrate how deep networks can effectively combine representations from varied views, proving useful in applications like image captioning and person re-identification.
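A minimal sketch of the neural fusion idea: each view passes through its own encoder, and the encodings are concatenated and projected into a joint space. This is an illustrative forward pass only; the encoder sizes, random weights, and the 128-d image / 64-d text feature dimensions are all hypothetical stand-ins for parameters a real model would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(d_in, d_out):
    """One-layer tanh encoder with random weights (a stand-in for a
    trained per-view network)."""
    W = rng.standard_normal((d_in, d_out)) * 0.1
    b = np.zeros(d_out)
    return lambda x: np.tanh(x @ W + b)

def fuse(views, encoders, W_fuse):
    """Encode each view, concatenate the encodings, and project the
    result into a joint representation space."""
    h = np.concatenate([enc(v) for v, enc in zip(views, encoders)], axis=1)
    return h @ W_fuse

# Two hypothetical views: 128-d image features and 64-d text features.
img_enc = make_encoder(128, 32)
txt_enc = make_encoder(64, 32)
W_fuse = rng.standard_normal((64, 16)) * 0.1  # 32 + 32 -> 16

img = rng.standard_normal((8, 128))
txt = rng.standard_normal((8, 64))
joint = fuse([img, txt], [img_enc, txt_enc], W_fuse)  # shape (8, 16)
```

In practice the encoders would be convolutional (for images) or recurrent (for text), and all weights would be trained end-to-end on a downstream objective such as captioning or re-identification.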
Practical Applications and Implications
MVRL has significant applicability in cross-media retrieval, natural language processing, video analysis, and recommender systems. For instance, in cross-media retrieval, deep multi-view representation learning models improve retrieval accuracy by jointly modeling text and image data. In NLP, these methods have been used to enhance semantic understanding and to improve translation systems by fusing linguistic and visual information.
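Once both modalities are embedded in a shared space (for example by CCA or a deep multi-view model), cross-media retrieval reduces to nearest-neighbor search. The sketch below ranks a toy gallery of "image" embeddings against a "text" query by cosine similarity; the 2-d vectors are purely illustrative.

```python
import numpy as np

def retrieve(query_vec, gallery, top_k=3):
    """Rank gallery rows by cosine similarity to the query in the
    shared embedding space; return indices and scores, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    G = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = G @ q
    order = np.argsort(-sims)[:top_k]
    return order, sims[order]

# Toy shared space: image 0 points nearly the same way as the text query.
images = np.array([[1.0, 0.1],
                   [0.0, 1.0],
                   [-1.0, 0.2]])
text_query = np.array([1.0, 0.0])
idx, scores = retrieve(text_query, images, top_k=2)
```

The quality of such retrieval depends entirely on how well the alignment or fusion model places semantically matching text and images near each other in the shared space.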
Future Directions
The survey suggests that future research could focus on improving the scalability of these methods, making learned representations more robust to noise, and better modeling the interdependencies between different data views. There is also potential to explore hybrid architectures that combine different methodologies for more effective representation learning.
Conclusion
This paper provides a thorough examination of multi-view representation learning, detailing both the foundational theories and cutting-edge applications. It serves as a critical resource for researchers aiming to apply these methods in real-world scenarios, offering insights into both conventional and novel approaches to exploiting multiple data views. The survey underscores the transformative potential of MVRL in advancing machine learning applications by leveraging enriched data representations.