A Survey on Multi-View Clustering (1712.06246v2)

Published 18 Dec 2017 in cs.LG and stat.ML

Abstract: With advances in information acquisition technologies, multi-view data become ubiquitous. Multi-view learning has thus become more and more popular in machine learning and data mining fields. Multi-view unsupervised or semi-supervised learning, such as co-training, co-regularization has gained considerable attention. Although recently, multi-view clustering (MVC) methods have been developed rapidly, there has not been a survey to summarize and analyze the current progress. Therefore, this paper reviews the common strategies for combining multiple views of data and based on this summary we propose a novel taxonomy of the MVC approaches. We further discuss the relationships between MVC and multi-view representation, ensemble clustering, multi-task clustering, multi-view supervised and semi-supervised learning. Several representative real-world applications are elaborated. To promote future development of MVC, we envision several open problems that may require further investigation and thorough examination.

Authors (3)

Guoqing Chao
Shiliang Sun
Jinbo Bi

Citations (196)

Summary

The paper presents a comprehensive taxonomy of multi-view clustering methods, categorizing them into generative and discriminative approaches.
It systematically reviews methodologies that enhance clustering performance across multiple data views in fields like computer vision and bioinformatics.
It identifies key future directions, including scalability, robust handling of missing data, and integration with deep learning techniques.

An Expert Analysis of "A Survey on Multi-View Clustering"

"A Survey on Multi-View Clustering" by Guoqing Chao, Shiliang Sun, and Jinbo Bi surveys the landscape of multi-view clustering (MVC), an area of growing interest in machine learning and data mining. It systematically reviews existing methods, proposes a novel taxonomy for categorizing them, and positions MVC relative to related paradigms such as multi-view representation, ensemble clustering, and multi-task clustering.

Overview of Multi-View Clustering

Multi-view clustering groups data points into coherent clusters using multiple sources of information, or "views," each describing the same samples with a different feature set. Advances in data acquisition technologies have made such data increasingly common, which calls for methods that use all available views effectively. Traditional single-view clustering techniques, such as k-means or hierarchical clustering, often fall short in this setting because they do not exploit the consensus information shared across views.
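To make the data layout concrete, the following is a minimal sketch of a multi-view dataset and of the naive baseline that simply concatenates the views and hands the result to a single-view method. It is not taken from the survey; the helper names `standardize` and `concatenate_views` and the "text" and "image" views are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def standardize(view):
    """Scale each feature to zero mean and unit variance (constant columns become zero)."""
    view = np.asarray(view, dtype=float)
    std = view.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (view - view.mean(axis=0)) / std

def concatenate_views(views):
    """Naive baseline: scale each view, then stack the features side by side."""
    n_samples = {np.asarray(v).shape[0] for v in views}
    if len(n_samples) != 1:
        raise ValueError("all views must describe the same samples, in the same order")
    return np.hstack([standardize(v) for v in views])

rng = np.random.default_rng(0)
text_view = rng.normal(size=(6, 4))   # hypothetical: 6 items, 4 text features
image_view = rng.normal(size=(6, 3))  # hypothetical: the same 6 items, 3 image features
X = concatenate_views([text_view, image_view])
print(X.shape)  # (6, 7)
```

Row i of every view refers to the same sample, which is the only structural requirement. Concatenation treats all features as one view, so it cannot weight views differently or model agreement between them; this is the gap that dedicated MVC methods aim to close.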

Taxonomy of Multi-View Clustering Approaches

The paper categorizes existing MVC approaches into two major types—generative and discriminative methods—based on their underlying assumptions and methodologies:

Generative Approaches: These methods assume a probabilistic model of how the data are generated, typically a mixture model fitted with the EM algorithm. They can handle missing data naturally, and under convex formulations they can offer globally optimal solutions; standard EM fitting, by contrast, generally converges only to a local optimum.
Discriminative Approaches: These are more prevalent and more diverse, and are further categorized by how they integrate information from multiple views:
- Common Eigenvector Matrix: Primarily related to multi-view spectral clustering, these methods enforce a shared or similar eigenspace across views.
- Common Coefficient Matrix: Often used in multi-view subspace clustering, these methods aim for a common subspace representation.
- Common Indicator Matrix: These methods involve non-negative matrix factorization and k-means extensions, ensuring unified cluster assignments across views.
- Direct Combination: Typically entails multi-kernel learning and combines views directly, often in a unified kernel space.
- Combination After Projection: Methods such as canonical correlation analysis (CCA) project each view's features into a shared space before combining them, which is especially useful when views differ significantly in data type.
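As an illustration of the direct-combination category, the sketch below builds one kernel per view, combines them by a weighted sum, and runs a basic spectral clustering on the result. It is a simplified stand-in, not the survey's or any specific paper's algorithm: multi-kernel learning methods would typically learn the weights, whereas uniform weights are assumed here. The RBF kernel, the `gamma` value, the deterministic k-means initialisation, and all function names are likewise illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) affinity matrix for one view."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def spectral_embedding(K, k):
    """Rows of the top-k eigenvectors of D^{-1/2} K D^{-1/2}, scaled to unit length."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    M = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues are returned in ascending order
    U = vecs[:, -k:]
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)

def kmeans(Z, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-first initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        dists = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def multiview_kernel_clustering(views, k, weights=None, gamma=0.5):
    """Direct combination: weighted sum of per-view kernels, then spectral clustering."""
    if weights is None:
        weights = np.full(len(views), 1.0 / len(views))  # assumption: uniform weights
    K = sum(w * rbf_kernel(np.asarray(X, dtype=float), gamma)
            for w, X in zip(weights, views))
    return kmeans(spectral_embedding(K, k), k)

# Synthetic two-view data: 60 samples, two well-separated groups in both views.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 30)
view_a = rng.normal(scale=0.5, size=(60, 2)) + 4.0 * truth[:, None]
view_b = rng.normal(scale=0.5, size=(60, 3)) + 3.0 * truth[:, None]
labels = multiview_kernel_clustering([view_a, view_b], k=2)
```

The only place the views interact is the weighted sum inside `multiview_kernel_clustering`, which is what makes this a direct combination. The other categories would change a different step: for example, a common-eigenvector method would compute an embedding per view and couple those embeddings instead of summing kernels.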

Implications and Future Directions

Multi-view clustering has practical applications across domains such as computer vision, natural language processing, and bioinformatics. The paper discusses these applications in detail, illustrating how MVC can improve clustering performance by exploiting multiple views of the same data.

The survey also identifies several open problems and research directions:

Scalability: Addressing the computational challenges in processing large-scale datasets efficiently.
Handling Incomplete Data: Developing robust MVC methods that can handle missing views or data points more effectively.
Deep Learning: Exploring deep multi-view clustering approaches that integrate representation learning and clustering into a cohesive framework.
Multiple Solutions: Investigating methodologies that can capture multiple valid clustering solutions concurrently, reflecting different grouping perspectives.

Conclusion

The paper by Chao et al. organizes a large body of research into a single framework and highlights significant gaps that set the stage for future work in multi-view clustering. With its clear categorization and its list of open problems, the survey is a valuable resource for researchers exploring MVC methodologies and their applications.