Doubly Aligned Incomplete Multi-view Clustering (1903.02785v1)

Published 7 Mar 2019 in cs.LG and stat.ML

Abstract: Nowadays, multi-view clustering has attracted more and more attention. To date, almost all the previous studies assume that views are complete. However, in reality, it is often the case that each view may contain some missing instances. Such incompleteness makes it impossible to directly use traditional multi-view clustering methods. In this paper, we propose a Doubly Aligned Incomplete Multi-view Clustering algorithm (DAIMC) based on weighted semi-nonnegative matrix factorization (semi-NMF). Specifically, on the one hand, DAIMC utilizes the given instance alignment information to learn a common latent feature matrix for all the views. On the other hand, DAIMC establishes a consensus basis matrix with the help of $L_{2,1}$-Norm regularized regression for reducing the influence of missing instances. Consequently, compared with existing methods, besides inheriting the strength of semi-NMF with ability to handle negative entries, DAIMC has two unique advantages: 1) solving the incomplete view problem by introducing a respective weight matrix for each view, making it able to easily adapt to the case with more than two views; 2) reducing the influence of view incompleteness on clustering by enforcing the basis matrices of individual views being aligned with the help of regression. Experiments on four real-world datasets demonstrate its advantages.

Summary

The paper proposes a doubly aligned framework, built on weighted semi-NMF, for clustering multi-view data in which each view may be missing some instances.
It uses $L_{2,1}$-norm regularized regression to align the basis matrices of the individual views, so that the shared latent representation is less affected by missing instances.
Experiments on four real-world datasets report higher clustering accuracy and normalized mutual information than state-of-the-art methods.

An In-Depth Analysis of Doubly Aligned Incomplete Multi-view Clustering

The paper "Doubly Aligned Incomplete Multi-view Clustering" (DAIMC) addresses the incomplete multi-view clustering problem, in which each view may lack some of the instances. Such incompleteness is common in real-world applications, for example when the reading from one sensor or the translation into one language is unavailable for part of the data.

Key Contributions

DAIMC tackles this issue by combining weighted semi-nonnegative matrix factorization (semi-NMF) with $L_{2,1}$-norm regularized regression, which yields a doubly aligned structure: instances are aligned across views, and so are the basis matrices of the views. Most existing methods assume complete views, which limits their applicability. The contributions of this work can be summarized as follows:

Weighted Semi-NMF for Incompleteness: DAIMC handles incomplete multi-view data through a weighted semi-NMF framework. Each view is assigned its own weight matrix that distinguishes missing instances from present ones. Because the weighting is per view, the formulation adapts easily to more than two views and does not depend on filling in missing data by averaging or simple imputation.
Alignment via $L_{2,1}$-Norm Regularized Regression: Beyond instance alignment, the basis matrices of the different views are aligned using a regression model with $L_{2,1}$-norm regularization. This reduces the influence of missing instances and ties the per-view bases to a consensus representation.
Iterative Optimization with Convergence Assurance: DAIMC is solved by an iterative procedure that alternates updates of the basis matrices, the common latent feature matrix, and the regression coefficients, and that converges to a locally optimal solution.
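
The three ingredients above can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the per-view weight is taken to be a 0/1 indicator over instances, the regression term is assumed to push `B.T @ U` toward the identity, and the names `l21_norm`, `masked_objective`, `alpha`, and `beta` are hypothetical. The $L_{2,1}$ norm itself is standard: the sum of the Euclidean norms of a matrix's rows.

```python
import numpy as np


def l21_norm(B):
    """L_{2,1} norm: sum of the Euclidean norms of the rows of B."""
    return float(np.sum(np.linalg.norm(B, axis=1)))


def masked_objective(Xs, masks, Us, V, Bs, alpha, beta):
    """Weighted reconstruction error plus an assumed basis-alignment term.

    Xs[v]    : (d_v, n) data of view v; columns of missing instances are ignored
    masks[v] : (n,) 0/1 vector, 1 where the instance is present in view v
    Us[v]    : (d_v, k) basis matrix; entries may be negative (semi-NMF)
    V        : (n, k) nonnegative latent features shared by all views
    Bs[v]    : (d_v, k) regression coefficients for view v (assumed form)
    """
    k = V.shape[1]
    total = 0.0
    for X, w, U, B in zip(Xs, masks, Us, Bs):
        # Broadcasting multiplies column i by w[i], so missing instances
        # contribute nothing to the reconstruction error.
        residual = (X - U @ V.T) * w
        total += np.sum(residual ** 2)
        # Assumed alignment term: regress each basis onto a common target.
        total += alpha * (np.sum((B.T @ U - np.eye(k)) ** 2) + beta * l21_norm(B))
    return float(total)
```

An alternating scheme would minimize such an objective over one block of variables (`Us`, `V`, or `Bs`) at a time while keeping the others fixed. Row-wise sparsity from the $L_{2,1}$ penalty means whole feature rows of `B` can be driven to zero.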

Experimental Insights

DAIMC is evaluated on four real-world datasets covering both image and text domains, and is compared against state-of-the-art methods such as PVC, IMG, and MIC. Across varying degrees of view incompleteness, DAIMC consistently achieved higher clustering accuracy and normalized mutual information.

In particular, DAIMC showed clear gains on the Digit dataset even at a high missing rate. This is attributed to the model's ability to handle several incomplete views simultaneously, a setting in which many traditional methods struggle.
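
For readers unfamiliar with the two reported metrics, the following standard-library sketch shows how clustering accuracy and normalized mutual information are commonly computed. It makes no claim about the evaluation code used in the paper: the function names are hypothetical, the label matching is done by brute force over permutations (practical only for a handful of clusters; the Hungarian algorithm is the usual choice), and NMI is normalized by the geometric mean of the two entropies, which is one of several conventions.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt


def clustering_accuracy(true_labels, pred_labels):
    """Best fraction of matches over one-to-one maps of predicted to true labels."""
    pairs = Counter(zip(pred_labels, true_labels))
    pred_ids = sorted(set(pred_labels))
    true_ids = sorted(set(true_labels))
    # Pad with placeholders so every predicted cluster can be assigned a slot.
    slots = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(slots, len(pred_ids)):
        matched = sum(pairs[(p, t)] for p, t in zip(pred_ids, perm))
        best = max(best, matched)
    return best / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(a, b):
    """Mutual information divided by sqrt(H(a) * H(b)); 0.0 if either entropy is 0."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(
        c / n * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    denom = sqrt(entropy(a) * entropy(b))
    return mi / denom if denom > 0 else 0.0
```

Both metrics are invariant to how cluster labels are named, which is why a predicted labeling that merely permutes the true labels scores perfectly.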

Implications and Future Work

On the theoretical side, the work suggests that structural alignment, of instances and of basis matrices, can be used to refine existing multi-view clustering models. On the practical side, it gives practitioners a method that clusters imperfect real-world data across domains without requiring every view to be complete.

Suggested directions for future research include scaling the method to large datasets, potentially through online or incremental learning strategies. Such extensions would make DAIMC applicable to larger and streaming data settings.

In conclusion, DAIMC addresses a persistent problem in multi-view learning by combining per-view weighting of missing instances with regression-based alignment of the basis matrices, and it supports this design with empirical gains over existing incomplete multi-view clustering methods.
