
JUICE: Unsupervised Multiview Selection

Updated 20 December 2025
  • The paper introduces a unified framework that jointly performs unsupervised feature selection, instance co-selection, and cross-view imputation to enhance data representativeness.
  • It employs a block-coordinate descent optimization with convex subproblems and adaptive view weighting to efficiently solve for missing data recovery and selection.
  • Empirical evaluations on eight benchmarks demonstrate significant ACC and F1 improvements, confirming the robustness and effectiveness of the proposed approach.

Joint Learning of Unsupervised MultI-view Feature and Instance Co-selection with Cross-view Imputation (JUICE) is a framework that unifies unsupervised feature selection, instance co-selection, and data imputation for incomplete multi-view datasets. It addresses the limitations of treating these tasks as independent stages by capturing both intra- and cross-view relationships and by exploiting synergistic interactions between selection and imputation, thereby improving the representativeness of the selected features and instances.

1. Problem Formulation and Objective

Consider a collection of $V$ data "views" (modalities), each with a potentially different set of features and incomplete observations: $\mathcal{X} = \{\mathbf{X}^{(v)} \in \mathbb{R}^{d_v \times n}\}_{v=1}^V$. For each view $v$, only $n_v \leq n$ samples are observed. Missingness is tracked by the indicator $\mathbf{M} \in \{0,1\}^{n \times V}$, where $M_{iv} = 1$ if instance $i$ is present in view $v$. The binary selection matrix $\mathbf{G}^{(v)} \in \{0,1\}^{n \times n_v}$ specifies the available data in view $v$ as $\mathbf{O}^{(v)} = \mathbf{X}^{(v)} \mathbf{G}^{(v)}$.
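
As a minimal sketch of this setup in NumPy (the synthetic sizes and the names `X_views`, `M`, `observed_block` are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic incomplete multi-view data: V views, n instances (sizes are arbitrary).
n, V = 100, 3
d = [20, 35, 50]                                   # feature dimension d_v per view
X_views = [rng.normal(size=(dv, n)) for dv in d]   # X^(v) is d_v x n

# Missingness indicator M (n x V): M[i, v] = 1 if instance i is observed in view v.
M = (rng.random((n, V)) > 0.3).astype(int)

def observed_block(X, m):
    """Build the 0/1 selection matrix G (n x n_v) and the observed block O = X @ G (d_v x n_v)."""
    idx = np.flatnonzero(m)                        # instances present in this view
    G = np.zeros((X.shape[1], idx.size))
    G[idx, np.arange(idx.size)] = 1.0
    return G, X @ G

G_views, O_views = zip(*(observed_block(X_views[v], M[:, v]) for v in range(V)))
```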

The optimization variables are:

  • Feature selection: $\mathbf{W}^{(v)} \in \mathbb{R}^{d_v \times c}$ (orthonormal columns, $c \ll d_v$) with $\ell_{2,1}$ row-sparsity to identify informative features.
  • Instance selection: $\mathbf{Q}^{(v)} \in \mathbb{R}^{n \times n_v}$ (nonnegative, column sum = 1), representing the importance of each observed instance, with row-sparsity induced by a Frobenius penalty.
  • Reconstructed data: $\mathbf{H}^{(v)} \in \mathbb{R}^{d_v \times n}$, a completion over all $n$ instances.
  • Adaptive imputed matrix: $\bar{\mathbf{X}}^{(v)}$, refined by cross-view neighborhood graphs.
  • View weights: $\boldsymbol{\gamma} = (\gamma^{(1)}, \dots, \gamma^{(V)})^\top$.

The JUICE objective function is
$$\min_{\{\mathbf{W}^{(v)},\,\mathbf{Q}^{(v)},\,\mathbf{H}^{(v)},\,\bar{\mathbf X}^{(v)},\,\gamma^{(v)}\}_{v=1}^V} \; \sum_{v=1}^V \frac{1}{\gamma^{(v)}} \Bigl[ \|\mathbf{W}^{(v)T}(\mathbf{H}^{(v)} - \mathbf{O}^{(v)} \mathbf{Q}^{(v)T})\|_F^2 + \lambda \|\mathbf{W}^{(v)}\|_{2,1} + \alpha \|\mathbf{Q}^{(v)}\|_F^2 + \beta \|\mathbf{H}^{(v)}\mathbf{G}^{(v)} - \mathbf{O}^{(v)}\|_F^2 + \sum_{i,j} Q_{ij}^{(v)} \|\bar{\mathbf X}^{(v)}_{\cdot i} - \mathbf{O}^{(v)}_{\cdot j}\|_2^2 \Bigr]$$
subject to
$$\mathbf{W}^{(v)T}\mathbf{W}^{(v)} = \mathbf{I},\quad \mathbf{Q}^{(v)} \mathbf{1} = \mathbf{1},\quad \mathbf{Q}^{(v)} \geq 0,\quad \boldsymbol{\gamma}^\top\mathbf{1} = 1,\quad \gamma^{(v)} \geq 0,$$
together with index constraints ensuring the correct mapping of missing and observed entries.

This objective integrates projective reconstruction, feature selection, instance selection, imputation fidelity (alignment with observed entries), and cross-view regularization, all in a single optimization.
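
As a concrete reading of this objective, the sketch below evaluates the per-view bracketed term with NumPy for matrices shaped as above; the names (`juice_view_loss`, `l21_norm`) are illustrative, and the constraints are assumed to already hold rather than being enforced:

```python
import numpy as np

def l21_norm(W):
    """l_{2,1} norm: sum of the Euclidean norms of the rows of W."""
    return np.linalg.norm(W, axis=1).sum()

def juice_view_loss(W, Q, H, X_bar, O, G, lam, alpha, beta):
    """Bracketed per-view term of the JUICE objective (before the 1/gamma^(v) weight)."""
    recon = np.linalg.norm(W.T @ (H - O @ Q.T), 'fro') ** 2      # projective reconstruction
    feat  = lam * l21_norm(W)                                    # feature-selection sparsity
    inst  = alpha * np.linalg.norm(Q, 'fro') ** 2                # instance-selection penalty
    fidel = beta * np.linalg.norm(H @ G - O, 'fro') ** 2         # fidelity to observed entries
    # Q-weighted alignment between imputed columns X_bar[:, i] and observed columns O[:, j].
    sq_dist = ((X_bar[:, :, None] - O[:, None, :]) ** 2).sum(axis=0)   # shape (n, n_v)
    align = (Q * sq_dist).sum()
    return recon + feat + inst + fidel + align
```

The full objective value is then obtained by weighting each view's term by $1/\gamma^{(v)}$ and summing over views.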

2. Cross-view Neighborhood Imputation

Missing feature values are not imputed independently but via adaptive cross-view fusion. For each view $v$, a $k$-NN similarity graph $\mathbf{S}^{(v)} \in \mathbb{R}^{n \times n}$ is constructed. The imputed value of a missing entry is the weighted aggregate of reconstructed data from all views, propagated via these graphs. For data instance $i$ in view $v$:

$$\bar{\mathbf X}^{(v)}_{\cdot i} = M_{iv}\, \mathbf X^{(v)}_{\cdot i} + (1 - M_{iv})\, \frac{\sum_{u=1}^V \bigl(\mathbf H^{(u)} \mathbf S^{(u)T}\bigr)_{\cdot i}}{\sum_{u=1}^V \sum_{j=1}^n S^{(u)}_{ij}}$$

In matrix form:
$$\bar{\mathbf X}^{(v)} = \mathbf M^{(v)} \odot \mathbf X^{(v)} + \bigl(\mathbf 1 - \mathbf M^{(v)}\bigr) \odot \left( \sum_{u=1}^V \mathbf H^{(u)} \mathbf S^{(u)T} \right) \Big/ \left( \sum_{u=1}^V \mathbf S^{(u)} \mathbf 1 \right)$$
where $\odot$ denotes elementwise multiplication. This mechanism ensures imputation is synergistic across views, exploiting the geometric structure of both observed and reconstructed data.
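
The sketch below illustrates one plausible, simplified reading of this fusion: a cosine-similarity $k$-NN graph is built per view, the graphs are summed, and the view's own completion $\mathbf H^{(v)}$ is propagated through the fused graph to fill missing columns. The graph construction, the normalization, and all names are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def knn_graph(H, k):
    """Symmetric k-NN similarity graph (n x n) from cosine similarities between columns of H."""
    Z = H / (np.linalg.norm(H, axis=0, keepdims=True) + 1e-12)
    sim = Z.T @ Z                                        # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                       # exclude self-links
    nn = np.argsort(-sim, axis=1)[:, :k]                 # k nearest neighbours per instance
    rows = np.repeat(np.arange(sim.shape[0]), k)
    S = np.zeros_like(sim)
    S[rows, nn.ravel()] = np.maximum(sim[rows, nn.ravel()], 0.0)   # keep nonnegative weights
    return np.maximum(S, S.T)                            # symmetrise

def impute_view(X_v, m_v, H_v, S_all):
    """Fill the missing columns of view v by propagating its completion H_v
    through the fused (summed) cross-view k-NN graphs."""
    S_fused = sum(S_all)                                 # fuse the per-view graphs
    prop = H_v @ S_fused.T                               # neighbourhood-weighted aggregation, (d_v, n)
    denom = S_fused.sum(axis=1) + 1e-12                  # per-instance normalisation
    return np.where(m_v[None, :] == 1, X_v, prop / denom[None, :])
```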

3. Unified Feature and Instance Co-selection

Feature co-selection is achieved via the $\ell_{2,1}$ penalty on $\mathbf{W}^{(v)}$, enforcing row-sparsity which retains only the most informative features for each view. Instance co-selection is modeled through the Frobenius penalty and simplex constraints on $\mathbf{Q}^{(v)}$, promoting sparsity across its columns; only a handful of instances are assigned significant selection weights. These selection pressures act jointly, as both $\mathbf{H}^{(v)}$ and $\bar{\mathbf{X}}^{(v)}$ are shared across these terms. Consequently, feature selection, instance selection, and imputation reinforce each other's effectiveness during optimization.
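
After optimization, the selections are typically read off the learned matrices. The following sketch shows one common convention (assumed here, not quoted from the paper): features are ranked by the row norms of $\mathbf{W}^{(v)}$, and observed instances by their aggregate weight in $\mathbf{Q}^{(v)}$:

```python
import numpy as np

def select_features(W, num_features):
    """Rank a view's features by the l2 norms of the rows of W; large norms survive the l_{2,1} penalty."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.argsort(-row_norms)[:num_features]

def select_instances(Q, num_instances):
    """Rank observed instances by their aggregate selection weight (column sums of Q)."""
    scores = Q.sum(axis=0)
    return np.argsort(-scores)[:num_instances]
```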

4. Optimization Algorithm and Convergence

The optimization problem is solved via block-coordinate descent, iteratively updating the following blocks (a minimal numerical sketch of the main updates appears after this list):

  • $\mathbf{H}^{(v)}$: solved via generalized Sylvester equations,

$$\mathbf{A}^{(v)}\mathbf{H}^{(v)} + \mathbf{H}^{(v)}\mathbf{B}^{(v)} = \mathbf{C}^{(v)}$$

with $\mathbf{A}^{(v)}$, $\mathbf{B}^{(v)}$, $\mathbf{C}^{(v)}$ dependent on the current $\mathbf{W}^{(v)}$, similarity matrices, and observed data; solved with BiCG iterative solvers.

  • $\mathbf{Q}^{(v)}$: each row is updated by solving a convex quadratic program projected onto the simplex, with a closed form obtained via the KKT conditions.
  • $\mathbf{W}^{(v)}$: minimization of a trace form with orthonormality constraint; solved via eigendecomposition:

$$\min_{\mathbf{W}^{(v)T}\mathbf{W}^{(v)} = \mathbf{I}} \mathrm{Tr}\bigl[\mathbf{W}^{(v)T}\bigl(\mathbf{P}^{(v)}\mathbf{P}^{(v)T} + \lambda\mathbf{D}^{(v)}\bigr)\mathbf{W}^{(v)}\bigr]$$

taking the eigenvectors associated with the $c$ smallest eigenvalues, with adaptive weight matrix $\mathbf{D}^{(v)}$.

  • $\bar{\mathbf X}^{(v)}$: updated using the cross-view neighborhood fusion formula.
  • $\gamma^{(v)}$: updated in closed form to minimize the overall objective, with

$$\gamma^{(v)} = \frac{\sqrt{\phi^{(v)}}}{\sum_{u=1}^V \sqrt{\phi^{(u)}}}$$

where $\phi^{(v)}$ is the current loss of view $v$ within the objective.
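
The following sketch illustrates three of these updates with NumPy/SciPy. It is a schematic reading rather than the reference implementation: a dense Sylvester solve (`scipy.linalg.solve_sylvester`) stands in for the paper's BiCG solver, and the reweighting matrix $\mathbf{D}^{(v)}$ is given the standard $\ell_{2,1}$ form, which is assumed here:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def update_H(A, B, C):
    """H-update: solve the Sylvester equation A H + H B = C.
    (A dense direct solve stands in for the paper's BiCG solver, purely for illustration.)"""
    return solve_sylvester(A, B, C)

def reweight_D(W, eps=1e-8):
    """Diagonal reweighting matrix for the l_{2,1} penalty (standard form, assumed here)."""
    return np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))

def update_W(P, D, lam, c):
    """W-update: eigenvectors of P P^T + lam * D belonging to the c smallest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(P @ P.T + lam * D)   # eigenvalues in ascending order
    return eigvecs[:, :c]

def update_gamma(phi):
    """Closed-form view weights gamma^(v) = sqrt(phi^(v)) / sum_u sqrt(phi^(u))."""
    s = np.sqrt(np.asarray(phi, dtype=float))
    return s / s.sum()
```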

Empirical convergence is achieved in $\approx 10$–$15$ outer iterations. Each subproblem is convex when the others are fixed, ensuring monotonic decrease of the objective. The per-iteration complexity, aggregated over all views, is

$$\mathcal{O}\left(\sum_{v=1}^V \bigl[\, l\,(n d_v + z) + n n_v + d_v^2 c + n k d_v \,\bigr] \right)$$

where $l$ is the number of BiCG iterations, $z$ is the number of nonzeros in the Kronecker system, and $k$ is the $k$-NN neighborhood size (Cai et al., 17 Dec 2025).

5. Empirical Evaluation

JUICE has been evaluated on eight real-world multi-view benchmarks (Yale, MSRC-V1, COIL20, HandWritten, BDGP, CCV, USPS, ALOI) with missing rates from $10\%$ to $50\%$. Each benchmark contains multiple views, hundreds to thousands of samples, and varying feature dimensions and class counts. Instance and feature selection ratios were varied ($10\%$–$50\%$). ACC and F1 scores were measured across $30$ random runs.

Key results include:

  • On MSRC-V1 at $40\%$ missingness and $20\%$ selection rates, JUICE attains ACC of $\approx 68\%$, compared to $\approx 53\%$ for the nearest competitor.
  • Across datasets and varying missingness/selection rates, JUICE consistently yields $5\%$–$15\%$ higher ACC/F1 than single-view (UFI, DFIS, sCOs2) and combination-based methods (C2IN, TERN, TIMC, UKMC, UIMD, SCMD).
  • Robustness is maintained under high missingness; t-SNE plots demonstrate superior cluster separation for selected samples.
  • Ablation confirms that both cross-view fusion and adaptive imputation are necessary: removing either component (JUICE–I or JUICE–II) reduces ACC/F1 by $5$–$10\%$.
  • Hyperparameter sensitivity is moderate; stable performance for $\lambda \in [10^{-3}, 10^{-1}]$ and $\alpha, \beta \in [1, 10^2]$ (Cai et al., 17 Dec 2025).

6. Theoretical and Methodological Insights

The JUICE framework is characterized by several key advances:

  • First unified framework for simultaneous unsupervised multi-view feature and instance co-selection with joint missing data recovery.
  • Cross-view neighborhood fusion: by propagating imputation through fused $k$-NN graphs, JUICE exploits complementary information and improves robustness to missingness.
  • Adaptive view weighting: The optimization dynamically shifts importance to the most informative and best-reconstructed views.

Each subproblem’s convexity underpins stable convergence, while coupling selection and imputation yields improved representation of both feature and instance structure. However, the reliance on iterative Sylvester and eigenvalue solvers poses challenges for extremely high-dimensional settings; parameter selection remains manual; and scaling to ultra-large $n$ or $d_v$ would require stochastic or parallel extensions.

7. Applications, Limitations, and Future Directions

JUICE is applicable to any multimodal dataset with partial observations, including:

  • Environmental sensor fusion (e.g., missing sensor readings)
  • Multimedia retrieval (audio–video data streams)
  • Bioinformatics (multi-omics with incomplete assays)
  • Clinical analytics (multi-modality medical records)

Limitations include computational expense for very high-dimensional or extremely large datasets and a need for further automation of hyperparameter selection. A plausible implication is that stochastic or deep learning-based instantiations could further improve scalability and automaticity in large-scale settings. Future research may target these avenues (Cai et al., 17 Dec 2025).
