JUICE: Unsupervised Multiview Selection
- The paper introduces a unified framework that jointly performs unsupervised feature selection, instance co-selection, and cross-view imputation to enhance data representativeness.
- It employs a block-coordinate descent optimization with convex subproblems and adaptive view weighting to efficiently solve for missing data recovery and selection.
- Empirical evaluations on eight benchmarks demonstrate significant ACC and F1 improvements, confirming the robustness and effectiveness of the proposed approach.
Joint Learning of Unsupervised MultI-view Feature and Instance Co-selection with Cross-view Imputation (JUICE) is a framework that unifies unsupervised feature selection, instance co-selection, and data imputation for incomplete multi-view datasets. It addresses the limitations of treating these tasks as independent stages by capturing both intra- and cross-view relationships and exploiting synergistic interactions between selection and imputation, improving the representativeness of the selected features and instances.
1. Problem Formulation and Objective
Consider a collection of $V$ data "views" (modalities), each with a potentially different set of features and incomplete observations: $\{X^{(v)} \in \mathbb{R}^{n_v \times d_v}\}_{v=1}^{V}$. For each view $v$, only $n_v \le n$ of the $n$ total samples are observed. Missingness is tracked by the indicator $M \in \{0,1\}^{n \times V}$, where $M_{iv} = 1$ if instance $i$ is present in view $v$. The binary selection matrix $G^{(v)} \in \{0,1\}^{n_v \times n}$ specifies the available data in view $v$ as $X^{(v)} = G^{(v)} Z^{(v)}$, where $Z^{(v)}$ is the completed data matrix introduced below.
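A minimal sketch of this bookkeeping, assuming views are stored as NumPy arrays with all-NaN rows marking missing instances (the function name and storage convention are illustrative, not from the paper):

```python
import numpy as np

def masks_from_views(views):
    """Build the missingness indicator M (n x V) and the binary selection
    matrices G[v] (n_v x n) satisfying G[v] @ Z[v] = observed X[v].
    views: list of (n x d_v) arrays; an all-NaN row = instance missing."""
    n, V = views[0].shape[0], len(views)
    M = np.zeros((n, V), dtype=int)
    G = []
    for v, X in enumerate(views):
        present = ~np.isnan(X).all(axis=1)   # observed instances in view v
        M[present, v] = 1
        idx = np.flatnonzero(present)
        Gv = np.zeros((idx.size, n))
        Gv[np.arange(idx.size), idx] = 1.0   # picks observed rows out of Z[v]
        G.append(Gv)
    return M, G
```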
The optimization variables are:
- Feature selection: $W^{(v)} \in \mathbb{R}^{d_v \times c}$ (orthonormal columns, $W^{(v)\top} W^{(v)} = I$) with $\ell_{2,1}$-row-sparsity to identify informative features.
- Instance selection: $S^{(v)}$ (nonnegative, column sums equal to one), representing the importance of each observed instance, with row-sparsity induced by the Frobenius penalty $\|S^{(v)}\|_F^2$.
- Reconstructed data: $Z^{(v)} \in \mathbb{R}^{n \times d_v}$, a completion of view $v$ over all $n$ instances.
- Adaptive imputed matrix: $\tilde{X}^{(v)}$, refined by cross-view neighborhood graphs.
- View weights: $\alpha = (\alpha_1, \dots, \alpha_V)$ with $\alpha_v \ge 0$ and $\sum_{v} \alpha_v = 1$.
The JUICE objective couples all of these variables in a single minimization. Up to the paper's exact weighting, it takes the schematic form

$$\min_{\{W^{(v)},\,S^{(v)},\,Z^{(v)},\,\tilde{X}^{(v)}\},\,\alpha} \;\sum_{v=1}^{V} \alpha_v \Big( \mathcal{L}_{\mathrm{rec}}\big(Z^{(v)}; W^{(v)}, S^{(v)}\big) + \lambda_1 \big\|W^{(v)}\big\|_{2,1} + \lambda_2 \big\|S^{(v)}\big\|_F^2 + \lambda_3 \big\|G^{(v)} Z^{(v)} - X^{(v)}\big\|_F^2 + \lambda_4\, \mathcal{R}_{\mathrm{cross}}\big(Z^{(v)}, \tilde{X}^{(v)}\big) \Big),$$

where $\mathcal{L}_{\mathrm{rec}}$ is the projective-reconstruction loss and $\mathcal{R}_{\mathrm{cross}}$ the cross-view regularizer, subject to $W^{(v)\top} W^{(v)} = I$, $S^{(v)} \ge 0$ with unit column sums, $\alpha \ge 0$, $\mathbf{1}^{\top}\alpha = 1$, and index constraints ensuring correct mapping of missing and observed entries.
This objective integrates projective reconstruction, feature selection, instance selection, imputation fidelity (alignment with observed entries), and cross-view regularization, all in a single optimization.
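As a concrete, deliberately hedged illustration of how these five terms combine, the sketch below evaluates the schematic objective above; the $\lambda$ weights, the simple projective-reconstruction term, and the squared-distance cross-view regularizer are placeholder assumptions rather than the paper's exact choices:

```python
import numpy as np

def juice_objective(X, G, Z, W, S, Xt, alpha, lam=(1.0, 1.0, 1.0, 1.0)):
    """Schematic JUICE-style objective (assumed form, summed over views).
    X[v]: observed data (n_v x d_v);  G[v]: selection matrix (n_v x n)
    Z[v]: completion (n x d_v);       W[v]: projection (d_v x c), W^T W = I
    S[v]: instance weights;           Xt[v]: adaptively imputed matrix"""
    total = 0.0
    for v in range(len(X)):
        rec  = np.linalg.norm(Z[v] - Z[v] @ W[v] @ W[v].T) ** 2  # projective reconstruction
        l21  = np.linalg.norm(W[v], axis=1).sum()                # feature row-sparsity
        ins  = np.linalg.norm(S[v]) ** 2                         # instance-selection penalty
        fit  = np.linalg.norm(G[v] @ Z[v] - X[v]) ** 2           # imputation fidelity
        xreg = np.linalg.norm(Z[v] - Xt[v]) ** 2                 # cross-view regularization
        total += alpha[v] * (rec + lam[0]*l21 + lam[1]*ins
                             + lam[2]*fit + lam[3]*xreg)
    return total
```

Note that the schematic $\mathcal{L}_{\mathrm{rec}}$ above also takes $S^{(v)}$ as an argument; that coupling is omitted in this sketch for brevity.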
2. Cross-view Neighborhood Imputation
Missing feature values are imputed not independently but with adaptive cross-view fusion. For each view $u$, a $k$-NN similarity graph $A^{(u)}$ is constructed. The imputed value of a missing entry is the weighted aggregate of reconstructed data from all views, propagated via these graphs. For a data instance $i$ missing in view $v$:

$$\tilde{x}_i^{(v)} = \sum_{u=1}^{V} \alpha_u \sum_{j} A^{(u)}_{ij}\, z_j^{(v)}.$$

In matrix form (schematically),

$$\tilde{X}^{(v)} = \big(M_{:,v}\mathbf{1}^{\top}\big) \odot Z^{(v)} + \Big(\big(\mathbf{1}-M_{:,v}\big)\mathbf{1}^{\top}\Big) \odot \sum_{u=1}^{V} \alpha_u A^{(u)} Z^{(v)},$$

where $\odot$ denotes elementwise multiplication: observed rows are kept, and missing rows receive the graph-propagated, view-weighted reconstruction. This mechanism ensures imputation is synergistic across views, exploiting the geometric structure of both observed and reconstructed data.
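A hedged sketch of this fusion step, using a brute-force Gaussian $k$-NN graph and the mask convention from Section 1 (the kernel choice, row normalization, and function names are assumptions):

```python
import numpy as np

def knn_graph(Z, k=5, sigma=1.0):
    """Row-stochastic k-NN similarity graph on the rows of Z (n x d).
    Brute-force O(n^2) distances; Gaussian weights (illustrative kernel)."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    A = np.zeros_like(D)
    nbrs = np.argsort(D, axis=1)[:, :k]
    rows = np.repeat(np.arange(len(Z)), k)
    A[rows, nbrs.ravel()] = np.exp(-D[rows, nbrs.ravel()] ** 2 / (2 * sigma ** 2))
    return A / A.sum(axis=1, keepdims=True)

def cross_view_impute(Z, M, alpha, k=5):
    """Schematic cross-view fusion: missing rows of each view's completion
    are replaced by a view-weighted, graph-propagated aggregate."""
    graphs = [knn_graph(Zv, k) for Zv in Z]
    out = []
    for v, Zv in enumerate(Z):
        agg = sum(alpha[u] * (graphs[u] @ Zv) for u in range(len(Z)))
        Xt = Zv.copy()
        miss = M[:, v] == 0
        Xt[miss] = agg[miss]                 # only missing rows are imputed
        out.append(Xt)
    return out
```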
3. Unified Feature and Instance Co-selection
Feature co-selection is achieved via the $\ell_{2,1}$ penalty on $W^{(v)}$, enforcing row-sparsity that retains only the most informative features in each view. Instance co-selection is modeled through the Frobenius penalty and simplex constraints on $S^{(v)}$, promoting sparsity across its columns; only a handful of instances are assigned significant selection weights. These selection pressures act jointly, as $W^{(v)}$ and $S^{(v)}$ both operate on the shared reconstruction $Z^{(v)}$. Consequently, feature selection, instance selection, and imputation reinforce each other's effectiveness during optimization.
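Once $W^{(v)}$ and $S^{(v)}$ are learned, the selection itself is read off from their row structure. A minimal sketch of that read-out (the top-$m$ convention is the usual one for $\ell_{2,1}$-regularized selection, assumed here rather than quoted from the paper):

```python
import numpy as np

def select_features(W, m):
    """Rank features by row norms of the projection W (d x c); keep top m.
    Rows driven to ~0 by the l2,1 penalty correspond to discarded features."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:m]

def select_instances(S, m):
    """Rank instances by aggregate selection weight in S and keep the top m
    (illustrative layout: rows of S index observed instances)."""
    return np.argsort(np.asarray(S).sum(axis=1))[::-1][:m]
```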
4. Optimization Algorithm and Convergence
The optimization problem is solved via block-coordinate descent, iteratively updating:
- $Z^{(v)}$: solved via generalized Sylvester equations of the form
$$A^{(v)} Z^{(v)} + Z^{(v)} B^{(v)} = C^{(v)},$$
with $A^{(v)}$, $B^{(v)}$, $C^{(v)}$ dependent on the current $S^{(v)}$, $W^{(v)}$, similarity matrices, and observed data; the equivalent Kronecker-structured linear system is solved with BiCG iterative solvers (a sketch of such a matrix-free solve closes this section).
- $S^{(v)}$: each row updated by a convex quadratic program projected onto the simplex; the projection is closed-form via the KKT conditions (see the update sketches after this list).
- $W^{(v)}$: minimization of a trace form with orthonormality constraint, solved via eigendecomposition:
$$W^{(v)} \leftarrow \arg\min_{W^{\top} W = I} \operatorname{tr}\big(W^{\top} Q^{(v)} W\big),$$
whose solution collects the $c$ smallest eigenvectors of the view-specific matrix $Q^{(v)}$, into which the adaptive weights enter.
- $\tilde{X}^{(v)}$: updated using the cross-view neighborhood fusion formula of Section 2.
- $\alpha$: updated in closed form to minimize the overall objective; a standard choice consistent with the description is the inverse-loss rule
$$\alpha_v \propto \frac{1}{2\sqrt{h_v}},$$
normalized over views, where $h_v$ is the current view's in-objective loss.
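Minimal sketches of three of these updates follow: the sorting-based simplex projection (the KKT closed form), the trace-minimization update for $W^{(v)}$, and an inverse-loss view-weight rule. The matrix $Q$ and the exact weight rule are assumptions consistent with the description above, not extracted from the paper:

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection onto the probability simplex; closed form
    derived from the KKT conditions (standard sorting-based algorithm)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, y.size + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(y - theta, 0.0)

def update_W(Q, c):
    """min_W tr(W^T Q W) s.t. W^T W = I  ->  c smallest eigenvectors of Q."""
    _, evecs = np.linalg.eigh(Q)   # eigenvalues in ascending order
    return evecs[:, :c]

def update_alpha(losses):
    """Inverse-loss view weighting alpha_v ~ 1/(2*sqrt(h_v)), normalized;
    one common closed form -- the paper's exact rule may differ."""
    a = 1.0 / (2.0 * np.sqrt(np.asarray(losses, dtype=float) + 1e-12))
    return a / a.sum()
```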
Empirical convergence is achieved within roughly 15 outer iterations. Each subproblem is convex when the others are fixed, ensuring a monotonic decrease of the objective. The per-iteration cost, aggregated over all views, is dominated by the BiCG solves, on the order of $\mathcal{O}(t \cdot \mathrm{nnz})$, where $t$ is the number of BiCG iterations and $\mathrm{nnz}$ is the number of nonzeros in the Kronecker system, plus the cost of building the $k$-NN graphs (Cai et al., 17 Dec 2025).
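The Sylvester-type $Z^{(v)}$ update referenced above can be solved matrix-free: BiCG only needs matrix-vector products with the Kronecker-structured operator, never the $nd \times nd$ matrix itself. A self-contained sketch with stand-in coefficients $A$, $B$, $C$ (the paper assembles these from $S^{(v)}$, the similarity graphs, and the observed data):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, bicg

rng = np.random.default_rng(0)
n, d = 60, 20
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)  # SPD stand-in
B = 2.0 * np.eye(d)                                           # stand-in
C = rng.standard_normal((n, d))

# Solve A Z + Z B = C, i.e. (I (x) A + B^T (x) I) vec(Z) = vec(C),
# without ever forming the (n*d) x (n*d) Kronecker matrix.
def matvec(z):
    Z = z.reshape((n, d), order="F")           # un-vec (column stacking)
    return (A @ Z + Z @ B).ravel(order="F")    # apply operator, re-vec

op = LinearOperator((n * d, n * d), matvec=matvec, dtype=np.float64)
z, info = bicg(op, C.ravel(order="F"))
Z = z.reshape((n, d), order="F")
print(info, np.linalg.norm(A @ Z + Z @ B - C))  # info == 0 on success
```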
5. Empirical Evaluation
JUICE has been evaluated on eight real-world multi-view benchmarks (Yale, MSRC-V1, COIL20, HandWritten, BDGP, CCV, USPS, ALOI) under a range of missing rates. Each benchmark contains multiple views, hundreds to thousands of samples, and varying feature dimensions and class counts. Instance and feature selection ratios were also varied, and ACC and F1 scores were measured across 30 random runs.
Key results include:
- On MSRC-V1, at the reported missingness and selection rates, JUICE attains a higher ACC than the nearest competitor.
- Across datasets and varying missingness/selection rates, JUICE consistently yields higher ACC/F1 than single-view methods (UFI, DFIS, sCOs2) and combination-based methods (C2IN, TERN, TIMC, UKMC, UIMD, SCMD).
- Robustness is maintained under high missingness; t-SNE plots demonstrate superior cluster separation for selected samples.
- Ablation confirms that both cross-view fusion and adaptive imputation are necessary: removing either component (JUICE–I or JUICE–II) reduces ACC/F1 by 5 points or more.
- Hyperparameter sensitivity is moderate, with stable performance over broad ranges of the regularization parameters (Cai et al., 17 Dec 2025).
6. Theoretical and Methodological Insights
The JUICE framework is characterized by several key advances:
- First unified framework for simultaneous unsupervised multi-view feature and instance co-selection with joint missing data recovery.
- Cross-view neighborhood fusion: By propagating imputation through fused $k$-NN graphs, JUICE exploits complementary information and improves robustness to missingness.
- Adaptive view weighting: The optimization dynamically shifts importance to the most informative and best-reconstructed views.
Each subproblem’s convexity underpins stable convergence, while coupling selection and imputation yields an improved representation of both feature and instance structure. However, the reliance on iterative Sylvester and eigenvalue solvers poses challenges in extremely high-dimensional settings; parameter selection remains manual; and scaling to ultra-large $n$ or $d_v$ would require stochastic or parallel extensions.
7. Applications, Limitations, and Future Directions
JUICE is applicable to any multimodal dataset with partial observations, including:
- Environmental sensor fusion (e.g., missing sensor readings)
- Multimedia retrieval (audio–video data streams)
- Bioinformatics (multi-omics with incomplete assays)
- Clinical analytics (multi-modality medical records)
Limitations include computational expense for very high-dimensional or extremely large datasets and the need for further automation of hyperparameter selection. A plausible implication is that stochastic or deep learning-based instantiations could further improve scalability and automation in large-scale settings. Future research may target these avenues (Cai et al., 17 Dec 2025).