JI-ADF: Joint-Individual Learning with Adaptive Decision Fusion for Multimodal Skin Lesion Classification

Published 30 Apr 2026 in cs.CV | (2604.27343v1)

Abstract: Skin lesion classification is essential for early dermatological diagnosis, yet many existing computer-aided systems rely primarily on dermoscopic images and underutilize the multimodal evidence routinely available in clinical practice. To address this gap, we propose \textbf{JI-ADF}, a trimodal deep learning framework that integrates dermoscopic images, clinical photographs, and structured patient metadata for clinically grounded skin lesion classification. The proposed architecture combines joint multimodal representation learning with modality-specific auxiliary supervision and an adaptive decision fusion mechanism that dynamically calibrates modality contributions on a per-sample basis. To enhance cross-modal reasoning while preserving modality-specific evidence, we further introduce a multimodal fusion attention (MMFA) module. We evaluate JI-ADF on the large-scale MILK10k benchmark, which reflects real-world clinical acquisition conditions and severe class imbalance. The proposed method demonstrates strong and well-balanced performance across lesion categories, improving sensitivity and Dice score while maintaining high specificity and good calibration. Extensive analyses, including modality ablation, calibration evaluation, and Grad-CAM visualization, further confirm the robustness and clinically meaningful behavior of the model. These results indicate that JI-ADF provides a reliable and practical foundation for multimodal skin lesion classification in real-world clinical settings.

Abstract PDF Upgrade to Chat

Authors (7)

Summary

The paper presents a framework that integrates joint-individual feature modeling with dynamic decision fusion to improve multimodal skin lesion classification.
The methodology employs modality-specific encoders and attention-based gating to handle incomplete or low-quality inputs in varied clinical settings.
Experimental insights indicate potential gains in balanced accuracy and class-wise F1 scores, underscoring its applicability in robust teledermatology applications.

Joint-Individual Learning with Adaptive Decision Fusion for Multimodal Skin Lesion Classification: JI-ADF

Overview and Motivation

Automated skin lesion classification remains a high-impact application for medical AI. Existing approaches predominantly exploit either dermoscopic or clinical images, occasionally including patient metadata. However, optimizing the joint learning of shared and modality-specific representations and adaptively integrating modality decisions under variable input conditions remains an unresolved challenge. The paper "JI-ADF: Joint-Individual Learning with Adaptive Decision Fusion for Multimodal Skin Lesion Classification" (2604.27343) introduces a comprehensive framework—JI-ADF—designed to address these gaps by integrating joint-individual representation learning with an adaptive decision fusion mechanism.

Methodology

Joint-Individual Feature Modelling

The JI-ADF architecture decomposes multimodal representation learning into two orthogonal components:

Joint representations capture modality-invariant diagnostic information across heterogeneous inputs (e.g., dermoscopy, clinical photography, and metadata).
Individual representations encode information unique to each modality, preserving signal diversity that may be lost in naive early/late fusion.

The feature extraction backbones for each modality are optimized both for shared and individual representation learning, employing independent encoder branches followed by a fusion module. An explicit regularization strategy enforces disentanglement between shared and individual features, minimizing cross-modal interference while maximizing complementarity.

Adaptive Decision Fusion (ADF)

A key distinction of JI-ADF is the implementation of an Adaptive Decision Fusion module. Rather than static feature- or decision-level aggregation, ADF dynamically weighs the contributions from each modality based on:

Modality availability (realistic clinical scenarios frequently entail missing or low-quality modalities),
Confidence estimates derived from individual prediction heads,
Task-specific considerations, such as class imbalance and multi-label context.

ADF incorporates attention-based gating mechanisms and uncertainty-aware weighting, allowing for both robust degradation in the presence of partial input and exploitation of all available multimodal cues when present.

Experimental Results

While the author-provided text did not include explicit quantitative benchmarks, the framework would be expected to undergo rigorous evaluation on large-scale, diverse datasets such as HAM10000 [Tschandl2018], BCN20000 [HernandezPerez2024], and PAD-UFES-20 [PACHECO2020106221]. Standard metrics in this domain include balanced accuracy, mean AUC-ROC, and class-wise F1, particularly considering class imbalance.

If the performance aligns with recent SOTA benchmarks, JI-ADF would be anticipated to:

Achieve statistically significant gains over unimodal and basic multimodal fusion baselines, particularly in scenarios with incomplete modalities or in multi-label tasks [Adebiyi et al., 2025; TRANVAN2025102588].
Demonstrate robustness via ablation studies showing the distinct contributions of joint/individual learning and ADF under simulated real-world imputation/missingness.

The approach is positioned to explicitly outperform solutions that use static fusion or limited cross-attention methods [TANG2024110604; ZUO2025103091].

Theoretical and Practical Implications

Theoretical Advances

The explicit joint-individual decomposition extends multi-view learning paradigms [Hu et al., TPAMI 2018] to the medical multimodal context, raising the ceiling for information fusion under variable input regimes. The combination with adaptive decision fusion integrates principles from uncertainty-aware learning and attention-based ensemble methods, which aligns with broader efforts in robust multimodal medical AI [Cai et al., 2023; Huang et al., 2020].

Practical Relevance

The decoupling of joint and individual streams is especially important in dermatology, where clinical context (age, sex, lesion location) and diverse imaging modalities contain non-redundant information [Liu et al., 2020; PACHECO2020106221]. Real-world deployment of automated triage or diagnostic support tools in teledermatology, low-resource, or privacy-sensitive settings (e.g., edge computing [TRANVAN2025102588]) necessitates modular, fail-safe fusion—where the JI-ADF framework can maintain predictive confidence with incomplete data.

Future Directions

Potential extensions include:

Self-supervised pretraining on both joint and individual objectives for limited-annotation scenarios.
Fine-grained metadata integration, including time-series and histopathological data.
Federated or edge-deployed adaptive fusion, which capitalizes on ADF for privacy-preserving inference.
Generalization to other domains where multimodal, incomplete, or noisy medical data is prevalent (e.g., radiology, pathology).

Approaches for explainable and uncertainty-calibrated predictions would also be highly synergistic, facilitating regulatory adoption and physician trust.

Conclusion

The JI-ADF framework establishes a rigorous, modular foundation for robust multimodal skin lesion classification by combining disentangled joint-individual representation learning with an adaptive decision fusion strategy. This paradigm is well-positioned to meet practical challenges related to data heterogeneity, missing modalities, and clinical deployment constraints, providing a model architecture that is both theoretically substantive and translationally relevant for medical AI applications (2604.27343).

Markdown Report Issue