Disentangling and Generating Modalities for Recommendation in Missing Modality Scenarios (2504.16352v1)

Published 23 Apr 2025 in cs.IR and cs.AI

Abstract: Multi-modal recommender systems (MRSs) have achieved notable success in improving personalization by leveraging diverse modalities such as images, text, and audio. However, two key challenges remain insufficiently addressed: (1) Insufficient consideration of missing modality scenarios and (2) the overlooking of unique characteristics of modality features. These challenges result in significant performance degradation in realistic situations where modalities are missing. To address these issues, we propose Disentangling and Generating Modality Recommender (DGMRec), a novel framework tailored for missing modality scenarios. DGMRec disentangles modality features into general and specific modality features from an information-based perspective, enabling richer representations for recommendation. Building on this, it generates missing modality features by integrating aligned features from other modalities and leveraging user modality preferences. Extensive experiments show that DGMRec consistently outperforms state-of-the-art MRSs in challenging scenarios, including missing modalities and new item settings as well as diverse missing ratios and varying levels of missing modalities. Moreover, DGMRec's generation-based approach enables cross-modal retrieval, a task inapplicable for existing MRSs, highlighting its adaptability and potential for real-world applications. Our code is available at https://github.com/ptkjw1997/DGMRec.

Summary

The paper introduces DGMRec, a framework that disentangles and generates modality features to address missing data in multi-modal recommendations.
It employs distinct modules for separating general and specific modality attributes and generating features for absent modalities.
Empirical results on datasets like Amazon Baby and TikTok show that DGMRec outperforms state-of-the-art systems in challenging missing modality scenarios.

Introduction

The paper introduces the Disentangling and Generating Modality Recommender (DGMRec), a multi-modal recommender system (MRS) designed around two challenges that existing MRSs leave largely unaddressed: modalities that are missing for some items, and the distinct characteristics of each modality, which are often overlooked. MRSs that draw on diverse information sources, such as text and images, tend to suffer performance drops when some modalities are absent (Figure 1). DGMRec disentangles each modality's features into general and specific components and generates the features of a missing modality from the available ones, which makes it more robust to the incomplete data common in real-world recommendation.

Figure 1: (a) Performance drop of recent MRSs when missing modality exists. (b) Difference in recommendation scores with missing modalities.

DGMRec Overview

The DGMRec framework consists of two main modules: the Disentangling Modality Feature module and the Missing Modality Generation module (Figure 2). The first module separates each modality's features into general and specific components. This disentanglement matters because different modalities convey different kinds of information: an image may capture visual aesthetics, while the text may communicate functional details. The module is designed to preserve these distinctive attributes so that each can be used separately for recommendation.

Figure 2: Overview of DGMRec framework, showing its modality disentangling and generation capabilities.
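The disentangling step can be illustrated with a minimal sketch. It assumes two separate linear encoders per modality, one producing the general part and one the specific part; the linear form, the dimensions, and all names (`disentangle`, `general_w`, `specific_w`) are hypothetical stand-ins, not the paper's actual architecture.

```python
import random


def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def make_weights(rows, cols, rng):
    """Random stand-in for learned encoder weights (hypothetical)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def disentangle(feature, general_w, specific_w):
    """Split one modality feature into a general and a specific embedding."""
    return linear(general_w, feature), linear(specific_w, feature)


rng = random.Random(0)
image_feature = [rng.gauss(0.0, 1.0) for _ in range(8)]  # toy raw image feature
general_w = make_weights(4, 8, rng)   # assumed general encoder for images
specific_w = make_weights(4, 8, rng)  # assumed specific encoder for images

general, specific = disentangle(image_feature, general_w, specific_w)
item_embedding = general + specific  # list concatenation: both parts are kept
```

In a trained model the two encoders would be pushed apart by the training objective; here they differ only by random initialization, so the sketch shows the data flow rather than the learned behavior.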

The Missing Modality Generation module generates features for missing modalities by combining aligned features from the available modalities with user modality preferences. This allows DGMRec to maintain recommendation performance even when multiple modalities are absent.
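One way to picture the generation step is the sketch below. It assumes, purely for illustration, that the general part of a missing modality is approximated by averaging the aligned general features of the available modalities, and that the specific part is approximated by averaging the modality preference vectors of users who interacted with the item. The paper's generators are learned; the averaging and the names `generate_missing` and `mean_vectors` are hypothetical.

```python
def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def generate_missing(available_general, user_preferences):
    """Build a stand-in feature for a missing modality.

    available_general: general features of the item's observed modalities.
    user_preferences: preference vectors (for the missing modality) of
        users who interacted with the item.
    """
    general = mean_vectors(available_general)   # borrowed from other modalities
    specific = mean_vectors(user_preferences)   # inferred from user preferences
    return general + specific                   # concatenated feature


# Toy item whose image is missing but whose text and audio are observed.
text_general = [0.2, 0.4]
audio_general = [0.4, 0.0]
image_preferences = [[1.0, 0.0], [0.0, 1.0]]

generated_image = generate_missing([text_general, audio_general], image_preferences)
```

The averaging only works as a proxy because general features are assumed to be aligned across modalities; without that alignment, features from text or audio would not be comparable to image features.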

Experimental Results

The paper evaluates DGMRec on several datasets, including Amazon Baby and TikTok. DGMRec consistently outperforms existing multi-modal recommender systems across realistic settings, including new-item scenarios and varying levels of missing modalities (Figure 3).

Figure 3: Performance on various missing levels on the Amazon Baby and TikTok datasets.

In scenarios of extreme data sparsity, a common case in real-world applications, DGMRec's ability to generate missing modality features yields performance improvements over both traditional collaborative filtering methods and other state-of-the-art multi-modal systems (Figure 4).

Figure 4: (a) Performance on various missing ratios, and (b) relative performance drop on the Amazon Baby dataset.
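The quantities in Figure 4 can be made concrete with a small sketch. It assumes that a missing ratio is the fraction of items that lose one randomly chosen modality, and that the relative drop is the score lost with respect to the complete-data score; the paper's exact protocol may differ, and the function names and the example scores are hypothetical, not results from the paper.

```python
import random


def mask_modalities(num_items, modalities, missing_ratio, seed=0):
    """Pick a random subset of items and drop one modality for each.

    Returns a dict mapping item index -> name of the removed modality.
    """
    rng = random.Random(seed)
    num_missing = round(num_items * missing_ratio)
    missing_items = rng.sample(range(num_items), num_missing)
    return {item: rng.choice(modalities) for item in missing_items}


def relative_drop(full_score, missing_score):
    """Fraction of the complete-data score lost under missing modalities."""
    return (full_score - missing_score) / full_score


mask = mask_modalities(num_items=100, modalities=["image", "text"], missing_ratio=0.3)

# Placeholder scores for illustration only; not taken from the paper.
drop = relative_drop(full_score=0.10, missing_score=0.08)
```

Reporting the relative drop alongside the raw score separates how well a model performs from how gracefully it degrades as more modalities go missing.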

Effectiveness of Disentanglement

The paper presents evidence that disentangling modality features is effective (Figure 5). Using information-based losses such as CLUB (an upper bound on mutual information) and InfoNCE (a contrastive lower bound on mutual information), DGMRec encourages general modality features to align across modalities while specific features remain distinct yet informative. These losses help prevent the loss of valuable modality-specific information and support accurate generation of missing features.

Figure 5: (a) Visualization of disentangled modality features and (b) similarity score between features during training.
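The two losses named above can be sketched in plain Python. The InfoNCE function follows the standard contrastive form with in-batch negatives; the CLUB function is the sampled estimator under the assumption of a unit-variance Gaussian variational distribution q(y|x), whose predicted means are passed in directly instead of coming from a learned network. Which feature pairs the paper feeds into each loss is not specified here, and the names `info_nce` and `club_estimate` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.2):
    """InfoNCE loss: row i of `anchors` should match row i of `positives`."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [dot(anchor, candidate) / temperature for candidate in p]
        peak = max(logits)  # subtracted for numerical stability
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]  # negative log-softmax of the positive
    return total / len(a)


def club_estimate(predicted_means, targets):
    """Sampled CLUB estimate with q(y|x) = N(mean(x), I); constants cancel."""
    n = len(targets)

    def log_q(mean, y):
        return -0.5 * sum((yi - mi) ** 2 for yi, mi in zip(y, mean))

    paired = sum(log_q(predicted_means[i], targets[i]) for i in range(n)) / n
    unpaired = sum(log_q(m, y) for m in predicted_means for y in targets) / (n * n)
    return paired - unpaired


features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
aligned_loss = info_nce(features, features)            # positives match anchors
misaligned_loss = info_nce(features, features[::-1])   # positives are shuffled
dependent_bound = club_estimate(features, features)    # targets fully predictable
independent_bound = club_estimate([[0.0, 0.0]] * 3, features)  # constant predictor
```

Minimizing InfoNCE pulls matched pairs together, which suits features meant to align, while minimizing a CLUB estimate reduces shared information, which suits features meant to stay distinct. In the toy data the aligned loss is lower than the misaligned one, and the CLUB estimate is positive for fully predictable targets and zero for a constant predictor.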

Conclusion

DGMRec contributes to multi-modal recommendation research by addressing two problems together: missing modalities and the distinct characteristics of each modality. The framework is most useful in environments where complete data cannot be guaranteed. The paper shows that a disentangle-then-generate approach maintains recommendation performance across a wide range of practical scenarios.

DGMRec combines strong results on standard recommendation metrics with the ability to operate when modalities are missing, which highlights its potential for real-world applications. Future iterations may incorporate additional modality types and improve cross-modal retrieval.