Generalization to Non-E-commerce Multimodal Domains

Determine whether RoDPO (Robust Direct Preference Optimization), a framework for multimodal sequential recommendation, generalizes beyond Amazon e-commerce datasets to other multimodal domains such as short-video recommendation, where domain characteristics and feedback signals differ.

Background

RoDPO is evaluated on three multimodal Amazon e-commerce benchmarks, where it shows consistent gains over strong baselines. However, e-commerce clickstream data may differ substantially from data in other domains such as short-video recommendation, which can involve different content dynamics, exposure mechanisms, and feedback patterns.

The authors explicitly note that their evaluation is primarily conducted on e-commerce datasets and that generalization to other multimodal domains remains to be verified. It is therefore unknown whether the proposed alignment approach, with stochastic Top-K negative sampling and an optional sparse MoE, retains its benefits and robustness in those settings.
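
To make concrete what would need re-testing in a new domain, the sketch below shows one plausible reading of stochastic Top-K negative sampling paired with the standard DPO loss. It is an illustration under stated assumptions, not the authors' implementation: the function names, the uniform draw over the top-K candidates, and the values of k and beta are all hypothetical, and the sparse MoE component is omitted.

```python
import numpy as np

def sample_topk_negative(scores, positive_idx, k, rng):
    """Assumed scheme: draw one negative uniformly from the k highest-scored
    items, excluding the ground-truth next item."""
    assert 1 <= k <= len(scores) - 1, "k must leave room to exclude the positive"
    masked = np.asarray(scores, dtype=float).copy()
    masked[positive_idx] = -np.inf              # never pick the positive item
    topk = np.argpartition(-masked, k - 1)[:k]  # indices of the k largest scores
    return int(rng.choice(topk))

def dpo_loss(pol_pos, pol_neg, ref_pos, ref_neg, beta=0.1):
    """Standard DPO loss for one (positive, negative) pair of log-probabilities
    under the policy (pol_*) and the frozen reference model (ref_*)."""
    margin = beta * ((pol_pos - ref_pos) - (pol_neg - ref_neg))
    return float(np.logaddexp(0.0, -margin))    # equals -log(sigmoid(margin))

# Toy usage: five candidate items, item 1 is the observed next interaction.
rng = np.random.default_rng(0)
log_probs = np.log(np.array([0.05, 0.40, 0.30, 0.05, 0.20]))
neg = sample_topk_negative(log_probs, positive_idx=1, k=2, rng=rng)
loss = dpo_loss(log_probs[1], log_probs[neg], ref_pos=np.log(0.2), ref_neg=np.log(0.2))
```

In a short-video setting, the candidate scores and the notion of a positive item would come from different feedback signals (for example watch behavior rather than purchases or clicks), which is one reason the sampling step might behave differently there.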

References

First, our evaluation is primarily conducted on e-commerce datasets (Amazon), and the generalization to other multimodal domains (e.g., short-video recommendation) remains to be verified.