Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation (2403.04503v2)
Abstract: In this work, we introduce Ducho 2.0, the latest stable version of our framework. Unlike the original Ducho, Ducho 2.0 offers a more personalized user experience by allowing users to define and import custom extraction models fine-tuned on specific tasks and datasets. Moreover, the new version can extract and process features through multimodal-by-design large models. Notably, all these new features are supported by optimized data loading and storage to local memory. To showcase the capabilities of Ducho 2.0, we demonstrate a complete multimodal recommendation pipeline, from feature extraction and processing to the final recommendation. The goal is to provide practitioners and experienced scholars with a ready-to-use tool that, placed on top of any multimodal recommendation framework, allows them to run extensive benchmarking analyses. All materials are accessible at: \url{https://github.com/sisinflab/Ducho}.
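The pipeline described above (extract per-modality features, process them, then produce recommendations) can be illustrated with a minimal, framework-agnostic sketch. It assumes visual and textual feature vectors have already been extracted for each item. The helper names (`l2_normalize`, `fuse_modalities`, `recommend`), the toy feature values, and the simple content-based scorer (a mean-of-history user profile ranked by dot product) are hypothetical stand-ins. They are not Ducho 2.0's API or any recommender it ships with.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_modalities(visual: Sequence[float], textual: Sequence[float]) -> Vector:
    """Hypothetical early fusion: normalize each modality separately, then concatenate."""
    return l2_normalize(visual) + l2_normalize(textual)


def recommend(item_features: Dict[str, Vector], user_history: List[str], k: int = 2) -> List[str]:
    """Hypothetical scorer: rank unseen items by dot product with the mean of the user's item vectors."""
    dim = len(next(iter(item_features.values())))
    profile = [0.0] * dim
    for item in user_history:
        for i, x in enumerate(item_features[item]):
            profile[i] += x / len(user_history)
    scores = {
        item: sum(p * x for p, x in zip(profile, vec))
        for item, vec in item_features.items()
        if item not in user_history
    }
    return sorted(scores, key=lambda item: scores[item], reverse=True)[:k]


if __name__ == "__main__":
    # Toy 2-d "visual" and 2-d "textual" features (hypothetical values, not real extractions).
    raw = {
        "shirt_a": ([1.0, 0.0], [1.0, 0.0]),
        "shirt_b": ([0.9, 0.1], [0.8, 0.2]),
        "shoes_c": ([0.0, 1.0], [0.0, 1.0]),
        "shoes_d": ([0.1, 0.9], [0.2, 0.8]),
    }
    items = {name: fuse_modalities(v, t) for name, (v, t) in raw.items()}
    print(recommend(items, user_history=["shirt_a"], k=2))  # prints ['shirt_b', 'shoes_d']
```

Normalizing each modality before concatenation keeps one modality's scale from dominating the dot product. In practice, such fused features would typically be fed to a trained multimodal recommendation model rather than scored directly.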