Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation

Published 7 Mar 2024 in cs.IR | (2403.04503v2)

Abstract: In this work, we introduce Ducho 2.0, the latest stable version of our framework. Unlike Ducho, Ducho 2.0 offers a more personalized user experience by letting users define and import custom extraction models fine-tuned on specific tasks and datasets. Moreover, the new version can extract and process features through multimodal-by-design large models. Notably, all these new features are supported by optimized data loading and storing to local memory. To showcase the capabilities of Ducho 2.0, we demonstrate a complete multimodal recommendation pipeline, from extraction/processing to the final recommendation. The idea is to provide practitioners and experienced scholars with a ready-to-use tool that, placed on top of any multimodal recommendation framework, allows them to run extensive benchmarking analyses. All materials are accessible at https://github.com/sisinflab/Ducho.
