Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation (2403.04503v2)

Published 7 Mar 2024 in cs.IR

Abstract: In this work, we introduce Ducho 2.0, the latest stable version of our framework. Unlike Ducho, Ducho 2.0 offers a more personalized user experience through the definition and import of custom extraction models fine-tuned on specific tasks and datasets. Moreover, the new version can extract and process features through multimodal-by-design large models. All these new features are supported by optimized data loading and storing to local memory. To showcase the capabilities of Ducho 2.0, we demonstrate a complete multimodal recommendation pipeline, from extraction and processing to the final recommendation. The idea is to provide practitioners and experienced scholars with a ready-to-use tool that, placed on top of any multimodal recommendation framework, allows them to run extensive benchmarking analyses. All materials are accessible at https://github.com/sisinflab/Ducho.
