Papers
Topics
Authors
Recent
Search
2000 character limit reached

FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings

Published 17 Aug 2023 in cs.CV | (2308.09012v2)

Abstract: Logo embedding models convert the product logos in images into vectors, enabling their utilization for logo recognition and detection within e-commerce platforms. This facilitates the enforcement of intellectual property rights and enhances product search capabilities. However, current methods treat logo embedding as a purely visual problem. A noteworthy issue is that visual models capture features more than logos. Instead, we view this as a multimodal task, using text as auxiliary information to facilitate the visual model's understanding of the logo. The emerging Multimodal LLMs (MLLMs) have demonstrated remarkable capabilities in both visual and textual understanding. Inspired by this, we propose an approach, \textbf{FashionLOGO}, to explore how to prompt MLLMs to generate appropriate text for product images, which can help visual models achieve better logo embeddings. We adopt a cross-attention transformer block that enables visual embedding to automatically learn supplementary knowledge from textual embedding. Our extensive experiments on real-world datasets prove that FashionLOGO is capable of generating generic and robust logo embeddings, achieving state-of-the-art performance in all benchmarks.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.