
Mutual Query Network for Multi-Modal Product Image Segmentation (2306.14399v1)

Published 26 Jun 2023 in cs.CV

Abstract: Product image segmentation is vital in e-commerce. Most existing methods extract the product image foreground based only on the visual modality, making it difficult to distinguish irrelevant products. As product titles contain abundant appearance information and provide complementary cues for product image segmentation, we propose a mutual query network to segment products based on both visual and linguistic modalities. First, we design a language query vision module to obtain the response of the language description in image areas, thus aligning the visual and linguistic representations across modalities. Then, a vision query language module utilizes the correlation between the visual and linguistic modalities to filter the product title and effectively suppress the title content that is irrelevant to the image. To promote research in this field, we also construct a Multi-Modal Product Segmentation dataset (MMPS), which contains 30,000 images and corresponding titles. The proposed method significantly outperforms state-of-the-art methods on MMPS.
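
The two query directions described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the mean-pooled title vector, the sigmoid response, and the softmax word weighting are all assumptions chosen only to show the general idea of language querying vision and vision querying language. Features are tiny hand-written vectors rather than outputs of real encoders.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def language_query_vision(word_feats, region_feats):
    """Hypothetical: score each image region by its similarity to the title.

    The title is summarized by mean-pooling its word vectors (an assumption),
    and each region gets a sigmoid response in (0, 1).
    """
    d = len(word_feats[0])
    title = [sum(w[i] for w in word_feats) / len(word_feats) for i in range(d)]
    return [1.0 / (1.0 + math.exp(-dot(title, r) / math.sqrt(d)))
            for r in region_feats]

def vision_query_language(word_feats, region_feats, response):
    """Hypothetical: weight each title word by its similarity to the image.

    The image is summarized by a response-weighted mean of region vectors,
    and a softmax over words down-weights words unrelated to that summary.
    """
    d = len(word_feats[0])
    total = sum(response)
    visual = [sum(p * r[i] for p, r in zip(response, region_feats)) / total
              for i in range(d)]
    return softmax([dot(w, visual) / math.sqrt(d) for w in word_feats])

# Toy 2-D features (made up for illustration).
words = ["shirt", "shipping"]
word_feats = [[3.0, 0.0], [0.0, -1.0]]
region_feats = [[1.0, 0.0], [0.0, 1.0]]  # region 0: product, region 1: background

response = language_query_vision(word_feats, region_feats)
weights = vision_query_language(word_feats, region_feats, response)

for region_index, p in enumerate(response):
    print(f"region {region_index}: response {p:.3f}")
for word, w in zip(words, weights):
    print(f"{word}: weight {w:.3f}")
```

In this toy setup the product region receives a higher response than the background, and the appearance word receives a higher weight than the word unrelated to the image, which mirrors the filtering behavior the abstract describes at a very coarse level.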

Authors (9)
  1. Yun Guo
  2. Wei Feng
  3. Zheng Zhang
  4. Xiancong Ren
  5. Yaoyu Li
  6. Jingjing Lv
  7. Xin Zhu
  8. Zhangang Lin
  9. Jingping Shao