
Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval (2405.15451v1)

Published 24 May 2024 in cs.CV, cs.IR, and cs.MM

Abstract: In the domain of language-based fashion image retrieval, pinpointing the desired fashion item using both a reference image and its accompanying textual description is an intriguing challenge. Existing approaches lean heavily on static fusion techniques, intertwining image and text. Despite their commendable advancements, these approaches are still limited by a lack of flexibility. In response, we propose a Self-distilled Dynamic Fusion Network that composes multi-granularity features dynamically by considering the consistency of the routing path and modality-specific information simultaneously. Our proposed method includes two new modules: (1) Dynamic Fusion Network with Modality Specific Routers. The dynamic network enables a flexible determination of the routing for each reference image and modification text, taking into account their distinct semantics and distributions. (2) Self Path Distillation Loss. A stable path decision for queries benefits the optimization of feature extraction as well as routing, and we approach this by progressively refining the path decision with previous path information. Extensive experiments demonstrate the effectiveness of our proposed model compared to existing methods.
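
The two modules named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no architectural details, so everything here is an assumption made for illustration. That includes the three candidate fusion operations, the linear routers, averaging the two routers' outputs, the KL-divergence form of the distillation loss, and the moving-average update of the previous path. All function and variable names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(feature, weights):
    """Hypothetical router: one linear score per candidate fusion op, then softmax."""
    scores = [sum(w * x for w, x in zip(row, feature)) for row in weights]
    return softmax(scores)

# Candidate fusion operations (illustrative choices, not taken from the paper).
FUSION_OPS = [
    lambda img, txt: [a + b for a, b in zip(img, txt)],  # additive fusion
    lambda img, txt: [a * b for a, b in zip(img, txt)],  # multiplicative fusion
    lambda img, txt: list(img),                          # keep the image feature
]

def dynamic_fuse(img, txt, img_router_w, txt_router_w):
    """Fuse features with a path chosen per query by two modality-specific routers."""
    p_img = route(img, img_router_w)   # path preference from the image
    p_txt = route(txt, txt_router_w)   # path preference from the text
    # Assumption: the two preferences are combined by simple averaging.
    path = [(a + b) / 2 for a, b in zip(p_img, p_txt)]
    outputs = [op(img, txt) for op in FUSION_OPS]
    fused = [sum(p * out[d] for p, out in zip(path, outputs))
             for d in range(len(img))]
    return fused, path

def path_distillation_loss(current_path, previous_path, eps=1e-12):
    """Assumed loss form: KL(previous || current) over path probabilities."""
    return sum(q * math.log((q + eps) / (p + eps))
               for q, p in zip(previous_path, current_path))

def update_previous_path(previous_path, current_path, momentum=0.9):
    """Assumed update: moving average that carries path information forward."""
    return [momentum * q + (1 - momentum) * p
            for q, p in zip(previous_path, current_path)]

# Toy query: 3-dimensional image and text features with fixed router weights.
img = [0.5, -1.0, 2.0]
txt = [1.0, 0.0, -0.5]
img_w = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2], [0.3, 0.1, 0.0]]
txt_w = [[0.2, 0.0, 0.1], [0.1, 0.1, 0.1], [-0.2, 0.3, 0.0]]

fused, path = dynamic_fuse(img, txt, img_w, txt_w)
previous = [1 / 3, 1 / 3, 1 / 3]            # initial "previous" path: uniform
loss = path_distillation_loss(path, previous)
previous = update_previous_path(previous, path)
```

In this sketch the routing weights differ per query because each router scores its own modality's feature, and the distillation term penalizes a current path that drifts from the accumulated previous one. A real model would learn the router weights jointly with the feature extractors.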

