
Buy Me That Look: An Approach for Recommending Similar Fashion Products (2008.11638v2)

Published 26 Aug 2020 in cs.CV and cs.LG

Abstract: Have you ever looked at an Instagram model, or a model in a fashion e-commerce web-page, and thought "Wish I could get a list of fashion items similar to the ones worn by the model!". This is what we address in this paper, where we propose a novel computer vision based technique called ShopLook to address the challenging problem of recommending similar fashion products. The proposed method has been evaluated at Myntra (www.myntra.com), a leading online fashion e-commerce platform. In particular, given a user query and the corresponding Product Display Page (PDP) against the query, the goal of our method is to recommend similar fashion products corresponding to the entire set of fashion articles worn by a model in the PDP full-shot image (the one showing the entire model from head to toe). The novelty and strength of our method lies in its capability to recommend similar articles for all the fashion items worn by the model, in addition to the primary article corresponding to the query. This is not only important to promote cross-sells for boosting revenue, but also for improving customer experience and engagement. In addition, our approach is also capable of recommending similar products for User Generated Content (UGC), e.g., fashion article images uploaded by users. Formally, our proposed method consists of the following components (in the same order): i) Human keypoint detection, ii) Pose classification, iii) Article localisation and object detection, along with active learning feedback, and iv) Triplet network based image embedding model.

Citations (9)

Summary

  • The paper introduces ShopLook, which combines human keypoint detection and pose classification to pick out the full-shot image of the model (head to toe) from a product display page.
  • It uses Mask R-CNN to localize each fashion article and a triplet network to embed the article images, so that similar items lie close together in the embedding space.
  • Experimental results validate that ShopLook enhances retrieval accuracy and boosts cross-selling efficiency on e-commerce platforms.

An Overview of "Buy Me That Look: An Approach for Recommending Similar Fashion Products"

The paper "Buy Me That Look: An Approach for Recommending Similar Fashion Products" proposes ShopLook, a computer vision method for recommending fashion items similar to those worn by a model in a product display page (PDP) image on platforms like Myntra. The system suggests similar products not only for the primary article that matches the user's query but for every fashion item the model is wearing. This supports cross-selling and improves the customer experience.

Methodology

The approach is structured around four primary components:

  1. Human Keypoint Detection: The first step detects keypoints on the human body using the technique of Xiao et al. Among the several images of a PDP, taken from various angles, this identifies the full-body shot: an image is treated as showing the complete model when specific keypoints, such as the head and ankles, are present.
  2. Pose Classification: To ensure that the selected full-body shot gives a clear view, a pose classifier assigns each image to one of five views: front, back, left, right, or detailed shot. The classifier is a ResNet18 network, and the paper reports its precision and recall for topwear and bottomwear categories.
  3. Article Localization and Object Detection: Once the full-body shot is identified, Mask R-CNN detects and localizes the individual fashion articles in it. Active learning feedback from in-house taggers is used to improve detection accuracy. The mean Average Precision (mAP) is reported to be strongest for topwear categories.
  4. Triplet Network-based Image Embedding Model: A triplet network generates an embedding for each detected article. Training pulls the embeddings of similar items together and pushes those of dissimilar items apart, so that distance in the embedding space can be used to measure image similarity and recommend similar fashion items.
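The first and last steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keypoint names, the `(x, y, confidence)` format, the confidence threshold, the Euclidean distance, and the margin value are all assumptions made for illustration, and the loss shown is the standard hinge-style triplet loss rather than one confirmed by the paper.

```python
from math import sqrt

# Hypothetical keypoint format: name -> (x, y, confidence).
# The names and the 0.5 threshold are illustrative assumptions.
REQUIRED_KEYPOINTS = ("head", "left_ankle", "right_ankle")


def is_full_shot(keypoints, min_conf=0.5):
    """Return True when the head and both ankles are detected confidently."""
    return all(
        name in keypoints and keypoints[name][2] >= min_conf
        for name in REQUIRED_KEYPOINTS
    )


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (an assumed value).
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

For example, `is_full_shot` returns `False` for an image whose right ankle is detected with confidence 0.4, and `triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])` is zero because the negative is already much farther from the anchor than the positive.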

Experimental Results

The paper evaluates each component of the method:

  • The pose classifier achieves high precision and recall across multiple pose categories.
  • Object detection improves substantially once active learning feedback is used to refine the model.
  • The triplet network embeddings are assessed both qualitatively and quantitatively, and they outperform baseline approaches in retrieval accuracy.
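To make the retrieval evaluation concrete, the following Python sketch ranks catalog items by similarity to a query embedding and scores the result. The summary does not state which similarity measure or retrieval metric the paper uses, so cosine similarity and precision at k are assumptions here, as are the function names and the toy catalog format.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, catalog, k=3):
    """Return the ids of the k catalog items most similar to the query.

    `catalog` maps a hypothetical item id to its embedding vector.
    """
    ranked = sorted(
        catalog,
        key=lambda item_id: cosine_similarity(query, catalog[item_id]),
        reverse=True,
    )
    return ranked[:k]


def precision_at_k(retrieved, relevant):
    """Fraction of retrieved ids that appear in the set of relevant ids."""
    if not retrieved:
        return 0.0
    hits = sum(1 for item_id in retrieved if item_id in relevant)
    return hits / len(retrieved)
```

With a toy catalog of two similar t-shirt embeddings and one jeans embedding, a t-shirt query retrieves the two t-shirts first, and `precision_at_k` on that result is 1.0 when both t-shirts are marked relevant.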

ShopLook also works beyond catalog images: on User Generated Content (UGC), it produces robust recommendations even for lower-resolution or more varied images.

Implications and Future Directions

For e-commerce platforms, ShopLook promotes cross-sells and improves user engagement by recommending similar products for every article in a look. Its end-to-end design could also be applied in other multimedia domains, including social media.

Future development can focus on enriching the recommendation system by incorporating specific fashion attributes and potentially utilizing occasion-based filtering. This could refine the relevance and personalization of recommendations, aligning them closely with user preferences.

Conclusion

The paper introduces an effective framework for fashion product recommendation that combines human keypoint detection, object localization, and image embedding. It shows promising results both in controlled settings and in real-world use, which makes it valuable to online fashion platforms. The outlined future work, covering fashion attributes and occasion-based filtering, gives a clear direction for extending the model's capabilities.
