
ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations (1807.01394v4)

Published 3 Jul 2018 in cs.CV

Abstract: Understanding clothes from a single image has strong commercial and cultural impacts on modern societies. However, this task remains a challenging computer vision problem due to wide variations in the appearance, style, brand and layering of clothing items. We present a new database called ModaNet, a large-scale collection of images based on Paperdoll dataset. Our dataset provides 55,176 street images, fully annotated with polygons on top of the 1 million weakly annotated street images in Paperdoll. ModaNet aims to provide a technical benchmark to fairly evaluate the progress of applying the latest computer vision techniques that rely on large data for fashion understanding. The rich annotation of the dataset allows to measure the performance of state-of-the-art algorithms for object detection, semantic segmentation and polygon prediction on street fashion images in detail. The polygon-based annotation dataset has been released https://github.com/eBay/modanet, we also host the leaderboard at EvalAI: https://evalai.cloudcv.org/featured-challenges/136/overview.

Citations (133)

Summary

  • The paper presents ModaNet as a comprehensive dataset with detailed pixel-level and polygon annotations across 13 fashion categories.
  • It employs a rigorous selection process combining automated detection and human verification to ensure high-quality, single-person fashion images.
  • Experiments using detectors like Faster RCNN and DeepLabV3+ demonstrate superior performance metrics, highlighting areas for refining small or deformable items.

ModaNet: A Comprehensive Dataset for Street Fashion Analysis

The paper outlines the development of the ModaNet dataset, a resource designed specifically for the nuanced domain of street fashion analysis. ModaNet sets itself apart by offering 55,176 street images, richly annotated with polygons. These annotations directly support state-of-the-art computer vision tasks such as object detection, semantic segmentation, and polygon prediction, particularly within the fashion industry.

Dataset Features and Methodology

ModaNet builds upon the existing Paperdoll dataset, raising its annotations to a higher degree of specificity and scale. A salient feature of ModaNet is its provision of pixel-level masks and detailed polygonal annotations across 13 fashion categories: bags, belts, boots, footwear, outerwear, dresses, sunglasses, pants, tops, shorts, skirts, headwear, and scarves/ties. Such granularity is instrumental in training models for diverse computer vision applications.
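ModaNet's released annotations follow a COCO-style JSON layout in which each instance carries a flat polygon coordinate list. As a minimal sketch (the example record and category id below are illustrative, not real ModaNet data), a bounding box can be recovered from such a polygon like this:

```python
# Sketch: derive a bounding box from a COCO-style polygon annotation.
# The flat [x1, y1, x2, y2, ...] layout follows the COCO convention;
# the example record below is illustrative, not real ModaNet data.

def polygon_to_bbox(polygon):
    """Return (x, y, width, height) enclosing a flat [x1, y1, ...] polygon."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)

# Illustrative annotation for one footwear instance.
annotation = {
    "category_id": 4,  # hypothetical id; see the released category mapping
    "segmentation": [[10, 20, 60, 20, 60, 90, 10, 90]],
}

bbox = polygon_to_bbox(annotation["segmentation"][0])
print(bbox)  # (10, 20, 50, 70)
```

Because the same polygons yield masks and boxes, one annotation pass serves detection, segmentation, and polygon-prediction benchmarks alike.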

To compile this dataset, a meticulous selection process was employed. Images were initially filtered to single-person fashion photos and further evaluated for quality through a combination of automated detection models and human verification, ensuring only high-resolution, relevant data were retained for annotation.
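The automated half of that selection process can be sketched as a simple filter over person-detector outputs: keep images on which the detector fired exactly once with high confidence and whose resolution clears a minimum bar, then pass survivors to human verification. The thresholds and detection format below are assumptions for illustration; the paper's exact criteria and tooling may differ.

```python
# Sketch of the automated selection criteria described above: keep only
# images with exactly one confident person detection and sufficient
# resolution. Thresholds and the detection format are illustrative.

MIN_WIDTH, MIN_HEIGHT = 400, 600   # hypothetical resolution floor

def keep_for_annotation(image_meta, person_detections, score_thresh=0.8):
    """Return True if the image is a single-person, high-resolution candidate."""
    confident = [d for d in person_detections if d["score"] >= score_thresh]
    single_person = len(confident) == 1
    high_res = image_meta["width"] >= MIN_WIDTH and image_meta["height"] >= MIN_HEIGHT
    return single_person and high_res

meta = {"width": 800, "height": 1200}
dets = [{"score": 0.95}, {"score": 0.30}]   # one confident person detection
print(keep_for_annotation(meta, dets))  # True
```

Images that pass this automated gate would still go through the human verification stage described above before annotation.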

Experiments and Results

The paper evaluates several leading detectors and segmentation algorithms, including Faster RCNN, SSD, YOLO, and DeepLabV3+, demonstrating ModaNet's utility as a benchmark for fashion-related tasks. The quantitative analysis shows that the dataset is large and detailed enough to differentiate these models meaningfully, reinforcing its value for robust detection and segmentation.
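The detection comparisons rest on per-category average precision (AP), averaged into mAP. As a self-contained sketch of the standard computation (all-point interpolation; the toy detections below are invented, not paper results):

```python
# Sketch of average precision (AP), the per-category metric behind the
# mAP numbers discussed above, using all-point interpolation.
# Detections are (confidence, matched_to_ground_truth) pairs; toy values.

def average_precision(detections, num_gt):
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, correct in detections:
        tp += 1 if correct else 0
        fp += 0 if correct else 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Precision envelope: make precision non-increasing, right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the stepwise precision-recall curve.
    ap, prev = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev) * p
        prev = r
    return ap

dets = [(0.9, True), (0.8, False), (0.7, True)]
print(round(average_precision(dets, num_gt=2), 3))  # 0.833
```

mAP is then the mean of this quantity over the 13 categories, which is why rare or hard categories such as scarves and ties can drag the headline number down.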

For instance, Faster RCNN achieves strong mean average precision (mAP) across categories, although the paper notes difficulty detecting smaller or occluded items such as scarves and ties. For semantic segmentation, DeepLabV3+ leads with the highest mean Intersection over Union (IoU) scores, underscoring its architectural advantage. These results point to areas for algorithmic refinement, particularly accuracy on smaller, more deformable items in fashion imagery.
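The segmentation comparisons use mean IoU: per-class intersection over union between predicted and ground-truth label maps, averaged over classes. A minimal sketch with toy 3x3 label maps flattened to lists (class 0 is background; the values are invented for illustration):

```python
# Sketch of the mean-IoU segmentation metric referenced above: per-class
# intersection over union, averaged over the classes present in either
# map. pred/gt are flat lists of per-pixel class labels; 0 = background.

def mean_iou(pred, gt, num_classes):
    ious = []
    for c in range(1, num_classes):            # skip background class 0
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:                              # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

gt   = [1, 1, 0, 1, 1, 0, 2, 2, 2]             # toy 3x3 map, row-major
pred = [1, 1, 0, 1, 0, 0, 2, 2, 0]
print(round(mean_iou(pred, gt, num_classes=3), 3))  # 0.708
```

Because each class contributes equally to the mean, small deformable categories that cover few pixels weigh as much as large ones, which is consistent with the difficulty the paper reports on items like scarves and ties.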

Implications and Future Directions

ModaNet stands to significantly impact both theoretical advances and practical applications in fashion technology. Its comprehensive annotations and scale can empower researchers to develop models that better handle the high variability of real-world fashion imagery, and its level of detail should support the design of more capable object recognition and segmentation systems.

Looking forward, future work could target the handling of occlusions and deformable objects, with possible innovations in feature extraction techniques and the integration of contextual cues. Additionally, ModaNet could underpin future strides in virtual try-on technologies, personalized fashion recommendations, and visually intelligent fashion search systems.

In summary, the ModaNet dataset is a significant contribution to the field, providing a robust foundation for exploring and improving computer vision applications relevant to fashion and beyond. Its widespread application potential holds promise for numerous advancements across practical, consumer-facing technologies and underlying theoretical frameworks.
