
Cross-domain Image Retrieval with a Dual Attribute-aware Ranking Network (1505.07922v1)

Published 29 May 2015 in cs.CV

Abstract: We address the problem of cross-domain image retrieval, considering the following practical application: given a user photo depicting a clothing image, our goal is to retrieve the same or attribute-similar clothing items from online shopping stores. This is a challenging problem due to the large discrepancy between online shopping images, usually taken in ideal lighting/pose/background conditions, and user photos captured in uncontrolled conditions. To address this problem, we propose a Dual Attribute-aware Ranking Network (DARN) for retrieval feature learning. More specifically, DARN consists of two sub-networks, one for each domain, whose retrieval feature representations are driven by semantic attribute learning. We show that this attribute-guided learning is a key factor for retrieval accuracy improvement. In addition, to further align with the nature of the retrieval problem, we impose a triplet visual similarity constraint for learning to rank across the two sub-networks. Another contribution of our work is a large-scale dataset which makes the network learning feasible. We exploit customer review websites to crawl a large set of online shopping images and corresponding offline user photos with fine-grained clothing attributes, i.e., around 450,000 online shopping images and about 90,000 exact offline counterpart images of those online ones. All these images are collected from real-world consumer websites reflecting the diversity of the data modality, which makes this dataset unique and rare in the academic community. We extensively evaluate the retrieval performance of networks in different configurations. The top-20 retrieval accuracy is doubled when using the proposed DARN rather than the current popular solution using pre-trained CNN features only (0.570 vs. 0.268).

Citations (406)

Summary

  • The paper introduces DARN, a dual-network architecture that integrates semantic attribute learning and triplet ranking loss for improved feature representation.
  • It utilizes a unique dataset of about 450K online and 90K offline images, enriching model training with fine-grained clothing attributes.
  • The approach doubles top-20 retrieval accuracy compared to pre-trained CNN features, enhancing cross-domain image matching for e-commerce.

Overview of "Cross-domain Image Retrieval with a Dual Attribute-aware Ranking Network"

This paper addresses cross-domain image retrieval, focusing on retrieving clothing items from online stores using user-submitted photos, which typically depict clothing under uncontrolled lighting, pose, and background conditions. The paper introduces the Dual Attribute-aware Ranking Network (DARN), a novel architecture designed to improve retrieval feature learning by leveraging semantic attributes and imposing a triplet ranking constraint.

Key Contributions

The fundamental contributions of the research are threefold:

  1. Dual Attribute-aware Ranking Network (DARN): The proposed architecture comprises two attribute-aware sub-networks trained with a triplet ranking loss to enhance feature representation. Each sub-network models one domain, addressing the visual discrepancy between online shopping images and user photos.
  2. Unique Dataset: A critical component of this research is the introduction of a large and diverse dataset comprising around 450,000 online and 90,000 offline images with associated fine-grained clothing attributes. This dataset is collected from real-world consumer websites and forms a significant resource for training and testing retrieval models.
  3. Improved Retrieval Accuracy: The DARN approach substantially improves retrieval performance, more than doubling top-20 accuracy compared to using pre-trained CNN features alone (0.570 vs. 0.268).

Technical Approach

The DARN consists of a dual-network structure in which each sub-network is dedicated to one domain: online shopping images or user-uploaded photos. The sub-networks are topped with tree-structured layers that handle semantic attribute learning; including these fine-grained attributes allows DARN to produce more powerful semantic representations. A triplet ranking loss drives the retrieval feature learning, enforcing that a matched online-offline image pair lies closer in feature space than a mismatched pair.
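To make the objective concrete, the snippet below is a minimal PyTorch sketch of attribute-guided triplet ranking across two domain-specific sub-networks. The paper's implementation predates PyTorch, so everything here (the tiny Subnet backbone, a binary multi-label attribute head standing in for DARN's tree-structured attribute layers, the margin, and all shapes) is an illustrative assumption rather than the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-in for a domain-specific sub-network; the actual
# DARN sub-networks are deeper CNNs with tree-structured attribute
# layers on top of the convolutional features.
class Subnet(nn.Module):
    def __init__(self, embed_dim=256, num_attrs=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.embed = nn.Linear(32, embed_dim)      # retrieval feature
        self.attr_head = nn.Linear(32, num_attrs)  # attribute logits (simplified)

    def forward(self, x):
        h = self.backbone(x)
        return self.embed(h), self.attr_head(h)

def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    # A matched online-offline pair should be closer than a mismatched pair.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(margin + d_pos - d_neg).mean()

# One training step with assumed shapes: the user photo is the anchor,
# its exact online counterpart the positive, another shop image the negative.
user_net, shop_net = Subnet(), Subnet()
photo, match, other = (torch.randn(8, 3, 64, 64) for _ in range(3))
attr_labels = torch.randint(0, 2, (8, 10)).float()

f_a, attrs_a = user_net(photo)
f_p, _ = shop_net(match)
f_n, _ = shop_net(other)

# Attribute supervision drives the features alongside the ranking loss.
loss = triplet_ranking_loss(f_a, f_p, f_n) \
     + F.binary_cross_entropy_with_logits(attrs_a, attr_labels)
```

Keeping the two sub-networks separate, as sketched above, lets each domain learn its own low-level statistics while the shared losses pull the two embedding spaces into alignment.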

An additional component of the approach is a modified R-CNN framework for clothing detection, which mitigates the cluttered backgrounds prevalent in user-uploaded photos. The improved detection benefits the overall retrieval system.
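At query time, these pieces compose into a detect-then-crop-then-retrieve pipeline. Below is a sketch of that flow, reusing the Subnet stand-in from the previous snippet; the detector interface, the cosine-similarity ranking, and the pre-computed gallery features are assumptions for illustration, not details from the paper:

```python
import torch
import torch.nn.functional as F

def retrieve_top_k(query_photo, detect, user_net, shop_feats, k=20):
    """Hypothetical query pipeline: crop the detected clothing region,
    embed it with the user-photo sub-network, then rank pre-computed
    shop-image features by cosine similarity."""
    crop = detect(query_photo)             # assumed: returns a (3, H, W) tensor crop
    feat, _ = user_net(crop.unsqueeze(0))  # (1, D) retrieval feature
    q = F.normalize(feat, dim=1)
    g = F.normalize(shop_feats, dim=1)     # (N, D) gallery features
    sims = (g @ q.t()).squeeze(1)          # cosine similarity per shop image
    return sims.topk(k).indices            # indices of the top-k matches
```

Pre-computing and normalizing the gallery features once, as assumed here, keeps each query to a single forward pass plus one matrix-vector product.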

Implications and Future Directions

The implications for e-commerce applications are evident, as effective cross-domain image retrieval can enhance online shopping experiences by accurately matching user-uploaded images with exact or similar products available online. This work paves the way for advancements in AI-driven image retrieval systems by demonstrating how structured domain-specific networks augmented with semantic information can significantly improve retrieval outcomes.

Future exploration may involve scaling DARN to other domains beyond clothing, experimenting with more complex attribute representations, and refining the retrieval network's architecture for increased generalization capabilities.

In conclusion, the research presents a robust framework for cross-domain image retrieval, demonstrating the potential of integrating semantic attribute learning and ranking constraints to bridge domain gaps effectively. The extensive dataset introduced will likely fuel further research in this field, significantly impacting both academic studies and practical applications.