Visual Discovery at Pinterest (1702.04680v2)

Published 15 Feb 2017 in cs.CV

Abstract: Over the past three years Pinterest has experimented with several visual search and recommendation services, including Related Pins (2014), Similar Looks (2015), Flashlight (2016) and Lens (2017). This paper presents an overview of our visual discovery engine powering these services, and shares the rationales behind our technical and product decisions such as the use of object detection and interactive user interfaces. We conclude that this visual discovery engine significantly improves engagement in both search and recommendation tasks.

Summary

An Analytical Overview of Visual Discovery at Pinterest

The paper "Visual Discovery at Pinterest" examines the architectural and technical decisions behind Pinterest’s visual discovery engine, a system built to support visual search and recommendation. The engine powers several Pinterest features, including Related Pins, Flashlight, and Lens, and the authors credit its image and object recognition with increasing user engagement.

Technical Advances and Implementation

The paper highlights two advances that the Pinterest team built on: convolutional neural networks (convnets) and object detection. The authors used classification networks such as VGG16, ResNet152, and AlexNet to extract visual features for image retrieval. Among these models, ResNet152 and VGG16, especially in their binarized versions, balanced retrieval performance against resource cost, with the fine-tuned VGG16 fc6 binary representation achieving a precision at 1 of 0.169. That balance matters at Pinterest's scale, where billions of images are indexed.
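To make the idea of a binarized representation and the precision-at-1 metric concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the threshold of 0.0, the packing of bits into an integer, the brute-force nearest-neighbor search, and all function names are assumptions chosen for illustration.

```python
from typing import List, Sequence


def binarize(features: Sequence[float], threshold: float = 0.0) -> int:
    """Pack a real-valued feature vector into an int, one bit per dimension.

    The threshold is an assumption for this sketch; a dimension contributes
    a 1 bit when its activation is strictly above the threshold.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def nearest(query: int, index: List[int]) -> int:
    """Position of the indexed code closest to the query (ties go to the lowest position)."""
    return min(range(len(index)), key=lambda i: hamming(query, index[i]))


def precision_at_1(
    queries: List[int],
    query_labels: List[str],
    index: List[int],
    index_labels: List[str],
) -> float:
    """Fraction of queries whose single nearest neighbor carries the same label."""
    hits = sum(
        index_labels[nearest(q, index)] == label
        for q, label in zip(queries, query_labels)
    )
    return hits / len(queries)
```

Binary codes like these are compared with XOR and a bit count rather than floating-point arithmetic, which is one reason such representations are attractive for very large indexes. A production system would use an approximate or distributed index instead of the linear scan shown here.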

For object detection, the authors used Faster R-CNN and Single Shot Detection (SSD), which trade off accuracy and processing latency differently. SSD, noted for its computational efficiency, proved particularly valuable where images must be processed in real time, as in Pinterest Flashlight, and where detection must scale to billions of images. The paper reports that the SSD model ran detection in 59 ms with respectable precision, a significantly lower latency than Faster R-CNN.

Application in Pinterest Services

The visual discovery engine refined Pinterest’s recommendation system, notably through Related Pins and Flashlight. Convnet features such as fine-tuned VGG16 improved user engagement by 4.0% after initial trials with 100 million query pins. The improvement was most noticeable in visually driven categories such as art and design. Cross features that relate visual similarity to category offered an additional engagement boost.

Flashlight, a Pinterest feature that lets users search for specific objects within images, was both user-driven through interactive cropping and automated through real-time object detection. Object detection improved the UI by placing clickable dots over detected objects, and it made user engagement signals easier to aggregate. The paper reports a 4.9% increase in user engagement once initial problems were addressed by revising the confidence ranking to take visual feature quality into account.
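One way to picture a confidence ranking that accounts for more than the detector's own score is sketched below. This is a hypothetical illustration, not the paper's method: the `Detection` fields, the `retrieval_score` signal, the product used to combine scores, and the `max_dots` and `min_confidence` parameters are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    detector_score: float   # confidence reported by the detector, in [0, 1]
    retrieval_score: float  # hypothetical quality of search results for this crop, in [0, 1]


def combined_confidence(det: Detection) -> float:
    """Combine both signals; a plain product is an assumption made for this sketch."""
    return det.detector_score * det.retrieval_score


def dots_to_show(
    detections: List[Detection],
    max_dots: int = 3,
    min_confidence: float = 0.3,
) -> List[Tuple[str, Tuple[float, float]]]:
    """Return (label, box center) for the highest-ranked detections worth showing."""
    ranked = sorted(detections, key=combined_confidence, reverse=True)
    kept = [d for d in ranked if combined_confidence(d) >= min_confidence][:max_dots]
    return [
        (d.label, ((d.box[0] + d.box[2]) / 2, (d.box[1] + d.box[3]) / 2))
        for d in kept
    ]
```

In this sketch, a detection that the detector is sure about but that yields poor search results ranks low and gets no dot, which is the kind of behavior a ranking based on both signals is meant to produce.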

Forward-Thinking Visual Search

Pinterest Lens marks a proactive step beyond mere visual similarity retrieval, focusing on semantic relevance by blending multiple content sources. Through innovations like object search, the system addresses diverse user interests, providing comprehensive results that blend visual similarity, semantic matching, and contextual suggestions. This approach hints at future advancements in AI, highlighting the potential for visual systems to integrate deeply with semantic understanding and user-context adaptation.
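Blending results from several content sources can be sketched as a simple interleave. The round-robin strategy, the source names, and the `blend` function are assumptions made for illustration; the summary above does not specify how Lens actually combines its sources.

```python
from itertools import zip_longest
from typing import Dict, List


def blend(sources: Dict[str, List[str]], limit: int = 10) -> List[str]:
    """Interleave ranked result lists round-robin, skipping duplicates.

    Each value in `sources` is a list of result ids, best first. Sources are
    visited in dictionary insertion order within each round.
    """
    blended: List[str] = []
    seen = set()
    for round_items in zip_longest(*sources.values()):
        for item in round_items:
            # None marks a source that has run out of results.
            if item is None or item in seen:
                continue
            seen.add(item)
            blended.append(item)
            if len(blended) == limit:
                return blended
    return blended
```

For example, given hypothetical sources named `visual`, `semantic`, and `context`, the first round takes the top result from each, the second round takes each source's runner-up, and so on until the limit is reached.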

Implications and Research Opportunities

The insights offered by the paper demonstrate the transformative potential of integrating advanced computer vision techniques within commercial applications. This has implications not only for improving user engagement but also for expanding the scope of AI applications in visual discovery. Future advancements may explore models with higher fine-tuning specificity for differing content types or more robust unsupervised learning approaches to further improve retrieval accuracy.

In conclusion, Pinterest’s approach to visual discovery offers a detailed case study of scalable, efficient, and effective integration of computer vision technology into user-centric applications. While yielding significant engagement improvements, it also underscores opportunities for future research in balancing real-time processing demands against the growing complexity of visual datasets.