Images Don't Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to Rank (1511.06746v1)

Published 20 Nov 2015 in cs.CV and cs.LG

Abstract: Search is at the heart of modern e-commerce. As a result, the task of ranking search results automatically (learning to rank) is a multibillion-dollar machine learning problem. Traditional models optimize over a few hand-constructed features based on the item's text. In this paper, we introduce a multimodal learning to rank model that combines these traditional features with visual semantic features transferred from a deep convolutional neural network. In a large-scale experiment using data from the online marketplace Etsy, we verify that moving to a multimodal representation significantly improves ranking quality. We show how image features can capture fine-grained style information not available in a text-only representation. In addition, we show concrete examples of how image information can successfully disentangle pairs of highly different items that are ranked similarly by a text-only model.
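
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the ranking model or loss, so the pairwise hinge-loss linear ranker, the function names (`concat_features`, `train_pairwise_ranker`), and the toy feature values below are all assumptions made for illustration. In practice the image vector would come from a pretrained convolutional network; here it is a hand-written two-number stand-in.

```python
import random

def concat_features(text_feats, image_feats):
    """Build a multimodal vector: text features followed by image features."""
    return list(text_feats) + list(image_feats)

def score(w, x):
    """Linear ranking score (dot product of weights and features)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise_ranker(pairs, dim, epochs=50, lr=0.1, seed=0):
    """Fit a linear ranker with SGD on a pairwise hinge loss (assumed loss).

    pairs: list of (preferred_vector, other_vector) tuples.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            # Update only when the preferred item does not win by a margin of 1.
            if score(w, diff) < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

# Toy data (hypothetical): two items with identical text features
# but different image features; item A is the preferred one.
text_a, image_a = [1.0, 0.5], [0.9, 0.1]
text_b, image_b = [1.0, 0.5], [0.1, 0.9]

# Text-only model: the two items are indistinguishable.
w_text = train_pairwise_ranker([(text_a, text_b)], dim=2)
text_only_tied = score(w_text, text_a) == score(w_text, text_b)

# Multimodal model: image features break the tie.
mm_a = concat_features(text_a, image_a)
mm_b = concat_features(text_b, image_b)
w_mm = train_pairwise_ranker([(mm_a, mm_b)], dim=4)
multimodal_separates = score(w_mm, mm_a) > score(w_mm, mm_b)
```

In this toy setup the text-only ranker gives both items the same score, while the ranker trained on the concatenated vector ranks the preferred item higher, which mirrors the abstract's claim that image information can separate items a text-only model ranks similarly.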

Authors (3)
  1. Corey Lynch (18 papers)
  2. Kamelia Aryafar (9 papers)
  3. Josh Attenberg (2 papers)
Citations (45)
