LiveSketch: Query Perturbations for Guided Sketch-based Visual Search (1904.06611v1)

Published 14 Apr 2019 in cs.CV, cs.AI, and cs.IR

Abstract: LiveSketch is a novel algorithm for searching large image collections using hand-sketched queries. LiveSketch tackles the inherent ambiguity of sketch search by creating visual suggestions that augment the query as it is drawn, making query specification an iterative rather than one-shot process that helps disambiguate users' search intent. Our technical contributions are: a triplet convnet architecture that incorporates an RNN based variational autoencoder to search for images using vector (stroke-based) queries; real-time clustering to identify likely search intents (and so, targets within the search embedding); and the use of backpropagation from those targets to perturb the input stroke sequence, so suggesting alterations to the query in order to guide the search. We show improvements in accuracy and time-to-task over contemporary baselines using a 67M image corpus.

Authors (3)
  1. John Collomosse (52 papers)
  2. Tu Bui (21 papers)
  3. Hailin Jin (53 papers)
Citations (54)

Summary

LiveSketch: Query Perturbations for Guided Sketch-based Visual Search

The paper presents LiveSketch, an approach to sketch-based image retrieval (SBIR) that tackles the inherent ambiguity of sketched queries. Traditional SBIR systems struggle with free-hand sketches, which are often incomplete and imprecise, leading to irrelevant search results. LiveSketch addresses this by refining the sketch query through interactive perturbations, helping users communicate their search intent more precisely.

The authors introduce a triplet convnet architecture that incorporates a recurrent neural network (RNN) based variational autoencoder to handle vector (stroke-based) queries. Encoding the stroke sequence directly gives the system access to temporal and structural information that is lost when a sketch is rasterized. Through real-time clustering, the system identifies likely search intents within the search embedding and uses backpropagation from those intents to suggest modifications to the input sketch, guiding the search iteratively.
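To make the vector-query idea concrete, the sketch below encodes a variable-length stroke sequence of (dx, dy, pen-state) triplets into a fixed-length embedding with a plain vanilla RNN. This is an illustrative stand-in, not the paper's RNN/VAE architecture; the weights and dimensions are arbitrary.

```python
import numpy as np

def rnn_encode(strokes, W_x, W_h, b):
    """Encode a (T, 3) stroke sequence of (dx, dy, pen_state) triplets
    into a fixed-length vector via a vanilla RNN (illustrative only)."""
    h = np.zeros(W_h.shape[0])
    for s in strokes:
        h = np.tanh(W_x @ s + W_h @ h + b)
    return h  # final hidden state serves as the sketch embedding

rng = np.random.default_rng(0)
d_hidden = 8
W_x = rng.normal(scale=0.5, size=(d_hidden, 3))
W_h = rng.normal(scale=0.5, size=(d_hidden, d_hidden))
b = np.zeros(d_hidden)

# A toy sketch: two pen-down offsets followed by a pen lift.
sketch = np.array([[0.1, 0.0, 1.0],
                   [0.1, 0.1, 1.0],
                   [0.0, 0.0, 0.0]])
z = rnn_encode(sketch, W_x, W_h, b)
print(z.shape)  # (8,)
```

Because the encoder consumes strokes in order, reordering or restyling the strokes changes the embedding, which is the property that lets the system reason about the query at the stroke level.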

In contrast to conventional methods, where search relies on a static input, LiveSketch implements an interactive feedback loop: the user expresses preferences by weighting clusters of search results, and the sketch adapts accordingly. This dynamic interaction steers the query toward the intended target, forming a "living sketch" that evolves in sync with the user's intentions.
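One simple way to turn user-weighted result clusters into a guidance signal, sketched here under the assumption that each cluster is summarized by a centroid in the search embedding, is a weighted average of those centroids (the centroids and weights below are hypothetical):

```python
import numpy as np

# Hypothetical cluster centroids in the search embedding (one per search intent)
centroids = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [-1.0, 0.0]])
# User-assigned weights expressing interest in each cluster of results
weights = np.array([0.7, 0.3, 0.0])

# The guidance target is the weight-averaged centroid the query is pulled toward
target = (weights[:, None] * centroids).sum(axis=0) / weights.sum()
print(target)  # [0.7 0.3]
```

A zero weight removes a cluster from consideration entirely, so the user can prune interpretations of an ambiguous sketch (e.g. "circle as balloon" vs. "circle as sign") without redrawing it.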

The paper reports substantial improvements over existing baselines in both accuracy and time-to-task on a corpus of 67 million images. These results underscore the system's capability to rapidly direct users toward relevant results, making LiveSketch a suitable candidate for large-scale SBIR applications.

Key Contributions

  1. Vector Queries for Enhanced SBIR: The system constructs an embedding that unifies vector-based sketches and raster images, utilizing RNN and convolutional neural network (CNN) branches. This embedding facilitates the retrieval of photographs and other raster-form content using vector strokes, providing a search framework serving both raster and vector modalities.
  2. Guided Search Intent Elucidation: By clustering search results into semantically diverse groups, the system helps users articulate their search intent more clearly. This mechanism is pivotal when similar structures could pertain to different entities (e.g., a circle among balloons, signs, or mushrooms).
  3. Adversarially-Inspired Query Perturbation: The paper introduces a novel application of adversarial perturbation principles to iteratively refine SBIR sketch queries. By treating the proximity of the queried sketch to identified clusters as a loss and backpropagating through the network, the sketches are dynamically altered to better align with search targets identified by the user.
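The perturbation idea in contribution 3 can be sketched with a toy differentiable embedding. Here the embedding is a simple linear map (the paper uses a deep network), the loss is the squared distance between the embedded query and the user-selected cluster centroid, and the gradient with respect to the query is computed analytically and used to nudge the query toward the target. All values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))   # stand-in differentiable embedding (linear here)
q = rng.normal(size=6)        # current query (flattened stroke parameters)
target = rng.normal(size=4)   # centroid of the user-selected result cluster

def loss(q):
    # Proximity of the embedded query to the chosen cluster, treated as a loss
    return np.sum((W @ q - target) ** 2)

lr = 0.01
before = loss(q)
for _ in range(100):
    grad = 2 * W.T @ (W @ q - target)  # analytic gradient of the cluster loss
    q -= lr * grad                     # perturb the query toward the intent
after = loss(q)
print(after < before)  # True
```

In the full system the gradient flows back through the network to the stroke sequence itself, so the suggested alterations are shown to the user as modified strokes rather than applied silently.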

Implications and Future Directions

LiveSketch holds significant implications for the field of visual data retrieval, particularly in domains requiring rapid interpretation and navigation through extensive image datasets—such as digital art and content libraries. The real-time perturbation mechanism and vector-based querying might streamline workflows in environments where precision and speed are paramount.

The paper sets the stage for future exploration in hybrid query modeling, potentially spurring advancements in multi-modal search embeddings capable of bridging more intricate visual modalities. There is also room to improve the RNN embedding, particularly for complex sketches with high stroke counts, for which the system occasionally produces implausible suggestions.

Overall, LiveSketch is an advanced approach to addressing the intrinsic challenges of SBIR, moving towards a more intuitive, user-guided search experience.
