Multi-Graph Transformer for Free-Hand Sketch Recognition (1912.11258v3)

Published 24 Dec 2019 in cs.CV and cs.LG

Abstract: Learning meaningful representations of free-hand sketches remains a challenging task given the signal sparsity and the high-level abstraction of sketches. Existing techniques have focused on exploiting either the static nature of sketches with Convolutional Neural Networks (CNNs) or the temporal sequential property with Recurrent Neural Networks (RNNs). In this work, we propose a new representation of sketches as multiple sparsely connected graphs. We design a novel Graph Neural Network (GNN), the Multi-Graph Transformer (MGT), for learning representations of sketches from multiple graphs which simultaneously capture global and local geometric stroke structures, as well as temporal information. We report extensive numerical experiments on a sketch recognition task to demonstrate the performance of the proposed approach. Particularly, MGT applied on 414k sketches from Google QuickDraw: (i) achieves small recognition gap to the CNN-based performance upper bound (72.80% vs. 74.22%), and (ii) outperforms all RNN-based models by a significant margin. To the best of our knowledge, this is the first work proposing to represent sketches as graphs and apply GNNs for sketch recognition. Code and trained models are available at https://github.com/PengBoXiangShang/multigraph_transformer.

Citations (76)

Summary

  • The paper introduces a Multi-Graph Transformer that models free-hand sketches using dual graph structures for intra-stroke and extra-stroke relationships.
  • It leverages Graph Attention Layers to integrate spatial and temporal information, achieving competitive performance on the QuickDraw dataset.
  • The approach reduces inference time while opening avenues for efficient, real-time recognition in dynamic, non-Euclidean domains.

Multi-Graph Transformer for Free-Hand Sketch Recognition

The paper "Multi-Graph Transformer for Free-Hand Sketch Recognition" by Xu, Joshi, and Bresson explores a novel approach to enhancing the recognition of free-hand sketches. Recognizing these sketches presents a unique challenge due to their high level of abstraction and signal sparsity. Traditional methods typically leverage either Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs) to utilize the static or sequential nature of sketches, respectively. However, this research proposes an innovative methodology by representing sketches as multiple sparsely connected graphs and employing a Multi-Graph Transformer (MGT) architecture to model them.

Approach and Methodology

This paper departs from conventional sketch recognition methods by considering each sketch as a set of multiple graphs. The proposed Multi-Graph Transformer (MGT) modifies traditional Graph Neural Networks (GNNs) to learn sketch representations, integrating both global and local geometric stroke structures along with temporal information. The key contribution here is modeling sketches as a combination of intra-stroke and extra-stroke graphs. Intra-stroke graphs capture the internal geometry within strokes, while extra-stroke graphs consider the temporal order of strokes to model their global arrangement.
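The dual graph construction described above can be illustrated with a minimal sketch. This is a simplified version for intuition only (the paper's actual intra-stroke graphs use K-hop neighborhoods, and the helper name here is hypothetical): given the drawing-order stroke membership of each point, consecutive points within a stroke form intra-stroke edges, while stroke boundaries form extra-stroke edges encoding temporal order.

```python
import numpy as np

def build_sketch_graphs(stroke_ids):
    """Build simplified intra-stroke and extra-stroke adjacency matrices
    for one sketch, given each point's stroke index in drawing order
    (as in QuickDraw). Both graphs include self-loops.

    NOTE: illustrative only -- the paper's intra-stroke graphs use
    K-hop neighborhoods, not just consecutive points.
    """
    n = len(stroke_ids)
    intra = np.eye(n, dtype=int)  # local geometry within each stroke
    extra = np.eye(n, dtype=int)  # global, temporal stroke ordering
    for i in range(n - 1):
        if stroke_ids[i] == stroke_ids[i + 1]:
            # consecutive points of the same stroke: intra-stroke edge
            intra[i, i + 1] = intra[i + 1, i] = 1
        else:
            # stroke boundary: link end of one stroke to start of the
            # next, capturing the temporal order in which strokes drawn
            extra[i, i + 1] = extra[i + 1, i] = 1
    return intra, extra
```

Both adjacency matrices are sparse by construction, which is what lets the attention layers that consume them scale to long point sequences.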

The use of Graph Attention Layers enables the MGT to learn efficiently from these sparse graph structures. By extending the Transformer architecture to operate over multiple graphs simultaneously, the paper generalizes self-attention from fully connected sequences to sparse, domain-defined neighborhoods. This multi-graph approach accounts for both the sparse geometry and the temporal ordering of sketches, and strengthens semantic understanding by introducing sketch-specific inductive biases.
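The core mechanism can be sketched as masked attention: scores are computed as in a standard Transformer, but positions may only attend to their neighbors in a given graph, and the per-graph outputs are then combined. The NumPy code below is a simplified single-head illustration under these assumptions (the function names and the averaging combination are illustrative, not the authors' exact formulation):

```python
import numpy as np

def graph_attention(H, A, Wq, Wk, Wv):
    """Single-head attention restricted to graph edges: each node attends
    only to neighbors in adjacency A (A is assumed to include self-loops)."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    d = K.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d)
    scores = np.where(A > 0, scores, -1e9)  # mask out non-edges
    # row-wise softmax over the unmasked (neighbor) entries
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def multi_graph_attention(H, graphs, params):
    """Run one masked-attention pass per graph and average the results,
    so a single layer mixes local (intra-stroke) and global
    (extra-stroke) context. Averaging is a simplifying assumption."""
    outs = [graph_attention(H, A, *p) for A, p in zip(graphs, params)]
    return np.mean(outs, axis=0)
```

Because the mask zeroes out attention to non-neighbors, each graph contributes a different receptive field: the intra-stroke graph keeps attention local to a stroke's geometry, while the extra-stroke graph lets information flow along the temporal order of strokes.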

Experimental Results

The authors conducted extensive experiments on 414,000 sketches from the Google QuickDraw dataset. The MGT achieves accuracy close to the CNN-based performance upper bound (72.80% vs. 74.22%) while reducing inference time and significantly outperforming all RNN-based models. Its ability to run inference faster than even top-performing CNNs such as Inception V3 underscores the practical advantage of building sketch-specific graph structure into the network design.

Implications and Future Work

The implications of this work are multifaceted. Practically, it suggests a robust framework for real-time sketch recognition systems, reducing computational overhead by bypassing the need to render images and directly processing the sketch coordinates. Theoretically, it opens up new research directions in the neural representation of non-Euclidean data, emphasizing the value of introducing domain-specific graph structures in Transformer architectures.

Future developments could extend this multi-graph approach beyond sketch recognition to other domains involving dynamic processes and spatial-temporal relationships. The authors also suggest transferring the architecture to relation-extraction tasks in NLP, which would further validate its versatility.

In conclusion, this paper presents a thoughtful integration of graph structures with Transformer models, offering a promising avenue for effectively processing and understanding free-hand sketches. The introduction of multiple graphs to incorporate semantic, spatial, and temporal information simultaneously presents an innovative step forward in the field of sketch recognition and graph-based neural networks.