
MeT: A Graph Transformer for Semantic Segmentation of 3D Meshes (2307.01115v1)

Published 3 Jul 2023 in cs.CV and cs.GR

Abstract: Polygonal meshes have become the standard for discretely approximating 3D shapes, thanks to their efficiency and high flexibility in capturing non-uniform shapes. This non-uniformity, however, leads to irregularity in the mesh structure, making tasks like segmentation of 3D meshes particularly challenging. Semantic segmentation of 3D meshes has typically been addressed through CNN-based approaches, leading to good accuracy. Recently, transformers have gained enough momentum both in NLP and computer vision fields, achieving performance at least on par with CNN models, supporting the long-sought architecture universalism. Following this trend, we propose a transformer-based method for semantic segmentation of 3D meshes motivated by a better modeling of the graph structure of meshes, by means of global attention mechanisms. In order to address the limitations of standard transformer architectures in modeling relative positions of non-sequential data, as in the case of 3D meshes, as well as in capturing the local context, we perform positional encoding by means of the Laplacian eigenvectors of the adjacency matrix, replacing the traditional sinusoidal positional encodings, and by introducing clustering-based features into the self-attention and cross-attention operators. Experimental results, carried out on three sets of the Shape COSEG Dataset, on the human segmentation dataset proposed in Maron et al., 2017 and on the ShapeNet benchmark, show how the proposed approach yields state-of-the-art performance on semantic segmentation of 3D meshes.

Citations (3)

Summary

  • The paper introduces a transformer-based approach for semantic segmentation of 3D meshes using global attention, Laplacian eigenvector encoding, and clustering-based features.
  • The method addresses key challenges in processing irregular 3D mesh structures by accurately modeling spatial relationships and capturing local contextual details.
  • Experimental results on multiple benchmark datasets confirm state-of-the-art performance, highlighting its superiority over conventional methods.

The paper "MeT: A Graph Transformer for Semantic Segmentation of 3D Meshes" introduces a novel transformer-based approach for the semantic segmentation of 3D meshes. Unlike the regular grid-like structures common in computer vision, 3D meshes are irregular: faces vary in size, shape, and connectivity, which makes grid-based methods such as CNNs less suited to achieving high accuracy in segmentation tasks.

The authors identify the inherent irregularity in 3D mesh structures as a significant challenge and highlight the limitations of standard transformer architectures in modeling such non-sequential data. Conventional transformers often struggle with tasks requiring an accurate understanding of relative positions within non-linear structures, like 3D meshes. To address these issues, the proposed method introduces several key innovations:

  1. Global Attention Mechanisms: Leveraging the strength of transformers in capturing long-range dependencies, the method uses global attention to better model the graph structure of 3D meshes.
  2. Positional Encoding: The paper replaces the traditional sinusoidal positional encodings used in transformers with Laplacian eigenvectors of the adjacency matrix. This approach allows for more accurate encoding of the spatial information inherent in 3D meshes.
  3. Clustering-Based Features: The self-attention and cross-attention mechanisms are enhanced with clustering-based features to capture local context more effectively. This addresses the need for local information while still benefiting from the global attention capabilities of transformers.
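To make the positional-encoding idea in point 2 concrete, the sketch below computes the smallest non-trivial eigenvectors of the symmetric normalized graph Laplacian of a mesh's adjacency matrix and uses them as per-node positional features. This is a minimal illustration of the general technique, not the authors' implementation: the function name, the dense eigensolver, and the tiny cycle-graph stand-in for a mesh graph are all choices made here for brevity (a real mesh would use a sparse solver such as `scipy.sparse.linalg.eigsh`).

```python
import numpy as np

def laplacian_positional_encoding(adjacency, k):
    """Return the k smallest non-trivial eigenvectors of the symmetric
    normalized graph Laplacian, L = I - D^{-1/2} A D^{-1/2}, to use as
    positional encodings for the nodes (mesh faces or vertices)."""
    a = np.asarray(adjacency, dtype=float)
    deg = a.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(a.shape[0]) - d_inv_sqrt @ a @ d_inv_sqrt
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(lap)
    # Skip the trivial constant eigenvector (eigenvalue ~ 0).
    return eigvecs[:, 1:k + 1]

# Toy example: a 4-node cycle graph standing in for a mesh graph.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
pe = laplacian_positional_encoding(adj, k=2)
print(pe.shape)  # (4, 2): one 2-dimensional positional vector per node
```

Unlike sinusoidal encodings, which assume a linear ordering of tokens, these eigenvectors encode each node's position relative to the graph's connectivity, which is why they suit non-sequential data such as meshes. Note that each eigenvector is defined only up to sign, so implementations typically randomize or fix the sign during training.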

The experimental evaluation of the proposed method demonstrates its effectiveness:

  • The approach was tested on three sets of the Shape COSEG Dataset, the human segmentation dataset from Maron et al., 2017, and the ShapeNet benchmark.
  • Across these datasets, the proposed method achieved state-of-the-art performance, suggesting the superiority of transformer-based architectures in handling the complex structure of 3D meshes for semantic segmentation tasks.

By introducing Laplacian eigenvector-based positional encodings and clustering-based features, the authors address the critical issues of positional modeling and local context capture. These advancements result in a graph transformer model that excels in segmenting 3D mesh data, marking a significant contribution to the field.