Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition (2109.01291v2)

Published 3 Sep 2021 in cs.CV

Abstract: Recently, 3D shape understanding has achieved significant progress due to the advances of deep learning models on various data formats like images, voxels, and point clouds. Among them, point clouds and multi-view images are two complementary modalities of 3D objects and learning representations by fusing both of them has been proven to be fairly effective. While prior works typically focus on exploiting global features of the two modalities, herein we argue that more discriminative features can be derived by modeling ``where to fuse''. To investigate this, we propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification. The core component of LATFormer is a module named Locality-Aware Fusion (LAF) which integrates the local features of correlated regions across the two modalities based on the co-occurrence scores. We further propose to filter out scores with low values to obtain salient local co-occurring regions, which reduces redundancy for the fusion process. In our LATFormer, we utilize the LAF module to fuse the multi-scale features of the two modalities both bidirectionally and hierarchically to obtain more informative features. Comprehensive experiments on four popular 3D shape benchmarks covering 3D object retrieval and classification validate its effectiveness.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Xinwei He (25 papers)
  2. Silin Cheng (4 papers)
  3. Dingkang Liang (37 papers)
  4. Song Bai (87 papers)
  5. Xi Wang (275 papers)
  6. Yingying Zhu (39 papers)
Citations (1)

Summary

We haven't generated a summary for this paper yet.