Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
158 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation) (2108.05888v1)

Published 12 Aug 2021 in cs.CV

Abstract: Multiview detection incorporates multiple camera views to deal with occlusions, and its central problem is multiview aggregation. Given feature map projections from multiple views onto a common ground plane, the state-of-the-art method addresses this problem via convolution, which applies the same calculation regardless of object locations. However, such translation-invariant behaviors might not be the best choice, as object features undergo various projection distortions according to their positions and cameras. In this paper, we propose a novel multiview detector, MVDeTr, that adopts a newly introduced shadow transformer to aggregate multiview information. Unlike convolutions, shadow transformer attends differently at different positions and cameras to deal with various shadow-like distortions. We propose an effective training scheme that includes a new view-coherent data augmentation method, which applies random augmentations while maintaining multiview consistency. On two multiview detection benchmarks, we report new state-of-the-art accuracy with the proposed system. Code is available at https://github.com/hou-yz/MVDeTr.

Citations (45)

Summary

We haven't generated a summary for this paper yet.