Deep Laparoscopic Stereo Matching with Transformers (2207.12152v1)

Published 25 Jul 2022 in cs.CV

Abstract: The self-attention mechanism, successfully employed within the transformer architecture, has shown promise in many computer vision tasks, including image recognition and object detection. Despite this surge, the use of transformers for stereo matching remains relatively unexplored. In this paper, we comprehensively investigate the use of transformers for stereo matching, especially for laparoscopic videos, and propose a new hybrid deep stereo matching framework (HybridStereoNet) that combines the best of CNNs and transformers in a unified design. Specifically, we investigate several ways to introduce transformers into volumetric stereo matching pipelines by analyzing the loss landscapes of the designs and their in-domain/cross-domain accuracy. Our analysis suggests that employing transformers for feature representation learning, while using CNNs for cost aggregation, leads to faster convergence, higher accuracy, and better generalization than other options. Our extensive experiments on the SceneFlow, SCARED2019 and dVPN datasets demonstrate the superior performance of HybridStereoNet.
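The volumetric pipeline the abstract refers to has three stages: feature extraction for the left/right views (handled by a transformer in HybridStereoNet), construction of a matching cost volume over candidate disparities, cost aggregation (handled by CNNs in HybridStereoNet), and differentiable disparity regression. The sketch below is a minimal NumPy illustration of that pipeline shape only, not the authors' implementation: the transformer feature extractor is replaced by precomputed feature maps, and the learned CNN aggregation is stood in for by a fixed 3x3 box filter. All function names here are hypothetical.

```python
import numpy as np

def build_cost_volume(feat_l, feat_r, max_disp):
    """Correlation cost volume: for each candidate disparity d, correlate
    left features with right features shifted d pixels to the left.
    feat_l, feat_r: (C, H, W) feature maps (from any backbone)."""
    C, H, W = feat_l.shape
    volume = np.zeros((max_disp, H, W))
    for d in range(max_disp):
        if d == 0:
            volume[d] = (feat_l * feat_r).sum(axis=0)
        else:
            # Pixels with x < d have no valid match; their cost stays 0.
            volume[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :-d]).sum(axis=0)
    return volume

def aggregate(volume):
    """Stand-in for learned CNN cost aggregation: a fixed 3x3 box filter
    applied per disparity slice (a real network would learn this filtering)."""
    D, H, W = volume.shape
    padded = np.pad(volume, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(volume)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + H, dx:dx + W]
    return out / 9.0

def regress_disparity(volume):
    """Differentiable disparity regression (soft-argmax): softmax over the
    disparity axis, then the expected disparity index per pixel."""
    D = volume.shape[0]
    s = volume - volume.max(axis=0, keepdims=True)  # numerical stability
    p = np.exp(s)
    p /= p.sum(axis=0, keepdims=True)
    return (p * np.arange(D).reshape(D, 1, 1)).sum(axis=0)
```

As a sanity check, feeding in a right feature map and a left copy shifted by a known disparity recovers that disparity at valid pixels; the soft-argmax makes the whole chain differentiable, which is what lets the feature backbone (CNN or transformer) be trained end-to-end.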

Authors (6)
  1. Xuelian Cheng (11 papers)
  2. Yiran Zhong (75 papers)
  3. Mehrtash Harandi (108 papers)
  4. Tom Drummond (70 papers)
  5. Zhiyong Wang (120 papers)
  6. Zongyuan Ge (102 papers)
Citations (10)
