Language-Based Depth Hints for Monocular Depth Estimation (2403.15551v1)

Published 22 Mar 2024 in cs.CV and cs.AI

Abstract: Monocular depth estimation (MDE) is inherently ambiguous, as a given image may result from many different 3D scenes and vice versa. To resolve this ambiguity, an MDE system must make assumptions about the most likely 3D scenes for a given input. These assumptions can be either explicit or implicit. In this work, we demonstrate the use of natural language as a source of an explicit prior about the structure of the world. The assumption is made that human language encodes the likely distribution in depth-space of various objects. We first show that an LLM encodes this implicit bias during training, and that it can be extracted using a very simple learned approach. We then show that this prediction can be provided as an explicit source of assumption to an MDE system, using an off-the-shelf instance segmentation model that provides the labels used as the input to the LLM. We demonstrate the performance of our method on the NYUD2 dataset, showing improvement compared to the baseline and to random controls.
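
The pipeline the abstract describes can be sketched in a few lines: a frozen language model embeds each detected object's class label, a small learned head regresses a depth prior from that embedding, and the prior is painted over the object's instance mask to form a dense hint map for the MDE network. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation; the class `DepthHintHead`, the function `build_hint_map`, the embedding dimension, and the (mean, spread) parameterisation of the prior are all hypothetical choices made for the example.

```python
# Minimal illustrative sketch (hypothetical names, not the paper's code).
# Idea from the abstract:
#   1. embed each detected object's class label with a frozen language model,
#   2. regress a per-object depth prior from that embedding with a small
#      learned head,
#   3. paint the prior over the object's segmentation mask so the MDE
#      network receives a dense "depth hint" map as an extra input.

import torch
import torch.nn as nn


class DepthHintHead(nn.Module):
    """Tiny MLP mapping a label embedding to a (mean depth, spread) prior."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2),  # [mean depth, spread], e.g. in metres
        )

    def forward(self, label_embedding: torch.Tensor) -> torch.Tensor:
        return self.mlp(label_embedding)


def build_hint_map(label_embeddings, instance_masks, head, image_hw):
    """Combine per-instance depth priors with segmentation masks.

    label_embeddings: (N, D) embeddings of the N detected instance labels.
    instance_masks:   (N, H, W) boolean masks from an off-the-shelf
                      instance segmentation model.
    Returns a (2, H, W) tensor of per-pixel depth-hint parameters.
    """
    hint_map = torch.zeros(2, *image_hw)
    priors = head(label_embeddings)  # (N, 2)
    for prior, mask in zip(priors, instance_masks):
        hint_map[:, mask] = prior.unsqueeze(-1)  # broadcast over mask pixels
    return hint_map


if __name__ == "__main__":
    H, W, D, N = 480, 640, 512, 3  # NYUD2-sized image, 3 detected objects
    head = DepthHintHead(embed_dim=D)
    embeddings = torch.randn(N, D)  # stand-in for frozen-LM label embeddings
    masks = torch.zeros(N, H, W, dtype=torch.bool)
    masks[0, :240] = True          # e.g. "ceiling"
    masks[1, 240:, :320] = True    # e.g. "table"
    masks[2, 240:, 320:] = True    # e.g. "chair"
    hints = build_hint_map(embeddings, masks, head, (H, W))
    print(hints.shape)  # torch.Size([2, 480, 640])
```

In the method described by the abstract, the label embeddings would come from the LLM and the resulting hint map would be supplied to the depth network as an explicit prior; the exact architecture and training details are given in the paper itself.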

Authors (2)
  1. Dylan Auty (4 papers)
  2. Krystian Mikolajczyk (52 papers)
Citations (1)
