
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models (2311.14552v3)

Published 24 Nov 2023 in cs.CV and cs.AI

Abstract: Replicating the innate human ability to detect all objects based on free-form texts at any granularity remains a formidable challenge for Large Vision Language Models (LVLMs). Current LVLMs are predominantly constrained to locate a single, pre-existing object. This limitation leads to a compromise in model design, necessitating the introduction of visual expert models or customized head structures. Beyond these constraints, our research uncovers LVLMs' capability for basic object perception, allowing them to accurately identify and locate objects of interest. Building on this insight, we introduce a novel Language-prompted Localization Dataset to fully unleash the capabilities of LVLMs in fine-grained object perception and precise location awareness. More importantly, we present Griffon, a purely LVLM-based baseline, which does not introduce any special tokens, expert models, or additional detection modules. It simply maintains a consistent structure with popular LVLMs by unifying data formats across various localization-related scenarios and is trained end-to-end through a well-designed pipeline. Comprehensive experiments demonstrate that Griffon not only achieves state-of-the-art performance on the fine-grained RefCOCO series and Flickr30K Entities but also approaches the capabilities of the expert model Faster RCNN on the detection benchmark MSCOCO. Data, code, and models are released at https://github.com/jefferyZhan/Griffon.

Authors (6)
  1. Yufei Zhan (10 papers)
  2. Yousong Zhu (19 papers)
  3. Zhiyang Chen (27 papers)
  4. Fan Yang (877 papers)
  5. Ming Tang (199 papers)
  6. Jinqiao Wang (76 papers)
Citations (9)