F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models (2209.15639v2)

Published 30 Sep 2022 in cs.CV

Abstract: We present F-VLM, a simple open-vocabulary object detection method built upon Frozen Vision and Language Models. F-VLM simplifies the current multi-stage training pipeline by eliminating the need for knowledge distillation or detection-tailored pretraining. Surprisingly, we observe that a frozen VLM: 1) retains the locality-sensitive features necessary for detection, and 2) is a strong region classifier. We finetune only the detector head and combine the detector and VLM outputs for each region at inference time. F-VLM shows compelling scaling behavior and achieves +6.5 mask AP improvement over the previous state of the art on novel categories of the LVIS open-vocabulary detection benchmark. In addition, we demonstrate very competitive results on the COCO open-vocabulary detection benchmark and cross-dataset transfer detection, along with significant training speed-up and compute savings. Code will be released at https://sites.google.com/view/f-vlm/home
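The abstract states that the finetuned detector head and the frozen VLM each score every region, and the two scores are combined at inference time. A minimal sketch of one plausible way to do this is below, assuming a geometric-interpolation fusion of the two per-region class score matrices; the function name fuse_region_scores and the weight alpha are illustrative assumptions, not taken from the paper.

import numpy as np

def fuse_region_scores(det_scores, vlm_scores, alpha=0.35):
    # Hypothetical fusion of detector-head scores with frozen-VLM region
    # classification scores at inference time (geometric interpolation).
    # det_scores: (num_regions, num_classes) scores from the finetuned detector head
    # vlm_scores: (num_regions, num_classes) scores from the frozen VLM region classifier
    # alpha: interpolation weight (illustrative value, not from the abstract)
    return det_scores ** (1.0 - alpha) * vlm_scores ** alpha

# Usage: fuse scores for 100 region proposals over 1203 LVIS categories
det = np.random.rand(100, 1203)
vlm = np.random.rand(100, 1203)
combined = fuse_region_scores(det, vlm)

Because only the detector head is trained while the VLM stays frozen, a fusion step like this is the only place the two sources of evidence need to meet, which is consistent with the training speed-up and compute savings the abstract reports.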

Authors (5)
  1. Weicheng Kuo (23 papers)
  2. Yin Cui (45 papers)
  3. Xiuye Gu (17 papers)
  4. AJ Piergiovanni (40 papers)
  5. Anelia Angelova (61 papers)
Citations (110)