Diffusion Models for Open-Vocabulary Segmentation (2306.09316v2)

Published 15 Jun 2023 in cs.CV

Abstract: Open-vocabulary segmentation is the task of segmenting anything that can be named in an image. Recently, large-scale vision-language modelling has led to significant advances in open-vocabulary segmentation, but at the cost of gargantuan and increasing training and annotation efforts. Hence, we ask if it is possible to use existing foundation models to synthesise on-demand efficient segmentation algorithms for specific class sets, making them applicable in an open-vocabulary setting without the need to collect further data or annotations, or to perform training. To that end, we present OVDiff, a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation. OVDiff synthesises support image sets for arbitrary textual categories, creating for each a set of prototypes representative of both the category and its surrounding context (background). It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training. Our approach shows strong performance on a range of benchmarks, obtaining a lead of more than 5% over prior work on PASCAL VOC.
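
The abstract only outlines how the prototypes turn into a segmenter. The sketch below illustrates the general idea of prototype-based labelling under stated assumptions: per-pixel feature vectors and foreground masks for the synthesised support images are assumed to be already available (in OVDiff they come from pre-trained components; here they are tiny hypothetical lists), each category gets one mean foreground prototype and one mean background prototype, and each query pixel takes the label of its most similar prototype under cosine similarity. The function names and the single-mean-prototype simplification are illustrative assumptions, not the paper's exact procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical structure: category name -> (foreground prototype, background prototype)
Prototypes = Dict[str, Tuple[List[float], List[float]]]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def mean_prototype(features: List[Vector], mask: List[bool]) -> List[float]:
    """Average the features at positions where mask is True, then normalize."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no features")
    dim = len(selected[0])
    mean = [sum(f[d] for f in selected) / len(selected) for d in range(dim)]
    return normalize(mean)


def build_prototypes(support: Dict[str, Tuple[List[Vector], List[bool]]]) -> Prototypes:
    """For each category: a foreground prototype from masked support features
    and a background (context) prototype from the complement of the mask."""
    protos: Prototypes = {}
    for name, (feats, mask) in support.items():
        fg = mean_prototype(feats, mask)
        bg = mean_prototype(feats, [not m for m in mask])
        protos[name] = (fg, bg)
    return protos


def segment(query: List[Vector], protos: Prototypes) -> List[str]:
    """Label each query feature with the category of its most similar prototype;
    a best match on any background prototype yields the label 'background'."""
    labels = []
    for feat in query:
        f = normalize(feat)
        best_label, best_score = "background", -math.inf
        for name, (fg, bg) in protos.items():
            s_fg = sum(a * b for a, b in zip(f, fg))
            s_bg = sum(a * b for a, b in zip(f, bg))
            if s_fg > best_score:
                best_label, best_score = name, s_fg
            if s_bg > best_score:
                best_label, best_score = "background", s_bg
        labels.append(best_label)
    return labels


# Toy 3-D "features": axis 0 ~ cat, axis 1 ~ dog, axis 2 ~ background context.
support = {
    "cat": ([[1, 0, 0], [1, 0, 0], [0, 0, 1]], [True, True, False]),
    "dog": ([[0, 1, 0], [0, 0, 1]], [True, False]),
}
query = [[0.9, 0.1, 0], [0.1, 0.8, 0.1], [0, 0.1, 1]]

print(segment(query, build_prototypes(support)))  # → ['cat', 'dog', 'background']
```

Keeping an explicit background prototype per category lets pixels that resemble a category's typical surroundings be assigned to background instead of being forced onto the nearest foreground class, which mirrors the abstract's point that prototypes represent both the category and its context.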

Authors

Laurynas Karazija
Iro Laina
Andrea Vedaldi
Christian Rupprecht
