A Generalist Framework for Panoptic Segmentation of Images and Videos (2210.06366v4)

Published 12 Oct 2022 in cs.CV, cs.AI, cs.LG, and cs.MM

Abstract: Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image. As permutations of instance IDs are also valid solutions, the task requires learning a high-dimensional one-to-many mapping. As a result, state-of-the-art approaches use customized architectures and task-specific loss functions. We formulate panoptic segmentation as a discrete data generation problem, without relying on the inductive biases of the task. A diffusion model is proposed to model panoptic masks, with a simple architecture and a generic loss function. By simply adding past predictions as a conditioning signal, our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically. With extensive experiments, we demonstrate that our simple approach can perform competitively with state-of-the-art specialist methods in similar settings.
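
The point about instance-ID permutations can be made concrete with a small sketch. It is not taken from the paper: the helper name `same_partition` and the toy arrays are assumptions chosen for illustration. It only shows that two instance-ID maps that differ by a relabeling describe the same segmentation, which is why a single image admits many valid targets.

```python
import numpy as np

def same_partition(a, b):
    """Hypothetical helper: True if two instance-ID maps group pixels
    identically, i.e. they differ only by a one-to-one relabeling of IDs."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        return False
    a_ids = a.ravel().tolist()
    b_ids = b.ravel().tolist()
    # Distinct (id_in_a, id_in_b) pairs observed at the same pixel.
    pairs = set(zip(a_ids, b_ids))
    # A bijection between the ID sets yields exactly one pair per ID on each side.
    return len(pairs) == len(set(a_ids)) == len(set(b_ids))

# Toy 2x3 instance-ID map with three instances.
ids = np.array([[1, 1, 2],
                [3, 3, 2]])

# Same segmentation with IDs permuted: 1 -> 3, 2 -> 1, 3 -> 2.
permuted = np.array([[3, 3, 1],
                     [2, 2, 1]])

# Different segmentation: instances 1 and 3 merged into one.
merged = np.array([[1, 1, 2],
                   [1, 1, 2]])

print(same_partition(ids, permuted))  # equivalent labelings
print(same_partition(ids, merged))    # not equivalent
```

With K instances there are K! such relabelings of the same mask, so a per-pixel loss against one fixed labeling would penalize equally valid predictions.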
