Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching (2303.14969v1)

Published 27 Mar 2023 in cs.CV and cs.AI

Abstract: Dense prediction tasks are a fundamental class of problems in computer vision. As supervised methods suffer from high pixel-wise labeling cost, a few-shot learning solution that can learn any dense task from a few labeled images is desired. Yet, current few-shot learning methods target a restricted set of tasks such as semantic segmentation, presumably due to challenges in designing a general and unified model that is able to flexibly and efficiently adapt to arbitrary tasks of unseen semantics. We propose Visual Token Matching (VTM), a universal few-shot learner for arbitrary dense prediction tasks. It employs non-parametric matching on patch-level embedded tokens of images and labels that encapsulates all tasks. Also, VTM flexibly adapts to any task with a tiny amount of task-specific parameters that modulate the matching algorithm. We implement VTM as a powerful hierarchical encoder-decoder architecture involving ViT backbones where token matching is performed at multiple feature hierarchies. We experiment VTM on a challenging variant of Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks. Surprisingly, it is competitive with fully supervised baselines using only 10 labeled examples of novel tasks (0.004% of full supervision) and sometimes outperforms using 0.1% of full supervision. Codes are available at https://github.com/GitGyun/visual_token_matching.

Authors (5)

Donggyun Kim (14 papers)
Jinwoo Kim (40 papers)
Seongwoong Cho (4 papers)
Chong Luo (58 papers)
Seunghoon Hong (41 papers)

Citations (19)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching (2303.14969v1)

Summary

Related Papers