- The paper presents the ALDI framework, which unifies DAOD benchmarking protocols, and introduces ALDI++, a new method that achieves state-of-the-art results across multiple DAOD benchmarks.
- The study exposes benchmarking pitfalls by constructing source-only and oracle baselines with the same non-adaptive components as DAOD methods, showing that real DAOD improvements are more modest than previously reported.
- The approach sets a new standard for transparent and fair evaluation, encouraging the development of more robust and generalizable domain adaptive object detection methods.
Unifying Domain Adaptive Object Detection with Align and Distill
Introduction
The field of object detection has advanced rapidly, with strong gains across a wide range of benchmark datasets. A persistent challenge remains, however: detector performance degrades when test data deviates from the training distribution. Domain Adaptive Object Detection (DAOD) addresses this problem, using unsupervised techniques to reduce the performance drop caused by domain shift. Despite the reported success of recent DAOD methods, our analysis uncovers several benchmarking pitfalls that inflate perceived progress and obscure how much of it is real. These pitfalls include performance overestimation caused by suboptimal baselines, inconsistent implementation practices that prevent transparent method comparison, and an overreliance on outdated architectures and narrowly scoped benchmarks, which casts doubt on the generality of reported gains. In response, we propose "Align and Distill" (ALDI), a framework that enables fair and transparent comparison of DAOD methods and introduces a comprehensive benchmarking protocol with realistic performance targets.
Unified Benchmarking and Implementation Framework
ALDI is a unified framework that integrates the common components found across prior DAOD methods, enabling direct and fair comparison between them. Alongside the framework, the paper introduces CFC-DAOD, a new benchmark dataset designed to evaluate DAOD methods on challenging real-world data, and ALDI++, a new DAOD method that substantially outperforms the previous state of the art on several benchmarks. In particular, ALDI++ delivers notable improvements on the Cityscapes → Foggy Cityscapes, Sim10k → Cityscapes, and CFC Kenai → Channel benchmarks.
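To make the shared "align and distill" structure concrete, the sketch below shows a generic training step of this kind in PyTorch: supervised training on labeled source images, distillation of an EMA teacher's pseudo-labels onto strongly augmented unlabeled target images, and a feature-alignment term between domains. This is a minimal sketch under stated assumptions, not the actual ALDI implementation (which builds on Detectron2); the `alignment_loss` callable, the hard pseudo-label threshold, and the loss weights are all illustrative.

```python
import copy
import torch
import torchvision


def build_models(num_classes: int):
    """Student and EMA teacher share the same architecture and initial weights."""
    student = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        weights=None, num_classes=num_classes)
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return student, teacher


@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    """Exponential-moving-average update of the teacher's weights."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(alpha).add_(ps.detach(), alpha=1.0 - alpha)


@torch.no_grad()
def pseudo_label(teacher, weak_images, score_thresh=0.8):
    """Teacher predictions on weakly augmented target images become pseudo-labels."""
    teacher.eval()
    targets = []
    for det in teacher(weak_images):
        keep = det["scores"] >= score_thresh
        targets.append({"boxes": det["boxes"][keep], "labels": det["labels"][keep]})
    return targets


def train_step(student, teacher, optimizer, alignment_loss,
               src_images, src_targets, tgt_weak, tgt_strong,
               distill_weight=1.0, align_weight=0.1):
    student.train()
    # 1) Supervised loss on labeled source-domain images.
    sup_loss = sum(student(src_images, src_targets).values())
    # 2) Distill: student learns the teacher's pseudo-labels on strongly
    #    augmented views of the same unlabeled target images.
    distill_loss = sum(student(tgt_strong, pseudo_label(teacher, tgt_weak)).values())
    # 3) Align: penalize source/target feature divergence. `alignment_loss` is
    #    a hypothetical callable (e.g. an adversarial domain classifier) that
    #    consumes the backbone's FPN feature dicts.
    src_feats = student.backbone(torch.stack(src_images))
    tgt_feats = student.backbone(torch.stack(tgt_strong))
    align_term = alignment_loss(src_feats, tgt_feats)

    loss = sup_loss + distill_weight * distill_loss + align_weight * align_term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)
    return float(loss)
```

The loop above only captures the common structure that the framework unifies; the released ALDI codebase implements these roles with Detectron2 components, and ALDI++ further refines the recipe beyond this plain teacher-student setup.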
Important Findings and Implications
Our investigation reveals that when source-only and oracle models are built with the same non-adaptive components as the DAOD methods they are compared against, the previously reported performance leaps shrink considerably. This finding matters because it shows that the improvements actually attributable to domain adaptation are more modest than previously believed. ALDI's unified and transparent comparison provides concrete evidence that recent DAOD methods, although beneficial, have not surpassed updated oracle models as some prior results suggested. Notably, ALDI++ still achieves substantial gains across benchmarks while closing much of the remaining gap to oracle performance, underscoring the effectiveness of the proposed method even under this stricter evaluation.
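As a rough illustration of this protocol, the hypothetical configurations below show how source-only and oracle baselines can share every non-adaptive component (detector, augmentations, EMA, schedule) with the DAOD method and differ only in their training data. The keys, values, and dataset names are assumptions made for this summary, not the actual ALDI configuration format.

```python
# Hypothetical configs illustrating the fair-comparison protocol: identical
# non-adaptive components, different training data. Not the real ALDI config format.
SHARED = dict(
    architecture="faster_rcnn_resnet50_fpn",   # same detector for all models
    strong_augmentations=True,                 # same augmentation recipe
    ema_teacher=True,                          # same EMA component
    iterations=30_000,                         # same training schedule
)

CONFIGS = {
    # Lower bound: labeled source data only; any DAOD method should beat this.
    "source_only": dict(SHARED,
                        labeled_data="cityscapes_train",
                        unlabeled_data=None),
    # Upper bound: labeled target data, i.e. fully supervised on the target domain.
    "oracle": dict(SHARED,
                   labeled_data="foggy_cityscapes_train",
                   unlabeled_data=None),
    # DAOD: labeled source plus unlabeled target data.
    "daod_method": dict(SHARED,
                        labeled_data="cityscapes_train",
                        unlabeled_data="foggy_cityscapes_train"),
}
```

Because all three models share the same components, any gap between the source-only baseline, the DAOD method, and the oracle can be attributed to adaptation itself rather than to incidental training improvements.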
Looking Toward the Future
The ALDI framework and the ALDI++ method mark a significant step forward for domain adaptive object detection. By addressing the identified benchmarking pitfalls and establishing a firmer foundation for future research, ALDI encourages the development of more robust and generalizable DAOD methods. The findings and contributions presented in this paper underscore the need for transparent and fair benchmarking practices, and highlight the potential for new approaches to handle domain shift in object detection effectively. Going forward, ALDI demonstrates the value of methodological rigor in the continued pursuit of progress in the DAOD landscape.