
Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization (2007.09316v1)

Published 18 Jul 2020 in cs.CV

Abstract: The generalization capability of neural networks across domains is crucial for real-world applications. We argue that a generalized object recognition system should well understand the relationships among different images and also the images themselves at the same time. To this end, we present a new domain generalization framework that learns how to generalize across domains simultaneously from extrinsic relationship supervision and intrinsic self-supervision for images from multi-source domains. To be specific, we formulate our framework with feature embedding using a multi-task learning paradigm. Besides conducting the common supervised recognition task, we seamlessly integrate a momentum metric learning task and a self-supervised auxiliary task to collectively utilize the extrinsic supervision and intrinsic supervision. Also, we develop an effective momentum metric learning scheme with K-hard negative mining to boost the network to capture image relationship for domain generalization. We demonstrate the effectiveness of our approach on two standard object recognition benchmarks VLCS and PACS, and show that our methods achieve state-of-the-art performance.

Authors (5)
  1. Shujun Wang (46 papers)
  2. Lequan Yu (89 papers)
  3. Caizi Li (3 papers)
  4. Chi-Wing Fu (104 papers)
  5. Pheng-Ann Heng (196 papers)
Citations (180)

Summary

The paper "Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization" presents a novel framework, referred to as EISNet, which aims to enhance the generalization capabilities of neural networks across varying domains. The focus of the paper is on domain generalization, which is a critical aspect of deploying neural networks in real-world applications where training and testing data often come from different domains or distributions.

The authors propose a unique approach by simultaneously leveraging extrinsic relationship supervision and intrinsic self-supervision to improve domain generalization. The motivation behind this methodology is the hypothesis that an object recognition system should be proficient in understanding both inter-image relationships (extrinsic supervision) and details within each image (intrinsic supervision).

EISNet is formulated using a multi-task learning paradigm that encompasses the following components:

  1. Momentum Metric Learning: A key component, this involves a triplet loss design with a K-hard negative mining strategy, which aids the network in learning domain-independent yet category-specific features. This extrinsic supervision aims to ensure that features of samples with identical labels are clustered closely while those with different labels are more distinct.
  2. Self-supervised Auxiliary Task: For intrinsic supervision, the authors utilize a jigsaw puzzle solving task within each image, enhancing the network's understanding through spatial prediction tasks. This task involves predicting the order of patches within an image, thus encouraging the network to capture more meaningful representations.
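The metric learning component above can be sketched in a few lines. The following is a minimal NumPy illustration of a triplet loss with K-hard negative mining, where each anchor is pushed away from its K nearest negatives; it omits the momentum-updated feature memory EISNet uses, and the function name, margin, and K values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def k_hard_triplet_loss(embeddings, labels, k=2, margin=0.5):
    """Sketch of a triplet loss with K-hard negative mining.

    For each anchor, the K negatives (different label) closest to it are
    selected; the loss pulls same-label samples together and pushes those
    hard negatives at least `margin` further away than the hardest positive.
    """
    # Pairwise Euclidean distances between all embeddings.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)

    losses = []
    n = len(labels)
    for a in range(n):
        pos = [i for i in range(n) if labels[i] == labels[a] and i != a]
        neg = [i for i in range(n) if labels[i] != labels[a]]
        if not pos or not neg:
            continue
        d_pos = max(dist[a, i] for i in pos)                   # hardest positive
        hard_neg = sorted(neg, key=lambda i: dist[a, i])[:k]   # K closest negatives
        for i in hard_neg:
            losses.append(max(0.0, margin + d_pos - dist[a, i]))
    return float(np.mean(losses)) if losses else 0.0
```

With well-separated clusters per label the loss is zero; when samples with different labels overlap, the hinge term activates and gradients (in a differentiable framework) would push the hard negatives apart.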
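The jigsaw auxiliary task can likewise be sketched as a data-preparation step: split an image into a grid of patches, reorder them by a permutation drawn from a fixed set, and train an auxiliary head to predict which permutation was applied. The 2x2 grid and three-permutation set below are deliberately tiny assumptions for illustration; the standard jigsaw formulation uses a 3x3 grid and a much larger pre-selected permutation set.

```python
import numpy as np

# Small fixed set of patch orderings (assumption: 3 permutations over a
# 2x2 grid, purely for illustration).
PERMUTATIONS = [
    (0, 1, 2, 3),   # identity
    (1, 0, 3, 2),   # swap columns of the patch grid
    (3, 2, 1, 0),   # full reversal
]

def make_jigsaw_sample(image, perm_index, grid=2):
    """Split `image` (H, W) into grid x grid patches, reorder them by the
    chosen permutation, and return the shuffled image plus the permutation
    index the auxiliary classification head must predict."""
    h, w = image.shape
    ph, pw = h // grid, w // grid
    patches = [image[r*ph:(r+1)*ph, c*pw:(c+1)*pw]
               for r in range(grid) for c in range(grid)]
    order = PERMUTATIONS[perm_index]
    shuffled = np.block([[patches[order[r*grid + c]] for c in range(grid)]
                         for r in range(grid)])
    return shuffled, perm_index
```

Predicting the permutation index forces the network to reason about the spatial layout of image content, which is the intrinsic supervision signal described above.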

The effectiveness of EISNet is demonstrated through comprehensive experiments on two standard benchmarks: VLCS and PACS. The proposed framework achieved state-of-the-art performance, validating its capacity to improve generalization on unseen target domains.

A rigorous comparison with several existing methods underscored EISNet's superior performance. On the VLCS dataset, the framework achieved an average accuracy of 74.67%, outperforming previous approaches. Similarly, on the more challenging PACS dataset, significant improvements were observed, with EISNet achieving average accuracies of 75.86% with an AlexNet backbone and 85.84% with a ResNet-50 backbone, further demonstrating its robustness across architectures.

The implications of this research are significant both practically and theoretically. Practically, EISNet can be readily applied to various computer vision tasks where domain discrepancies pose a challenge, such as autonomous driving or medical imaging. Theoretically, the dual-supervision design could inspire new ways to approach feature learning with emphasis on relationships both within and across images.

Looking toward future developments, one prospect is integrating more sophisticated self-supervision tasks and exploring alternative metric learning strategies tailored for domain generalization. Another area worth exploring is the extension of this framework to other modalities of data beyond images, such as language or auditory data, where different domain discrepancies also exist.

In conclusion, the work presented in this paper makes a significant contribution to the field of domain generalization by introducing a powerful and flexible framework that outperforms existing methods, paving the way for more versatile neural network applications.