Domain Generalization by Mutual-Information Regularization with Pre-trained Models (2203.10789v2)

Published 21 Mar 2022 in cs.LG and cs.CV

Abstract: Domain generalization (DG) aims to learn a generalized model to an unseen target domain using only limited source domains. Previous attempts to DG fail to learn domain-invariant representations only from the source domains due to the significant domain shifts between training and test domains. Instead, we re-formulate the DG objective using mutual information with the oracle model, a model generalized to any possible domain. We derive a tractable variational lower bound via approximating the oracle model by a pre-trained model, called Mutual Information Regularization with Oracle (MIRO). Our extensive experiments show that MIRO significantly improves the out-of-distribution performance. Furthermore, our scaling experiments show that the larger the scale of the pre-trained model, the greater the performance improvement of MIRO. Source code is available at https://github.com/kakaobrain/miro.

Authors (4)
  1. Junbum Cha (10 papers)
  2. Kyungjae Lee (37 papers)
  3. Sungrae Park (17 papers)
  4. Sanghyuk Chun (49 papers)
Citations (113)

Summary

Domain Generalization via Mutual Information Regularization with Pre-trained Models

The paper "Domain Generalization by Mutual-Information Regularization with Pre-trained Models" addresses a crucial problem in machine learning known as domain generalization (DG). DG aims to equip models with the capability of performing well on unseen target domains by relying solely on multiple source domains for training. Traditional methods have struggled to bridge the gap between training and testing distributions when these significantly differ—a limitation when deploying models in real-world scenarios where such shifts are expected.

The authors introduce Mutual Information Regularization with Oracle (MIRO), a novel approach to the DG problem that reformulates the DG goal via mutual information (MI) maximization. The intent is to train models to learn from an "oracle" representation, which is generalized to any domain. In practice, however, such an oracle is unattainable, so the paper proposes approximating it using a pre-trained model like those trained on large-scale datasets such as ImageNet or extensive weakly-supervised datasets such as Instagram's 3.6B images. The core idea is to enhance the transfer of domain-agnostic knowledge from pre-trained models into the target model by maximizing mutual information between the oracle (approximated by the pre-trained model) and the target representations.
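Concretely, the reformulation replaces the usual search for domain-invariant features with an information-theoretic objective. Writing Z_f for the features of the model being trained and Z_{f_0} for the features of the fixed pre-trained model that stands in for the oracle, the objective and its tractable lower bound can be sketched as follows (notation adapted for this summary; the exact form in the paper may differ in details):

```latex
% DG objective reformulated with mutual information (sketch)
\min_{f}\ \mathbb{E}\big[\ell(f(x), y)\big] \;-\; \beta\, I\big(Z_f;\, Z_{f_0}\big)

% Variational lower bound on the MI term, where q(\cdot \mid z_f) is a
% variational approximation of p(z_{f_0} \mid z_f); H(Z_{f_0}) does not
% depend on f and can be dropped during optimization.
I\big(Z_f;\, Z_{f_0}\big) \;\ge\; \mathbb{E}_{z_f,\, z_{f_0}}\big[\log q(z_{f_0} \mid z_f)\big] \;+\; H\big(Z_{f_0}\big)
```

Choosing q to be Gaussian, with mean μ(z_f) and diagonal covariance Σ(z_f), turns the bound into a simple variance-weighted feature-matching penalty between the two models' representations.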

The MIRO framework combines the original task objective (standard empirical risk minimization) with a regularization term that maximizes a variational lower bound on the mutual information between the pre-trained model's features and the current model's features. Because the bound is tractable, the regularizer can be optimized with ordinary gradient-based training, encouraging representations that remain robust under domain shift.
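A minimal PyTorch-style sketch of this combined loss is shown below, assuming the Gaussian variational distribution sketched above with an identity mean encoder and a learnable per-channel variance; the class name and the weighting hyperparameter `lam` are illustrative and do not mirror the released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MIROStyleLoss(nn.Module):
    """Sketch of a MIRO-style objective: task loss + variational MI regularizer.

    Assumes a Gaussian variational distribution q(z_f0 | z_f) with an identity
    mean encoder and a learnable diagonal (per-channel) variance. Names and
    defaults here are illustrative, not the authors' released code.
    """

    def __init__(self, feat_dim: int, lam: float = 1.0):
        super().__init__()
        self.lam = lam
        # Learnable log-variance of q, one value per feature channel.
        self.log_var = nn.Parameter(torch.zeros(feat_dim))

    def forward(self, logits, targets, feat, feat_pre):
        # Original task objective (empirical risk minimization).
        task_loss = F.cross_entropy(logits, targets)

        # Variational lower-bound regularizer: a variance-weighted squared
        # distance between the trainable features and the frozen pre-trained
        # ("oracle proxy") features, plus the log-variance term.
        var = self.log_var.exp()
        nll = 0.5 * (self.log_var + (feat_pre.detach() - feat) ** 2 / var)
        reg = nll.sum(dim=-1).mean()

        return task_loss + self.lam * reg
```

In use, each training batch would be passed through both the frozen pre-trained backbone (to obtain `feat_pre`) and the trainable backbone (to obtain `feat` and `logits`), so the penalty keeps the learned features close to the pre-trained ones while the task loss adapts them to the source domains.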

Empirical results demonstrate that MIRO significantly improves DG performance across various datasets, including PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet. Notably, when paired with large pre-trained models such as CLIP or SWAG, the framework yields substantial performance gains, indicating that MIRO effectively exploits the broad, diverse representations these models provide. This is especially evident in settings where naive end-to-end fine-tuning of such large models fails to maintain robustness across distribution shifts, a failure mode frequently noted in the literature.

An intriguing aspect of the paper is its use of mutual information as a bridge between pre-trained models and domain-generalized learning. The paper empirically shows that MIRO keeps the learned representations close to the oracle's by maintaining high MI, whereas naive fine-tuning of pre-trained models tends to lose this alignment, especially for large models. Moreover, MIRO's adaptability to different scales and types of pre-training data (e.g., supervised vs. weakly-supervised) demonstrates its flexibility and robustness.

Beyond the theoretical formulation and main experiments, the paper compares MIRO with related methods such as CRD and LwF, underscoring its comparative strength in leveraging pre-trained weights for domain generalization. MIRO achieves state-of-the-art results on standard DG benchmarks, demonstrating the potential of MI regularization in real-world AI applications.

In conclusion, the paper offers an important contribution to the domain generalization landscape by presenting a theoretically motivated, empirically validated approach to utilizing pre-trained model knowledge for improved cross-domain performance. This research not only paves the way for exploiting large pre-trained models on unseen domains but also sets a methodological precedent that others can build upon. Future work may further refine MIRO or explore its integration with other neural network paradigms, expanding its applicability and efficacy.
