Domain Generalization by Solving Jigsaw Puzzles: An Essay
The paper "Domain Generalization by Solving Jigsaw Puzzles" by Carlucci et al. introduces an innovative method for enhancing the generalization capability of object recognition systems across different visual domains by incorporating a self-supervised learning task—solving jigsaw puzzles. The authors propose that the combination of supervised learning with intrinsic self-supervised tasks can yield better generalization, mirroring the way humans learn from both guided and autonomous experiences.
Overview and Methodology
The primary objective of this research is to address domain generalization (DG), in which a model trained on multiple source domains must generalize well to an unseen target domain. Existing methods in this area often struggle to balance domain invariance and task-specific adaptation without access to target data during training. Carlucci et al. tackle this issue by integrating a secondary self-supervised task—solving a jigsaw puzzle—with the primary supervised task of object classification. The hypothesis is that this combination allows the model to capture domain-agnostic visual features, enhancing generalization.
Specifically, the authors use a convolutional neural network (CNN) architecture that jointly learns to classify objects and to solve jigsaw puzzles. Each image is divided into patches, which are shuffled according to a predefined set of permutations, and the network is trained to identify the correct permutation along with the object class. This multitask setup acts as a regularizer, embedding spatial correlation knowledge within the network's feature space.
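The patch-shuffling step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `make_jigsaw_sample` and the tiny 2×2 permutation set are hypothetical, chosen only to show how the permutation index becomes the label for the jigsaw classification head.

```python
import numpy as np

def make_jigsaw_sample(image, permutation):
    """Split a (H, W) image into an n x n grid of patches and reorder
    them by the given permutation; H and W must be divisible by n.
    The index of the permutation in the predefined set serves as the
    jigsaw classification label."""
    n = int(len(permutation) ** 0.5)              # grid side, e.g. 3 for 3x3
    h, w = image.shape[0] // n, image.shape[1] // n
    patches = [image[r*h:(r+1)*h, c*w:(c+1)*w]
               for r in range(n) for c in range(n)]
    shuffled = [patches[i] for i in permutation]
    rows = [np.hstack(shuffled[r*n:(r+1)*n]) for r in range(n)]
    return np.vstack(rows)

# Hypothetical 2x2 example: permutation 0 is the identity (ordered image,
# used for the object-classification branch), permutation 1 swaps the two
# top patches (a puzzle the network must recognize as "permutation 1").
perm_set = [(0, 1, 2, 3), (1, 0, 2, 3)]
img = np.arange(16).reshape(4, 4)
ordered = make_jigsaw_sample(img, perm_set[0])    # jigsaw label 0
puzzle = make_jigsaw_sample(img, perm_set[1])     # jigsaw label 1
```

In training, both the ordered and shuffled versions pass through the shared backbone; the classification loss is computed on ordered images and the jigsaw loss on the permutation labels, combined as a weighted sum.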
The method is evaluated on four settings: PACS, VLCS, Office-Home, and a set of digit classification tasks (MNIST to SVHN and MNIST-M). A detailed ablation study is conducted to understand the effect of various parameters, such as the number of jigsaw permutations, the grid size for patch division, and the data bias between ordered and shuffled images.
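The permutation set itself is typically not chosen at random: jigsaw pretext tasks commonly select permutations by greedily maximizing pairwise Hamming distance, the heuristic introduced in Noroozi and Favaro's original jigsaw work. A minimal sketch of that selection, with hypothetical function names and a toy 2×2 grid (4 patches, 24 permutations) standing in for the real 3×3 grid:

```python
import itertools
import random

def hamming(a, b):
    """Number of positions where two permutations disagree."""
    return sum(x != y for x, y in zip(a, b))

def select_permutations(n_patches, n_perms, seed=0):
    """Greedily pick n_perms permutations of n_patches elements so that
    each new pick is as far (in minimum Hamming distance) as possible
    from every permutation already chosen."""
    rng = random.Random(seed)
    candidates = list(itertools.permutations(range(n_patches)))
    chosen = [candidates.pop(rng.randrange(len(candidates)))]
    while len(chosen) < n_perms:
        best = max(candidates,
                   key=lambda p: min(hamming(p, c) for c in chosen))
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Toy example: 5 well-separated permutations of 4 patches. Real setups
# use a 3x3 grid and a small subset (e.g. 30) of the 9! permutations.
perms = select_permutations(n_patches=4, n_perms=5)
```

Keeping permutations far apart makes the permutation-classification task less ambiguous, which is one reason the number of permutations is a meaningful ablation parameter.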
Experimental Results
The experimental results show significant improvements over existing DG methods. Notably, on the PACS dataset, which includes domains such as photos, cartoons, and sketches, the proposed method achieves state-of-the-art performance with AlexNet and competitive results with ResNet-18. The average accuracy improved from 71.52% (Deep All baseline) to 73.38% with AlexNet, with comparable gains for ResNet-18. Similar performance gains are evident on the VLCS and Office-Home datasets.
The paper also explores single-source domain generalization by training on MNIST and evaluating on MNIST-M and SVHN. Here, the method consistently outperforms a competitive adversarial data augmentation baseline, showing that the proposed multitask learning approach is robust even with limited domain diversity in the training data.
Implications and Future Directions
The key takeaway from this research is the demonstration that integrating auxiliary self-supervised tasks such as solving jigsaw puzzles within a supervised learning framework significantly enhances DG performance. This has practical implications for deploying vision systems in real-world scenarios where the models must handle diverse and unseen environments.
Moreover, the methodological simplicity of this approach allows for easy integration with various deep learning architectures without the need for complex architectural changes. This adaptability extends the method's applicability across different tasks and domains beyond object classification, including semantic segmentation and instance recognition.
Future research could explore additional self-supervised tasks, such as object rotation classification or contrastive learning, to further improve generalization. It would also be valuable to study combinations of multiple self-supervised tasks to ascertain their cumulative impact on domain invariance.
In conclusion, Carlucci et al.'s work makes a compelling case for combining supervised and self-supervised learning to enhance model generalization across visual domains. The robust experimental validation and the comprehensive ablation studies reinforce the significance of this approach, setting a strong foundation for future developments in domain generalization.