- The paper presents a novel self-supervised strategy that reduces redundancy by aligning the cross-correlation matrix to the identity, preventing trivial solutions.
- It feeds two identical ResNet-50-based networks differently distorted views of each image, learning embeddings that are invariant to distortions yet decorrelated across components, without relying on large batch sizes.
- Experimental results demonstrate competitive performance, achieving 73.2% top-1 accuracy on ImageNet and robust transfer learning across multiple datasets.
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
The paper "Barlow Twins: Self-Supervised Learning via Redundancy Reduction" introduces a novel method for self-supervised learning (SSL) designed to avoid trivial solutions and improve on both low-data and fully-supervised benchmarks. Unlike other methods, Barlow Twins employs an objective function centered on redundancy reduction, operationalizing a structural balance between invariance to data distortions and minimizing redundancy among embedding components.
Method Overview
The core principle behind Barlow Twins is to avoid trivial constant solutions by constraining the cross-correlation matrix between the embeddings produced by twin networks fed distorted versions of the same inputs. The objective function pushes this matrix toward the identity, ensuring similar embeddings for different distortions of the same sample while reducing redundancy between vector components. The loss comprises two terms (formalized after the list below):
- Invariance Term: Enforces similarity between the embeddings of the two distorted views of each sample (the diagonal of the cross-correlation matrix).
- Redundancy Reduction Term: Decorrelates the components of the embedding vector (the off-diagonal entries), so that each component carries non-redundant information.
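Concretely, the paper's loss is defined over the cross-correlation matrix $\mathcal{C}$ computed along the batch dimension between the embeddings $z^A$ and $z^B$ of the two distorted views:

$$
\mathcal{L}_{BT} = \underbrace{\sum_i \left(1 - \mathcal{C}_{ii}\right)^2}_{\text{invariance term}} + \lambda \underbrace{\sum_i \sum_{j \neq i} \mathcal{C}_{ij}^2}_{\text{redundancy reduction term}},
\qquad
\mathcal{C}_{ij} = \frac{\sum_b z^A_{b,i}\, z^B_{b,j}}{\sqrt{\sum_b \left(z^A_{b,i}\right)^2}\, \sqrt{\sum_b \left(z^B_{b,j}\right)^2}},
$$

where $b$ indexes samples in the batch, $i$ and $j$ index embedding dimensions, and $\lambda > 0$ weights the two terms against each other.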
Implementation Details
Barlow Twins uses two identical networks consuming different distortions of the same batch of images; the cross-correlation between the two output embeddings is computed along the batch dimension. The loss requires neither large batch sizes nor asymmetry mechanisms such as momentum encoders, predictor networks, or stop-gradients. A minimal sketch of the loss computation follows.
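The sketch below follows the PyTorch-style pseudocode given in the paper; the default `lambda_coeff` reflects the paper's setting of roughly 5e-3, and a small epsilon for numerical stability is omitted for brevity:

```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                      lambda_coeff: float = 5e-3) -> torch.Tensor:
    # z_a, z_b: (N, D) projector outputs for two distorted views of the same batch.
    n = z_a.size(0)
    # Standardize each embedding dimension over the batch (zero mean, unit variance).
    z_a = (z_a - z_a.mean(dim=0)) / z_a.std(dim=0)
    z_b = (z_b - z_b.mean(dim=0)) / z_b.std(dim=0)
    # Empirical cross-correlation matrix, shape (D, D).
    c = (z_a.T @ z_b) / n
    # Invariance term: drive diagonal entries toward 1.
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    # Redundancy reduction term: drive off-diagonal entries toward 0.
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambda_coeff * off_diag
```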
Key Configurations:
- Network Architecture: A ResNet-50 backbone followed by a three-layer MLP projector (8192 output units per layer in the paper), sketched after this list.
- Optimization: Training uses the LARS optimizer for 1000 epochs at a batch size of 2048, with modest computational cost compared to baseline methods.
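A sketch of this configuration using torchvision's `resnet50`; the 8192-unit projector widths follow the paper, while details such as the bias flags are plausible assumptions rather than the reference implementation:

```python
import torch.nn as nn
from torchvision.models import resnet50

def build_barlow_twins_encoder(proj_dim: int = 8192) -> nn.Module:
    # ResNet-50 trunk with its classification head removed (outputs 2048-d features).
    backbone = resnet50(weights=None)
    backbone.fc = nn.Identity()
    # Three-layer MLP projector: the first two layers are followed by
    # batch norm and ReLU; the last is a plain linear layer.
    projector = nn.Sequential(
        nn.Linear(2048, proj_dim, bias=False),  # bias flags are an assumption
        nn.BatchNorm1d(proj_dim),
        nn.ReLU(inplace=True),
        nn.Linear(proj_dim, proj_dim, bias=False),
        nn.BatchNorm1d(proj_dim),
        nn.ReLU(inplace=True),
        nn.Linear(proj_dim, proj_dim, bias=False),
    )
    return nn.Sequential(backbone, projector)
```

Note that LARS is not part of core PyTorch, so training this encoder as described requires an external LARS implementation.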
Experimental Results
The performance of Barlow Twins is evaluated across several dimensions:
ImageNet Classification:
- Linear evaluation yielded a top-1 accuracy of 73.2%, comparable with state-of-the-art methods.
- Semi-supervised training, particularly in the 1% data regime, achieved 55.0% top-1 accuracy, surpassing most other methods.
Transfer Learning:
- Across a range of benchmarks (e.g., Places-205, VOC07, iNaturalist2018), Barlow Twins is competitive in linear classification as well as object detection and instance segmentation tasks.
Ablation Studies
Several ablations underscore the robustness and unique attributes of Barlow Twins:
- Loss Function Integrity: Removing either the invariance term or the redundancy reduction term significantly degrades performance, confirming that both are necessary.
- Batch Size Robustness: Unlike other SSL methods that degrade with smaller batch sizes, Barlow Twins maintains robust performance down to a batch size of 256.
- Projector Network Dimensions: Performance keeps improving as the projector's output dimensionality grows (up to 16384 dimensions in the paper), whereas other SSL methods saturate at much lower dimensions.
Theoretical Implications
The Barlow Twins method connects conceptually to the Information Bottleneck (IB) principle, positioning its loss function as an instantiation of IB specifically tailored for SSL. This perspective aligns with the dual goals of conserving sample information (invariance term) and eliminating redundancy (redundancy reduction term).
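Schematically, and with notation adapted from common IB write-ups rather than copied verbatim from the paper, the objective being instantiated is the minimization of

$$
\mathcal{IB}_\theta = I(Z_\theta, Y) - \beta\, I(Z_\theta, X),
$$

where $X$ is the underlying sample, $Y$ its distorted view fed to the network, $Z_\theta$ the learned representation, and $\beta > 0$ a trade-off constant: keeping $I(Z_\theta, Y)$ small enforces invariance to the distortions, while keeping $I(Z_\theta, X)$ large preserves information about the sample itself.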
Future Directions
Given its robustness to small batch sizes and its favorable scaling with embedding dimensionality, Barlow Twins opens avenues for exploring even larger embedding spaces and for refining the balance between invariance to distortions and the informativeness of representations. Further work could examine alternative formulations of redundancy reduction or adapt the approach to other data modalities and architectures.
In sum, Barlow Twins presents a streamlined, effective approach to SSL by targeting redundancy reduction, setting the stage for future advances in embedding optimization and self-supervised methodologies.