Homomorphic Self-Supervised Learning

Updated 1 July 2026
  • H-SSL is a self-supervised representation learning framework that unifies augmentation-based and augmentation-free methods through equivariant (homomorphic) encoders.
  • It reformulates contrastive objectives in feature space by employing group-structured operations and fiber sampling to mimic data augmentations without explicit transformations.
  • Empirical evaluations show that H-SSL achieves comparable performance to traditional methods using equivariant architectures, while non-equivariant models fail to capture its benefits.

Homomorphic Self-Supervised Learning (H-SSL) is a general framework for self-supervised representation learning that unifies augmentation-based and augmentation-free paradigms through the lens of equivariant (homomorphic) encoders. By design, H-SSL subsumes popular contrastive objectives and many traditional self-supervised losses when the feature extractor is augmentation-homomorphic with respect to a group of data transformations (Keller et al., 2022).

1. Formal Definition and Theoretical Foundation

Let $X$ denote the input space (e.g., images), and let $G$ be a discrete or continuous group of "augmentations" (e.g., translations, rotations, scalings). Consider a feature extractor $f: X \rightarrow \mathcal{H}$ and a group representation $\rho: G \rightarrow GL(\mathcal{H})$. The map $f$ is augmentation-homomorphic (equivariant) if, for all $g \in G$ and $x \in X$,

$$f(T_g[x]) = \rho(g) f(x)$$

where $T_g[x]$ denotes the action of $g$ on $x$ in input space, and $\rho(g)$ is its corresponding "lifted" action in feature space. Since $\rho$ preserves the group structure, $\rho$ is a homomorphism from $G$ to $GL(\mathcal{H})$.

This formulation enables H-SSL to operate directly in representation space using group-structure-respecting operations, rather than requiring explicit data augmentations.
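As a small numerical illustration of the homomorphism property, the sketch below builds a toy feature extractor that is equivariant to the group of 90-degree rotations and checks $f(T_g[x]) = \rho(g) f(x)$. The names `lift_c4` and `rho`, the random filter, and the 4-dimensional feature space are assumptions made for illustration; they are not taken from Keller et al. (2022).

```python
import numpy as np

def lift_c4(x, w):
    """Toy feature extractor f: correlate x with the filter w rotated by 0/90/180/270 degrees.

    Returns a 4-vector whose k-th entry is the inner product <x, rot90(w, k)>.
    """
    return np.array([np.sum(x * np.rot90(w, k)) for k in range(4)])

def rho(k):
    """Rotation by k * 90 degrees represented as a 4x4 cyclic-shift permutation matrix."""
    return np.roll(np.eye(4), k, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5))  # hypothetical square "image"
w = rng.normal(size=(5, 5))  # hypothetical filter

lhs = lift_c4(np.rot90(x, 1), w)  # f(T_g[x]): rotate the input, then extract features
rhs = rho(1) @ lift_c4(x, w)      # rho(g) f(x): extract features, then act in feature space
equivariant = np.allclose(lhs, rhs)

# rho respects the group law: rho(g1) rho(g2) = rho(g1 g2)
is_homomorphism = np.allclose(rho(1) @ rho(2), rho(3))
```

Here the lifted action is a permutation of the four orientation channels, so rotating the input never needs to be recomputed through the encoder; it can be applied directly to the features.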

2. Derivation of Self-Supervised Losses within H-SSL

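Standard augmentation-based self-supervised learning (A-SSL), such as SimCLR, uses contrastive objectives computed on pairs of augmented versions of each input. For a batch $\{x_i\}_{i=1}^{N}$, with two independent augmentations $g_1, g_2 \sim G$ and projection head $h$, the SimCLR loss is

$$\mathcal{L}_{\text{A-SSL}} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{E}_{g_1, g_2 \sim G}\Big[ I_c\big(h(f(T_{g_1}[x_i])),\, h(f(T_{g_2}[x_i]))\big)\Big], \qquad I_c(a_i, b_i) = -\log \frac{\exp(\mathrm{sim}(a_i, b_i)/\tau)}{\sum_{v \in V \setminus \{a_i\}} \exp(\mathrm{sim}(a_i, v)/\tau)}$$

where $V$ is the set of all $2N$ projected views in the batch, $\mathrm{sim}$ is cosine similarity, and $\tau$ is the temperature.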
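The SimCLR objective described above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard NT-Xent form, assuming cosine similarity and a batch of already-projected views; the function name `nt_xent` and the default temperature of 0.5 are arbitrary choices for the example, not values from the source.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent (SimCLR) loss for two batches of projected views, each of shape (N, D).

    Row i of z1 and row i of z2 form a positive pair; all other rows act as negatives.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm, so dot product = cosine
    sim = (z @ z.T) / tau                             # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                    # a view is never its own negative

    # Index of the positive partner for each of the 2N anchors.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-sum-exp over all other views in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))

# Usage: perfectly aligned one-hot views versus deliberately mismatched pairs.
views = np.eye(4)
aligned = nt_xent(views, views)
mismatched = nt_xent(views, np.roll(views, 1, axis=0))
```

With aligned views the positive pair has the highest similarity in its row, so the loss is lower than when the pairing is scrambled.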

When $f$ is $G$-equivariant, this loss can be restated purely in feature space by leveraging the homomorphic property:

$$h(f(T_g[x_i])) = h(\rho(g) f(x_i))$$

Each input thus yields a single feature map $f(x_i)$, with "fibers" $f(x_i)(g)$ indexed by $g \in G$. H-SSL positives are constructed by sampling fibers $f(x_i)(g_1^{-1} \cdot g_0)$ and $f(x_i)(g_2^{-1} \cdot g_0)$ from $f(x_i)$, with $g_0 \subseteq G$ a base-set of prescribed size.

The H-SSL loss is then:

$$\mathcal{L}_{\text{H-SSL}} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{E}_{g_1, g_2 \sim G}\Big[ I_c\big(h(f(x_i)(g_1^{-1} \cdot g_0)),\, h(f(x_i)(g_2^{-1} \cdot g_0))\big)\Big]$$

where $I_c$ denotes the per-pair contrastive (InfoNCE) term of the SimCLR loss. This form is algebraically identical to the A-SSL loss under equivariance, rendering A-SSL a special case of H-SSL.

Selection of the group $G$ and the base-set $g_0$ recovers various objectives; for instance, taking $G$ to be the spatial-translation group with $g_0$ a single location reduces to the local DIM(L) or Greedy InfoMax objectives, and CPC emerges for appropriate choices of heads.
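To make the fiber-sampling step concrete for the translation group, the sketch below draws two views per example directly from a spatial feature map instead of augmenting the input. It is a hedged illustration only: the function name `sample_fiber_views`, the square base-set, mean pooling over the base-set, and the clipped Chebyshev-distance offset are all assumptions for this example and may differ from the construction in Keller et al. (2022).

```python
import numpy as np

def sample_fiber_views(feats, base_size, max_dist, rng):
    """Sample two feature-space views per example from translation-equivariant features.

    feats:     array of shape (N, C, H, W), one feature map per input.
    base_size: side length of the square base-set (number of fibers per side).
    max_dist:  maximum offset, per axis, between the two sampled locations.
    Returns two arrays of shape (N, C), each the mean over a block of fibers.
    """
    n, c, h, w = feats.shape
    max_r, max_c = h - base_size, w - base_size  # largest valid top-left corner
    v1 = np.empty((n, c))
    v2 = np.empty((n, c))
    for i in range(n):
        # First location: uniform over all valid positions.
        r1 = rng.integers(0, max_r + 1)
        c1 = rng.integers(0, max_c + 1)
        # Second location: within max_dist of the first, clipped to stay in bounds.
        r2 = np.clip(r1 + rng.integers(-max_dist, max_dist + 1), 0, max_r)
        c2 = np.clip(c1 + rng.integers(-max_dist, max_dist + 1), 0, max_c)
        v1[i] = feats[i, :, r1:r1 + base_size, c1:c1 + base_size].mean(axis=(1, 2))
        v2[i] = feats[i, :, r2:r2 + base_size, c2:c2 + base_size].mean(axis=(1, 2))
    return v1, v2

# Usage with random stand-in features (a real encoder output would go here).
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8, 6, 6))
local_a, local_b = sample_fiber_views(feats, base_size=1, max_dist=2, rng=rng)
global_a, global_b = sample_fiber_views(feats, base_size=6, max_dist=2, rng=rng)
```

A base-set of a single location gives purely local views, while a base-set covering the whole map makes both views the same global summary; the two returned arrays can be passed through a projection head and into any contrastive loss.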

3. Equivalence Conditions and Failure Modes

Proposition 3.1 states that if $f$ is exactly $G$-equivariant, the A-SSL and H-SSL losses are identical under fiber sampling in feature space. The proof consists of substituting $f(T_g[x]) = \rho(g) f(x)$ and re-indexing over the group $G$.

Corollary 3.2 establishes a critical failure mode: if $f$ is not homomorphic (i.e., $f$ and $T_g$ do not commute), then H-SSL cannot simulate the effect of input augmentations. Empirically, this results in representations that collapse or perform at chance level, in contrast to A-SSL, which still benefits from explicit input transformations.
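The contrast between the two cases can be checked numerically for cyclic translations. The sketch below compares a circular cross-correlation, which commutes with shifts, against an unconstrained dense layer, which does not. The helper names `circ_corr` and `equivariance_error`, the signal length, and the random weights are illustrative assumptions, not part of the source.

```python
import numpy as np

def circ_corr(x, w):
    """Circular 1D cross-correlation; equivariant to cyclic shifts of x."""
    n = len(x)
    return np.array([sum(w[j] * x[(p + j) % n] for j in range(len(w)))
                     for p in range(n)])

def equivariance_error(f, x, g):
    """Largest absolute gap between f(T_g[x]) and the shifted feature map rho(g) f(x)."""
    augmented_then_encoded = f(np.roll(x, g))  # A-SSL route: augment the input first
    encoded_then_shifted = np.roll(f(x), g)    # H-SSL route: act on the features
    return float(np.max(np.abs(augmented_then_encoded - encoded_then_shifted)))

rng = np.random.default_rng(0)
x = rng.normal(size=16)
w = rng.normal(size=3)         # hypothetical convolution filter
m = rng.normal(size=(16, 16))  # hypothetical dense-layer weights

err_conv = equivariance_error(lambda v: circ_corr(v, w), x, g=5)
err_dense = equivariance_error(lambda v: m @ v, x, g=5)
```

For the convolution the two routes agree up to floating-point error, so a feature-space shift is a faithful stand-in for an input shift. For the dense layer they disagree, which is the situation in which feature-space "views" no longer correspond to any augmentation of the input.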

4. Empirical Evaluation

Experimental validation employed three groups: rotation (four $90^\circ$ steps), translation (±20% shifts), and scale (six downscaling factors). A matching equivariant backbone was used for each group: a rotation-equivariant CNN for rotation, a standard CNN for translation, and SESN for scale.

  • Datasets: MNIST, CIFAR-10, Tiny ImageNet.
  • Analysis: Linear-probe accuracy after SSL pretraining was measured for both A-SSL (with explicit augmentation) and H-SSL (feature-space only).

Key findings are summarized in the following table:

| Augmentation group | CIFAR-10 linear-probe accuracy (A-SSL) | CIFAR-10 linear-probe accuracy (H-SSL) |
|---|---|---|
| Translation | 39.2 ± 0.5% | 36.3 ± 1.1% |

The two accuracies differ by about 3 percentage points, which is consistent with the theoretical equivalence of A-SSL and H-SSL when equivariant architectures are used.

When equivariant layers are replaced with generic MLPs or non-equivariant CNNs, H-SSL accuracy drops to chance or "frozen"-baseline levels, demonstrating that equivariance is necessary for feature-space contrastive pairs.

5. Relationship to Augmentation-Based SSL and Parameter Space

H-SSL introduces two key hyperparameters absent from vanilla A-SSL:

  • The base-set size $|g_0|$, determining how many fibers are grouped as a "view."
  • The topographic distance between the sampled group elements $g_1$ and $g_2$ in $G$, controlling the effective "augmentation strength."

Empirical variation shows that increasing the base-set size $|g_0|$ interpolates smoothly from local (DIM(L)) to global (SimCLR) losses, yielding only minor changes in downstream linear-probe accuracy (within ±2% on CIFAR-10). Similarly, increasing the maximum allowable topographic distance first improves and then degrades performance, paralleling findings in A-SSL regarding the utility of strong augmentations.

Temperature $\tau$, embedding dimensionality, and projection head size remain as in SimCLR, confirming that H-SSL generalizes existing A-SSL setups.

6. Broader Implications and Directions

Homomorphic Self-Supervised Learning offers several conceptual and practical advantages:

  • Unified Perspective: Provides a principled bridge between augmentation-based and augmentation-free SSL, subsuming contrastive, alignment, uniformity, and local InfoMax variants as instantiations of a single group-equivariant InfoNCE objective.
  • Generalization: Admits any group $G$ for which an equivariant feature extractor is available, potentially allowing for multi-scale, multi-orientation, or learned, data-driven symmetries.
  • Novel Design Axes: Enables new forms of view sampling and hyperparameter tuning beyond batch size and temperature.

Current limitations include the difficulty of constructing backbones that are equivariant to arbitrary data augmentations; most group convolutional networks are limited to a few specific groups or to compact Lie groups. Approaches such as learned or approximate homomorphisms (e.g., topographic VAEs, NPTNs, L-convolutions) may be needed to extend H-SSL to the diverse augmentations employed in large-scale vision tasks.

Future research directions include hybrid models combining A-SSL (where equivariance is weak) with H-SSL (where group structure is strong), learning group representations $\rho$ jointly with $f$, and adapting H-SSL to Transformer architectures by constructing permutation- or patch-equivariant layers.

In summary, H-SSL reframes the diversity of contemporary self-supervised learning objectives as the outcome of constraining architectures to respect the underlying group structure of augmentations, with the InfoNCE loss serving as a universal objective across these contexts (Keller et al., 2022).
