Homomorphic Self-Supervised Learning

Updated 1 July 2026

H-SSL is a self-supervised representation learning framework that unifies augmentation-based and augmentation-free methods through equivariant (homomorphic) encoders.
It reformulates contrastive objectives in feature space by employing group-structured operations and fiber sampling to mimic data augmentations without explicit transformations.
Empirical evaluations show that H-SSL achieves performance comparable to augmentation-based methods when equivariant architectures are used, whereas with non-equivariant models it fails to learn useful representations.

Homomorphic Self-Supervised Learning (H-SSL) is a general framework for self-supervised representation learning that unifies augmentation-based and augmentation-free paradigms through the lens of equivariant (homomorphic) encoders. By design, H-SSL subsumes popular contrastive objectives and many traditional self-supervised losses when the feature extractor is augmentation-homomorphic with respect to a group of data transformations (Keller et al., 2022).

1. Formal Definition and Theoretical Foundation

Let $X$ denote the input space (e.g., images), and let $G$ be a discrete or continuous group of "augmentations" (e.g., translations, rotations, scalings). Consider a feature extractor $f: X \rightarrow \mathcal{H}$ and a group representation $\rho: G \rightarrow GL(\mathcal{H})$. The map $f$ is augmentation-homomorphic (equivariant) if, for all $g \in G$ and $x \in X$,

$f(T_g[x]) = \rho(g) f(x)$

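where $T_g[x]$ denotes the action of $g$ on $x$ in input space, and $\rho(g)$ is its corresponding "lifted" action in feature space. Because $\rho$ is itself a group homomorphism from $G$ to $GL(\mathcal{H})$, i.e. $\rho(g_1 g_2) = \rho(g_1)\rho(g_2)$, an equivariant $f$ preserves the group structure of the augmentations, which is why it is called homomorphic.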
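The equivariance condition can be checked numerically. The sketch below uses an illustrative assumption, not the setup of Keller et al.: $G$ is the cyclic translation group $\mathbb{Z}_n$ acting on 1D signals by circular shifts, $f$ is a circular convolution, and $\rho(g)$ is the same circular shift applied along the position axis of the feature map. A random dense linear map serves as a non-equivariant contrast. All function names are hypothetical.

```python
import numpy as np

def shift(a: np.ndarray, g: int) -> np.ndarray:
    """Cyclic translation by g steps along axis 0 (T_g on inputs, rho(g) on feature maps)."""
    return np.roll(a, g, axis=0)

def circular_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Equivariant f: 1D circular convolution.

    x: signal of shape (n,); w: filters of shape (C, K).
    Returns a feature map of shape (n, C): one C-dim fiber per position.
    """
    n, (C, K) = x.shape[0], w.shape
    out = np.zeros((n, C))
    for k in range(K):
        # np.roll(x, -k)[t] == x[(t + k) % n]
        out += np.outer(np.roll(x, -k), w[:, k])
    return out

def dense_map(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Non-equivariant f: an unconstrained linear map with W of shape (n, C, n)."""
    return np.einsum("tcs,s->tc", W, x)

def is_equivariant(f, x: np.ndarray, n: int) -> bool:
    """Check f(T_g[x]) == rho(g) f(x) for every g in Z_n."""
    return all(np.allclose(f(shift(x, g)), shift(f(x), g)) for g in range(n))

rng = np.random.default_rng(0)
n, C, K = 12, 3, 5
x = rng.normal(size=n)
w = rng.normal(size=(C, K))
W = rng.normal(size=(n, C, n))

conv_ok = is_equivariant(lambda v: circular_conv(v, w), x, n)
dense_ok = is_equivariant(lambda v: dense_map(v, W), x, n)
print(conv_ok, dense_ok)  # True False
```

The convolution passes because shifting the input only permutes which input samples each output position reads; the dense map has no such weight sharing, so it fails for every non-identity shift.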

This formulation enables H-SSL to operate directly in representation space using group-structure-respecting operations, rather than requiring explicit data augmentations.

2. Derivation of Self-Supervised Losses within H-SSL

Standard augmentation-based self-supervised learning (A-SSL), such as SimCLR, uses contrastive objectives built from pairs of augmented versions of each input. For a batch $\{x_i\}_{i=1}^{N}$, two independent augmentations $g, g' \in G$, and a projection head $h$, the SimCLR loss is, in its InfoNCE form,

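$\mathcal{L}_{\text{A-SSL}} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{\exp\left(\mathrm{sim}(z_i, z_i')/\tau\right)}{\sum_{j=1}^{N} \exp\left(\mathrm{sim}(z_i, z_j')/\tau\right)}, \qquad z_i = h\big(f(T_g[x_i])\big), \; z_i' = h\big(f(T_{g'}[x_i])\big)$

where $\mathrm{sim}$ is cosine similarity and $\tau$ is a temperature.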
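A minimal NumPy sketch of this loss, assuming cosine similarity for $\mathrm{sim}$ and treating the rows of `z` and `z_prime` as the projected views $z_i$ and $z_i'$. It implements the one-directional form written above; the full SimCLR (NT-Xent) loss additionally uses other samples of the first view as negatives and symmetrizes over both views. The function names are illustrative.

```python
import numpy as np

def cosine_sim_matrix(z: np.ndarray, z_prime: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities sim(z_i, z'_j) for z, z' of shape (N, D)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_prime = z_prime / np.linalg.norm(z_prime, axis=1, keepdims=True)
    return z @ z_prime.T

def info_nce(z: np.ndarray, z_prime: np.ndarray, tau: float = 0.5) -> float:
    """InfoNCE loss: positive pairs (z_i, z'_i) on the diagonal, z'_j (j != i) as negatives."""
    logits = cosine_sim_matrix(z, z_prime) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))  # row-wise log-softmax
    return float(-np.mean(np.diag(log_prob)))

# Two orthogonal embeddings, each paired with an identical view, tau = 1:
# every row contributes -log(e / (e + 1)) = log(1 + e^{-1}).
z = np.eye(2)
loss = info_nce(z, z, tau=1.0)
print(round(loss, 4))  # 0.3133
```

In an A-SSL pipeline, `z` and `z_prime` would be the head outputs for the two augmented copies of the batch, so the encoder runs twice per step.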

When $f$ is $G$-equivariant, this loss can be restated purely in feature space by leveraging the homomorphic property $f(T_g[x]) = \rho(g) f(x)$. Each input thus yields a single feature map $f(x)$, with "fibers" $f(x)_g$ indexed by group elements $g \in G$. H-SSL positives are constructed by sampling fibers $f(x)_{g\hat{G}}$ and $f(x)_{g'\hat{G}}$ from $f(x)$, where $\hat{G} \subseteq G$ is a base set of prescribed size and $g\hat{G} = \{g\hat{g} : \hat{g} \in \hat{G}\}$.

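The H-SSL loss is then

$\mathcal{L}_{\text{H-SSL}} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{\exp\left(\mathrm{sim}(z_i, z_i')/\tau\right)}{\sum_{j=1}^{N} \exp\left(\mathrm{sim}(z_i, z_j')/\tau\right)}, \qquad z_i = h\big(f(x_i)_{g\hat{G}}\big), \; z_i' = h\big(f(x_i)_{g'\hat{G}}\big)$

With $\hat{G} = G$ and an equivariant $f$, this form is algebraically identical to the A-SSL loss up to re-indexing of the sampled group elements, rendering A-SSL a special case of H-SSL.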
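The following sketch shows how H-SSL views could be assembled from a single forward pass, under simplifying assumptions that are not taken from the paper: $G$ is the cyclic group $\mathbb{Z}_n$ indexing $n$ fiber positions, the base set $\hat{G}$ is a contiguous window $\{0, \dots, k-1\}$, the head $h$ is a single random linear map, and $g, g'$ are drawn independently and uniformly per sample. Helper names are hypothetical.

```python
import numpy as np

def gather_fibers(feat: np.ndarray, g: int, base: np.ndarray) -> np.ndarray:
    """Fibers f(x)_{g G_hat}: rows (g + h) mod n for h in the base set, flattened into one view."""
    n = feat.shape[0]
    return feat[(g + base) % n].reshape(-1)

def info_nce(z: np.ndarray, z_prime: np.ndarray, tau: float) -> float:
    """InfoNCE with cosine similarity; positives on the diagonal."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_prime = z_prime / np.linalg.norm(z_prime, axis=1, keepdims=True)
    logits = z @ z_prime.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def hssl_loss(feats: np.ndarray, head: np.ndarray, base: np.ndarray,
              rng: np.random.Generator, tau: float = 0.5) -> float:
    """H-SSL loss from one forward pass.

    feats: (N, n, C) feature maps f(x_i); head: (len(base) * C, D) linear projection h.
    """
    N, n, _ = feats.shape
    g = rng.integers(0, n, size=N)        # group element for the first view of each sample
    g_prime = rng.integers(0, n, size=N)  # independent group element for the second view
    z = np.stack([gather_fibers(feats[i], g[i], base) for i in range(N)]) @ head
    z_prime = np.stack([gather_fibers(feats[i], g_prime[i], base) for i in range(N)]) @ head
    return info_nce(z, z_prime, tau)

rng = np.random.default_rng(0)
N, n, C, D, k = 8, 16, 4, 32, 3
feats = rng.normal(size=(N, n, C))  # stand-in for f(x_i); no input augmentation is applied
base = np.arange(k)                 # base set G_hat of size k
head = rng.normal(size=(k * C, D))
loss = hssl_loss(feats, head, base, rng)
```

Setting `k = n` makes each view the full, re-indexed feature map (the global end), while `k = 1` gives single-fiber views (the local end).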

Choosing $G$ and $\hat{G}$ recovers various objectives; for instance, taking $G$ as the spatial-translation group with $\hat{G}$ a single location reduces to the local DIM(L) or Greedy InfoMax objectives, and CPC emerges for appropriate choices of heads.

3. Equivalence Conditions and Failure Modes

Proposition 3.1 states that if $f$ is exactly $G$-equivariant, the A-SSL and H-SSL losses are identical under fiber sampling in feature space. The proof consists of substituting $f(T_g[x]) = \rho(g) f(x)$ and re-indexing over the group $G$.

Corollary 3.2 establishes a critical failure mode: if $f$ is not homomorphic, i.e. $f(T_g[x]) \neq \rho(g) f(x)$ so that transforming the input does not correspond to any structured operation on the features, then H-SSL cannot simulate the effect of input augmentations. Empirically, this results in representations that collapse or perform at chance level, in contrast to A-SSL, which still benefits from explicit input transformations.
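Proposition 3.1 and Corollary 3.2 can be illustrated in a toy setting chosen for illustration, not taken from the paper's experiments. With $G = \mathbb{Z}_n$ acting by circular shifts and the base set equal to all of $G$, the H-SSL view gathered at $g$ coincides with the A-SSL view of the input transformed by $g^{-1}$ (a shift by $-g$) whenever $f$ is a circular convolution. Re-indexing $g \mapsto g^{-1}$ over a uniformly sampled group element therefore makes the two losses identical. For an unconstrained dense map the views agree only at the identity. Helper names are hypothetical.

```python
import numpy as np

def shift(a, g):
    """Circular shift by g along axis 0: T_g on inputs, rho(g) on feature maps."""
    return np.roll(a, g, axis=0)

def circular_conv(x, w):
    """Equivariant f: x (n,), filters w (C, K) -> feature map (n, C)."""
    return sum(np.outer(np.roll(x, -k), w[:, k]) for k in range(w.shape[1]))

def dense_map(x, W):
    """Non-equivariant f: unconstrained linear map, W (n, C, n) -> feature map (n, C)."""
    return np.einsum("tcs,s->tc", W, x)

def gather_fibers(feat, g, base):
    """H-SSL view: fibers at positions (g + h) mod n for h in the base set."""
    return feat[(g + base) % feat.shape[0]]

def views_agree(f, x, n):
    """Group elements g where the H-SSL view at g equals the A-SSL view f(T_{g^{-1}}[x])."""
    base = np.arange(n)  # base set = the whole group
    fx = f(x)            # a single forward pass, no input augmentation
    return [g for g in range(n) if np.allclose(gather_fibers(fx, g, base), f(shift(x, -g)))]

rng = np.random.default_rng(1)
n, C, K = 10, 3, 4
x = rng.normal(size=n)
w = rng.normal(size=(C, K))
W = rng.normal(size=(n, C, n))

equivariant_hits = views_agree(lambda v: circular_conv(v, w), x, n)
dense_hits = views_agree(lambda v: dense_map(v, W), x, n)
print(equivariant_hits)  # every g in 0..n-1
print(dense_hits)        # only the identity
```

For the dense map, the gathered fibers are merely a permutation of one un-augmented feature map, which is why feature-space sampling cannot stand in for input augmentation without equivariance.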

4. Empirical Evaluation

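Experimental validation used three groups: rotation (four $90^\circ$ steps), translation (±20% shifts), and scale (six downscaling factors). A backbone equivariant to the corresponding group was used in each case: a rotation-equivariant CNN for rotation, a standard (translation-equivariant) CNN for translation, and SESN (scale-equivariant steerable networks) for scale.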
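As an illustration of the rotation group in this setup, the four $90^\circ$ rotations can be realized on image arrays with NumPy's `np.rot90`. The sketch below (helper name hypothetical) checks that composing two rotations matches a single rotation by the sum of their steps modulo 4, i.e. that $g \mapsto T_g$ respects the group structure. It is a sketch only; the paper's exact augmentation pipeline is not reproduced here.

```python
import numpy as np

def rotate(img: np.ndarray, k: int) -> np.ndarray:
    """T_g for the element g = k quarter-turns: rotate the two spatial axes by k * 90 degrees."""
    return np.rot90(img, k=k % 4, axes=(0, 1))

img = np.arange(3 * 4 * 2).reshape(3, 4, 2)  # toy (H, W, channels) image, non-square on purpose

# Group law: T_a(T_b[x]) == T_{(a + b) mod 4}[x] for every pair of elements.
law_holds = all(
    np.array_equal(rotate(rotate(img, b), a), rotate(img, (a + b) % 4))
    for a in range(4) for b in range(4)
)
identity_holds = np.array_equal(rotate(img, 4), img)  # four 90-degree steps return to the start
print(law_holds, identity_holds)  # True True
```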

Datasets: MNIST, CIFAR-10, Tiny ImageNet.
Analysis: Linear-probe accuracy after SSL pretraining was measured for both A-SSL (with explicit augmentation) and H-SSL (feature-space only).

Key findings are summarized in the following table:

Augmentation Group	CIFAR-10 (A-SSL)	CIFAR-10 (H-SSL)
Translation	39.2 ± 0.5%	36.3 ± 1.1%

The two settings are within about three percentage points of each other (39.2% vs. 36.3%), consistent with the theoretical equivalence of A-SSL and H-SSL when equivariant architectures are used.

When the equivariant layers are replaced with generic MLPs or non-equivariant CNNs, H-SSL performs at chance level or at the level of a frozen (untrained) backbone, demonstrating that equivariance is necessary for feature-space contrastive pairs to behave like augmented views.

5. Relationship to Augmentation-Based SSL and Parameter Space

H-SSL introduces two key hyperparameters absent from vanilla A-SSL:

The base-set size $|\hat{G}|$, determining how many fibers are grouped into a single "view."
The topographic distance between $g$ and $g'$ in $G$, controlling the effective "augmentation strength."

Empirically, increasing $|\hat{G}|$ interpolates smoothly from local (DIM(L)) to global (SimCLR) losses, with only minor changes in downstream linear-probe accuracy (within ±2% on CIFAR-10). Similarly, increasing the maximum allowable topographic distance between $g$ and $g'$ first improves and then degrades performance, paralleling findings in A-SSL regarding the utility of strong augmentations.
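A sketch of how these two hyperparameters might be exposed, assuming a cyclic group $\mathbb{Z}_n$ whose elements are laid out on a ring so that topographic distance is circular distance. Here `max_dist` bounds how far $g'$ may lie from $g$ (the effective augmentation strength), and `base_size` sets $|\hat{G}|$, ranging from single fibers (local, DIM(L)-like) to the whole group (global, SimCLR-like). The parameter and function names are hypothetical.

```python
import numpy as np

def circular_distance(a: int, b: int, n: int) -> int:
    """Topographic distance between group elements a and b on a ring of n elements."""
    d = abs(a - b) % n
    return min(d, n - d)

def sample_pair(n: int, max_dist: int, rng: np.random.Generator) -> tuple[int, int]:
    """Draw g uniformly from Z_n, then g' uniformly within distance max_dist of g."""
    g = int(rng.integers(0, n))
    offset = int(rng.integers(-max_dist, max_dist + 1))  # upper bound is exclusive
    return g, (g + offset) % n

def base_set(base_size: int, n: int) -> np.ndarray:
    """Contiguous base set G_hat = {0, ..., base_size - 1}, with 1 <= base_size <= n."""
    if not 1 <= base_size <= n:
        raise ValueError("base_size must lie in [1, n]")
    return np.arange(base_size)

rng = np.random.default_rng(0)
n = 16
pairs = [sample_pair(n, max_dist=3, rng=rng) for _ in range(1000)]
max_seen = max(circular_distance(g, gp, n) for g, gp in pairs)
local, global_ = base_set(1, n), base_set(n, n)  # DIM(L)-like vs SimCLR-like extremes
```

Once `max_dist` reaches `n // 2`, the bound no longer restricts $g'$ and sampling becomes fully independent, which is the "strongest augmentation" end of this axis.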

The temperature $\tau$, embedding dimensionality, and projection-head size are kept as in SimCLR, consistent with H-SSL generalizing existing A-SSL setups.

6. Broader Implications and Directions

Homomorphic Self-Supervised Learning offers several conceptual and practical advantages:

Unified Perspective: Provides a principled bridge between augmentation-based and augmentation-free SSL, subsuming contrastive, alignment, uniformity, and local InfoMax variants as instantiations of a single group-equivariant InfoNCE objective.
Generalization: Admits any $G$ for which an equivariant feature extractor is available, potentially allowing for multi-scale, multi-orientation, or learned, data-driven symmetries.
Novel Design Axes: Enables new forms of view sampling and hyperparameter tuning beyond batch size and temperature.

A current limitation is the difficulty of constructing backbones that are equivariant to arbitrary data augmentations: most group-convolutional networks handle only specific structured groups, such as compact Lie groups. Approaches such as learned or approximate homomorphisms (e.g., topographic VAEs, NPTNs, L-convolutions) may be needed to extend H-SSL to the diverse augmentations employed in large-scale vision tasks.

Future research directions include hybrid models combining A-SSL (where equivariance is weak) with H-SSL (where group structure is strong), learning group representations $\rho$ jointly with $f$, and adapting H-SSL to Transformer architectures by constructing permutation- or patch-equivariant layers.

In summary, H-SSL reframes the diversity of contemporary self-supervised learning objectives as the outcome of constraining architectures to respect the underlying group structure of augmentations, with the InfoNCE loss serving as a universal objective across these contexts (Keller et al., 2022).


References

Keller et al. (2022). Homomorphic Self-Supervised Learning.