SerpentFlow: Shared-Structure Domain Adaptation
- SerpentFlow is a framework that decomposes samples into shared low-frequency and domain-specific high-frequency components, enabling robust unpaired domain adaptation.
- It leverages a classifier-based frequency cutoff selection to isolate invariant features and construct synthetic pseudo-pairs for conditional generative modeling.
- Validation on super-resolution, fluid simulation, and climate downscaling tasks demonstrates superior reconstruction accuracy and domain alignment.
SerpentFlow (SharEd-structuRe decomPosition for gEnerative domaiN adapTation) is a generative framework introduced for unpaired domain alignment, specifically designed to operate in scenarios where two distinct domains exhibit shared underlying structure but do not offer paired observations. The central innovation lies in decomposing each sample into shared and domain-specific components within a latent space—typically the Fourier domain for tasks such as super-resolution—enabling the construction of synthetic "pseudo-pairs" for conditional generative modeling. Through a data-driven, classifier-based criterion for isolating invariant features common to both domains, SerpentFlow addresses the challenge of unsupervised cross-domain mapping by leveraging shared structural patterns for robust alignment and reconstruction (Keisler et al., 5 Jan 2026).
1. Mathematical Formulation
SerpentFlow models the source domain and the target domain as unpaired datasets with respective distributions $p_X$ and $p_Y$ over a shared observation space $\Omega$. The approach relies on a bijective encoder $E$ that provides an additive decomposition in the latent domain:
$$E(x) = \mathbf{1}_S \cdot E(x) + \mathbf{1}_D \cdot E(x),$$
where $S$ corresponds to the shared subspace (invariant structure) and $D$ encapsulates domain-specific information. In super-resolution tasks, $E$ is instantiated as the Fourier transform, so the latent space is the Fourier domain. The frequency cutoff $k_c$ partitions the latent subspace:
- $S = \{k : \|k\| \le k_c\}$ (low frequencies)
- $D = \{k : \|k\| > k_c\}$ (high frequencies)
The sample-level decomposition is
$$x = \underbrace{E^{-1}\big(\mathbf{1}_S \cdot E(x)\big)}_{x_{\mathrm{low}}} + \underbrace{E^{-1}\big(\mathbf{1}_D \cdot E(x)\big)}_{x_{\mathrm{high}}}.$$
Selection of $k_c$ is based on a discriminator $D_\phi$ trained to distinguish the domains using low-pass filtered samples $x_{\mathrm{low}}$. The cutoff is chosen as the minimum value for which the discriminator's classification accuracy drops to chance ($50\%$), indicating maximal removal of domain-specific information in the shared space.
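The decomposition and cutoff search can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the helper names (`lowpass`, `highpass`, `select_cutoff`) are assumptions, a nearest-centroid rule stands in for the trained discriminator $D_\phi$, and the selection rule below reads the criterion as keeping the largest cutoff still at chance accuracy.

```python
import numpy as np

def lowpass(x, kc):
    """Project a 2-D field onto Fourier modes with |k| <= kc (shared part)."""
    ky = np.fft.fftfreq(x.shape[0]) * x.shape[0]
    kx = np.fft.fftfreq(x.shape[1]) * x.shape[1]
    mask = np.sqrt(ky[:, None] ** 2 + kx[None, :] ** 2) <= kc
    return np.real(np.fft.ifft2(np.fft.fft2(x) * mask))

def highpass(x, kc):
    """Domain-specific remainder: x = lowpass + highpass by construction."""
    return x - lowpass(x, kc)

def domain_accuracy(xs, ys, kc):
    """Accuracy of a toy nearest-centroid discriminator on low-passed samples
    (a stand-in for the learned classifier D_phi in the paper)."""
    lx = np.stack([lowpass(x, kc) for x in xs])
    ly = np.stack([lowpass(y, kc) for y in ys])
    cx, cy = lx.mean(axis=0), ly.mean(axis=0)
    correct = sum(np.linalg.norm(z - cx) <= np.linalg.norm(z - cy) for z in lx)
    correct += sum(np.linalg.norm(z - cy) < np.linalg.norm(z - cx) for z in ly)
    return correct / (len(lx) + len(ly))

def select_cutoff(xs, ys, candidates, tol=0.55):
    """One reading of the criterion: the largest kc at which the low-pass
    discriminator still performs near chance (accuracy <= tol)."""
    chosen = min(candidates)
    for kc in sorted(candidates):
        if domain_accuracy(xs, ys, kc) <= tol:
            chosen = kc
    return chosen
```

Because `lowpass` is a projection, the decomposition is exact and idempotent: `lowpass(x, kc) + highpass(x, kc)` recovers `x` up to floating-point error.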
2. Generative Model and Training Procedure
With $k_c$ fixed via the classifier-based criterion, SerpentFlow constructs pseudo-pairs for generative modeling:
- For a target-domain sample $y$, synthesize
$$\tilde{y} = y_{\mathrm{low}} + \varepsilon_{\mathrm{high}}$$
with $\varepsilon \sim \mathcal{N}(0, I)$ and $\varepsilon_{\mathrm{high}}$ its high-frequency component. This preserves the shared low frequencies and replaces high-frequency content with noise.
- The pairs $(\tilde{y}, y)$ are used for conditional generative training. The conditional distribution $p(y \mid \tilde{y})$ is modeled via continuous-time flow matching with linear interpolation:
$$y_t = (1 - t)\,\tilde{y} + t\,y, \qquad t \in [0, 1].$$
- The flow-matching loss [Lipman et al. '23] is:
$$\mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{t,\, y,\, \varepsilon}\,\big\| v_\theta(t, y_t, \tilde{y}) - (y - \tilde{y}) \big\|^2$$
- Architecture: $v_\theta$ is parameterized as a U-Net, conditioned on $\tilde{y}$ via FiLM or concatenation.
The total loss consists of the discriminator loss (for cutoff selection) and the generative flow-matching loss (for high-resolution sample reconstruction). A standard reconstruction loss can also be included in principle.
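The pseudo-pair construction and the flow-matching regression target above can be sketched in NumPy, under the linear path from pseudo-input $\tilde{y}$ to target $y$ reconstructed above. The function names are illustrative; in the actual method, $v_\theta$ is a conditional U-Net trained to regress the returned velocity target.

```python
import numpy as np

def lowpass(x, kc):
    """Projection onto Fourier modes with |k| <= kc (shared component)."""
    ky = np.fft.fftfreq(x.shape[0]) * x.shape[0]
    kx = np.fft.fftfreq(x.shape[1]) * x.shape[1]
    mask = np.sqrt(ky[:, None] ** 2 + kx[None, :] ** 2) <= kc
    return np.real(np.fft.ifft2(np.fft.fft2(x) * mask))

def make_pseudo_input(y, kc, rng):
    """Keep y's shared low frequencies; replace the high frequencies with
    the high-frequency part of Gaussian noise."""
    eps = rng.standard_normal(y.shape)
    return lowpass(y, kc) + (eps - lowpass(eps, kc))

def flow_matching_pair(y, y_tilde, t):
    """Point on the linear path y_t = (1 - t) * y_tilde + t * y and the
    constant velocity target y - y_tilde regressed by v_theta(t, y_t, y_tilde)."""
    y_t = (1.0 - t) * y_tilde + t * y
    return y_t, y - y_tilde
```

Note that the pseudo-input shares its low-frequency content with `y` exactly, which is what makes the pair a valid conditional training example.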
3. End-to-End Pipeline
The SerpentFlow pipeline proceeds in sequential phases:
| Phase | Operation | Purpose |
|---|---|---|
| A | Classifier-based cutoff selection | Identify for shared-structure decomposition |
| B | Pseudo-pair construction in target domain | Enable conditional modeling with synthetic pairs |
| C | Train conditional flow-matching model | Learn $p(y \mid \tilde{y})$ |
| D | Inference: map source-domain input to target domain | Generate aligned target-domain sample from source input |
After training, inference on a source-domain sample $x$ involves extracting the low-pass component $x_{\mathrm{low}}$ using $k_c$, sampling random high-frequency noise $\varepsilon_{\mathrm{high}}$, synthesizing the starting point $\tilde{x} = x_{\mathrm{low}} + \varepsilon_{\mathrm{high}}$, and integrating the learned flow ODE to obtain the corresponding target-domain sample.
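The inference phase can be sketched as forward Euler integration of the learned ODE. This is a minimal sketch: `v_theta` stands in for the trained conditional U-Net and is passed as a callable, and the Euler scheme and step count are illustrative choices.

```python
import numpy as np

def lowpass(x, kc):
    """Projection onto Fourier modes with |k| <= kc (shared component)."""
    ky = np.fft.fftfreq(x.shape[0]) * x.shape[0]
    kx = np.fft.fftfreq(x.shape[1]) * x.shape[1]
    mask = np.sqrt(ky[:, None] ** 2 + kx[None, :] ** 2) <= kc
    return np.real(np.fft.ifft2(np.fft.fft2(x) * mask))

def sample_target(x_src, kc, v_theta, rng, n_steps=50):
    """Map a source sample to the target domain: build the starting point
    x_tilde (shared low frequencies + high-frequency noise), then integrate
    dy/dt = v_theta(t, y, x_tilde) from t = 0 to t = 1 with Euler steps."""
    eps = rng.standard_normal(x_src.shape)
    x_tilde = lowpass(x_src, kc) + (eps - lowpass(eps, kc))
    y = x_tilde.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        y = y + dt * v_theta(i * dt, y, x_tilde)
    return y
```

A higher-order integrator (e.g., Heun or RK4) can replace the Euler loop without changing the interface.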
4. Experimental Validation and Results
SerpentFlow is validated on three unpaired super-resolution and downscaling tasks:
- MRBI Synthetic Images (28×28 handwritten digits on image backgrounds):
- Source: low-pass filtered MRBI (cutoff $k_c$); target: full MRBI.
- Metrics: digit-recognition accuracy (fine-tuned ResNet-18), domain-classification accuracy (new discriminator).
- Compared to Dual FM and Diffusion Bridge baselines:
- Digit accuracy: highest for SerpentFlow among all methods
- Domain-classification accuracy: at chance for SerpentFlow, above chance for the baselines
- Qualitative: preservation of digit shapes and plausible fine-grained backgrounds.
- Fluid-Simulation Super-Resolution (64 → 512 grid):
- Source: coarse 64×64 fields; target: high-resolution 512×512 fields resolving correspondingly higher wavenumbers.
- Metrics: temporal-mean trajectories, probability densities, azimuthally-averaged power spectral densities.
- At the selected cutoff, SerpentFlow best matches the true temporal dynamics and spectral properties, avoiding artifacts present in the alternatives.
- Climate Downscaling (CMIP6 GCM → ERA5 winds over France):
- Downscaling of daily wind fields from GCM resolution to ERA5 resolution.
- Metrics:
- Kolmogorov–Smirnov (KS) statistic: 0.0253 (SerpentFlow) vs. 0.0249 (Dual FM) vs. 0.251 (Diffusion Bridge)
- Correlation-score bias: 0.052 (SerpentFlow), 0.045 (Dual FM), 0.293 (Bridge)
- Temporal RMSE: 0.031 (SerpentFlow), 1.226 (Dual FM)
- Nash–Sutcliffe efficiency (NSE): 0.982 (SerpentFlow), 0.291 (Dual FM)
- Qualitative: recovery of terrain-induced correlation patterns and preservation of interannual variability.
5. Theoretical Insights and Extensions
SerpentFlow’s shared-structure decomposition transforms the unpaired alignment problem into a conditionally paired setting by isolating invariant signal content (large-scale or low-frequency structure). This framework enables conditional generative models to focus learning capacity on domain-specific variability while maintaining global consistency across domains.
The classifier-based cutoff selection is inherently data-driven and adapts flexibly to any domain pair with shared latent structure in frequency or other transform domains. Although instantiated here in the Fourier domain, the approach admits extensions to multi-scale or wavelet decompositions, as well as learned encoder representations for the latent domain.
SerpentFlow is agnostic to the specific conditional generative model: the flow-matching objective can be replaced by other frameworks, such as conditional GANs, conditional diffusion models, or normalizing flows, so long as the construction of pseudo-pairs via the shared/component decomposition is preserved.
Potential future directions include adapting the methodology to temporal upsampling, video frame interpolation, and more general structured signal reconstruction tasks. A plausible implication is that shared-structure decomposition may serve as a unifying principle for generative unpaired domain adaptation across a range of modalities (Keisler et al., 5 Jan 2026).