Fog-Invariant Feature Learning (FIFO)
- Fog-Invariant Feature Learning (FIFO) is a framework that disentangles fog, style, and content, enabling robust performance in adverse weather and geographic transferability.
- FIFO employs factorized architectures and cumulative loss functions that ensure additive separability between style and fog effects, leading to enhanced generalization across domains.
- Extensive evaluations, using semantic segmentation on Foggy Zurich and physics-informed fog forecasting at airports, validate FIFO’s effectiveness, achieving significant performance gains and high AUC scores.
Fog-Invariant Feature Learning (FIFO) refers to a class of methods and feature representations designed to disentangle, suppress, or engineer features such that downstream tasks are insensitive to the presence, density, or specific regime of fog. This property is sought in both computer vision (semantic scene understanding under adverse weather) and geoscientific modeling (predicting fog formation or dissipation in a physically transferable way). FIFO systems achieve this invariance either by explicit architectural and loss-based disentanglement (for image-based perception) or by strict physics-informed, coordinate-free feature engineering (for geographic and regime-invariant prediction), yielding strong generalization across domains, weather, and location.
1. Factorized Representation and Architecture for Visual FIFO
Image-based FIFO, as instantiated in the CuDA-Net framework for semantic scene understanding under fog, decomposes visual input into three latent codes:
- c: a content code encoding the invariant semantic structure,
- s: a style code capturing domain- or city-dependent appearance,
- f: a fog code quantifying visibility and fog density.
Additionally, a dual encoder models residual factors spanning clear–fog transitions when style and fog are not perfectly separable. A single shared decoder reconstructs or translates images from these codes. Semantic segmentation is performed by a dedicated head acting only on c, ensuring the prediction is fog-invariant after training, since c is forced to be free of fog and style effects.
For cumulative domain adaptation, three networks are trained in sequence and cascade—source→intermediate, intermediate→target, source→target—with the content encoder and decoder shared, but separate style and fog encoders per leg. This cascade enforces sequential factorization of style, fog, and their interaction (Ma et al., 2021).
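As a rough illustration (not the authors' implementation), the factorization and cross-domain translation can be sketched with linear toy encoders, where an "image" is modelled as an additive mixture of content, style, and fog components living in disjoint subspaces:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy world: an "image" is an additive mixture x = c + s + f of
# content, style, and fog components in disjoint subspaces.
D = 12
P_c = np.diag([1] * 4 + [0] * 8)           # content subspace projector
P_s = np.diag([0] * 4 + [1] * 4 + [0] * 4) # style subspace projector
P_f = np.diag([0] * 8 + [1] * 4)           # fog subspace projector

def encode(x):
    """Ideal disentangling encoders: project onto each factor's subspace."""
    return P_c @ x, P_s @ x, P_f @ x

def decode(c, s, f):
    """Shared decoder: recombine the three codes into an image."""
    return c + s + f

# Source (clear, city A) and target (foggy, city B) images.
x_src = rng.normal(size=D)
c, s_src, f_src = encode(x_src)

x_tgt = rng.normal(size=D)
_, s_tgt, f_tgt = encode(x_tgt)

# Cross-domain translation: keep source content, swap in target style + fog.
x_trans = decode(c, s_tgt, f_tgt)

# Content survives translation -- the fog-invariance property.
c_after, _, _ = encode(x_trans)
assert np.allclose(c_after, c)
```

In the real framework the encoders are learned convolutional networks and the invariance is enforced by losses rather than by construction, but the algebra of the toy model mirrors the intended factorization.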
2. Loss Functions and Cumulative Disentanglement
The formal training objective for each network encompasses: (a) within-domain reconstruction using an L1 pixel loss plus a VGG-based perceptual loss,

$$\mathcal{L}_{\text{recon}} = \lVert \hat{x} - x \rVert_1 + \lambda_{\text{perc}}\,\lVert \phi(\hat{x}) - \phi(x) \rVert_1, \qquad \hat{x} = G(c, s, f),$$

where $G$ is the shared decoder and $\phi$ denotes VGG features; (b) cross-domain translation enforcing content preservation post translation,

$$\mathcal{L}_{\text{trans}} = \lVert E_c\big(G(c_a, s_b, f_b)\big) - c_a \rVert_1,$$

so that the content code $c_a$ of domain $a$ survives rendering with the style and fog codes of domain $b$; (c) segmentation supervision combining cross-entropy on the labeled source and adversarial entropy minimization over the target,

$$\mathcal{L}_{\text{seg}} = \mathcal{L}_{\text{CE}}\big(S(c_{\text{src}}), y_{\text{src}}\big) + \lambda_{\text{adv}}\,\mathcal{L}_{\text{ent}}\big(S(c_{\text{tgt}})\big);$$

(d) the total per-stage loss,

$$\mathcal{L} = \mathcal{L}_{\text{recon}} + \lambda_1 \mathcal{L}_{\text{trans}} + \lambda_2 \mathcal{L}_{\text{seg}};$$

and (e) a cumulative relationship loss enforcing the additivity of the style and fog factors,

$$\mathcal{L}_{\text{cum}} = \big\lVert \Delta_{s \to t} - \big(\Delta_{\text{sty}} + \Delta_{\text{fog}} + \Delta_{\text{dual}}\big) \big\rVert_1,$$

with $\Delta_{\text{sty}}$ the style shift, $\Delta_{\text{fog}}$ the fog shift, $\Delta_{\text{dual}}$ the dual shift, each computed from the codes of its leg.
Cyclical training alternates which encoder is updated, freezing the other two, and includes the cumulative relationship loss to tie the factorizations together. The effect is a deeply disentangled content code that is empirically fog-invariant (Ma et al., 2021).
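The additivity constraint behind the cumulative relationship loss can be sketched numerically; this is a minimal reconstruction under the assumption that domain shifts are measured as code differences, not the paper's exact formulation:

```python
import numpy as np

def cumulative_loss(delta_direct, delta_style, delta_fog, delta_dual):
    """L1 penalty forcing the direct source->target shift to equal the
    sum of the style, fog, and residual (dual) shifts."""
    return np.abs(delta_direct - (delta_style + delta_fog + delta_dual)).sum()

# When the factorization is perfectly additive, the loss vanishes.
d_sty  = np.array([0.5, -0.2, 0.0])   # style shift
d_fog  = np.array([0.1, 0.3, -0.4])   # fog shift
d_dual = np.array([0.0, 0.05, 0.05])  # residual dual shift
d_direct = d_sty + d_fog + d_dual     # direct source->target shift
assert cumulative_loss(d_direct, d_sty, d_fog, d_dual) == 0.0
```

Minimizing this penalty drives the three legs of the cascade toward a consistent, additively separable factorization.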
3. Evaluation: Quantitative and Ablation Results
On Foggy Zurich-test (mean intersection over union, mIoU, over 19 classes), CuDA-Net and FIFO show the following progression:
| Approach | mIoU (%) |
|---|---|
| Baseline Deeplab-v2 (source only) | 25.9 |
| Direct s→t (one-shot disentangle) | 40.2 |
| s→m only (style) | 39.2 |
| m→t only (fog) | 42.5 |
| s→m + m→t (two-step) | 43.1 |
| + F_{s→t} dual gap | 43.1 |
| + cyclical T=2 | 45.8 |
| + cumulative loss (FIFO full model) | 48.2 |
Adding 498 synthetic Dense Foggy Cityscapes images (CuDA-Net+) boosts mIoU to 49.1% on Foggy Zurich and 53.5% on Foggy Driving. Each component (style adaptation, fog adaptation, dual gap, cyclical training, cumulative loss) contributes significant gains, up to +22.3 points over the baseline in total. Removing any specialized encoder degrades performance by 10–15 points, supporting the necessity of explicit factorization (Ma et al., 2021).
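For reference, the mean intersection-over-union metric reported in the table can be computed as follows; this is a generic sketch of the standard metric, not the evaluation code from the paper:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes that appear in the prediction or ground truth."""
    ious = []
    for k in range(num_classes):
        inter = np.logical_and(pred == k, gt == k).sum()
        union = np.logical_or(pred == k, gt == k).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([0, 0, 1, 1, 2, 2])
gt   = np.array([0, 1, 1, 1, 2, 0])
# class 0: 1/3, class 1: 2/3, class 2: 1/2  ->  mean = 0.5
assert abs(mean_iou(pred, gt, 3) - 0.5) < 1e-9
```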
4. Rationale for FIFO over Defogging and Domain-Randomization
Traditional defogging pipelines (e.g., MSCNN, DCP, GFN) introduce visual artifacts and fail to guarantee alignment in the downstream feature space. Synthetic-to-real weather domain randomization obscures the decomposition between fog and other appearance factors, resulting in residual domain gaps due to inadequately separated influences.
FIFO leverages empirical evidence that style and fog gaps are additive and independently addressable to structure the adaptation pathway. The cumulative relationship loss enforces that the sum of the style and fog gaps equals the direct domain shift, enabling tighter, more robust feature alignment than monolithic adversarial or self-training schemes (Ma et al., 2021).
5. Physics-Informed FIFO for Geographic Transferability
Beyond image domains, the FIFO principle applies to fog prediction across real-world sites through coordinate-free, physics-driven features, as demonstrated in FOG-Net for airport fog forecasting (Castillo, 21 Oct 2025). Feature engineering enforces universal fog-process representation:
- Persistence: lagged visibility at multiple horizons, capturing autocorrelation,
- Atmospheric State: 2 m temperature, dew-point depression (DPD), relative humidity,
- Dynamics: 10 m wind speed, surface pressure,
- Vertical Structure: thermal inversion strength, low cloud cover,
- Trends: time derivatives of DPD, temperature, and pressure,
- Cyclical Drivers: solar elevation angle, day-of-year, is_night.
Exclusion of latitude/longitude and explicit use of only physically meaningful variables enforce transferability across hemispheres and fog regimes.
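A hedged sketch of the coordinate-free feature construction follows; the function name, lag choices, and the Magnus-style humidity approximation are illustrative assumptions, not the paper's exact configuration:

```python
import math

def fog_features(vis_lags, temp_c, dewpoint_c, wind_ms, pressure_hpa,
                 dpd_trend, day_of_year, hour_utc):
    """Build a physics-driven, coordinate-free feature dict.
    Latitude/longitude are deliberately NOT accepted as inputs."""
    dpd = temp_c - dewpoint_c  # dew-point depression (K)
    # Magnus-style relative-humidity approximation (an assumption here).
    e_s = 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))
    e   = 6.112 * math.exp(17.62 * dewpoint_c / (243.12 + dewpoint_c))
    rh  = 100.0 * e / e_s
    # Cyclical encoding of the seasonal driver.
    doy_sin = math.sin(2 * math.pi * day_of_year / 365.25)
    doy_cos = math.cos(2 * math.pi * day_of_year / 365.25)
    return {
        **{f"vis_lag_{h}h": v for h, v in vis_lags.items()},  # persistence
        "dpd": dpd, "rh": rh,                                 # state
        "wind_10m": wind_ms, "pressure": pressure_hpa,        # dynamics
        "dpd_trend": dpd_trend,                               # trend
        "doy_sin": doy_sin, "doy_cos": doy_cos,               # cyclical
        "is_night": int(hour_utc < 6 or hour_utc > 18),       # crude flag
    }

feats = fog_features({1: 800.0, 3: 1200.0}, temp_c=10.0, dewpoint_c=9.5,
                     wind_ms=1.2, pressure_hpa=1018.0, dpd_trend=-0.2,
                     day_of_year=172, hour_utc=5)
assert "lat" not in feats and abs(feats["dpd"] - 0.5) < 1e-12
```

Because every feature is a physical quantity or a cyclical encoding, the same vector is meaningful at any airport in either hemisphere.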
A gradient-boosted XGBoost model (depth 5, with positive-class weighting for imbalance) achieves an AUC of 0.9695 on holdout (SCEL, Chile), and strictly zero-shot AUCs of 0.9230 (SCTE), 0.9471 (KSFO), and 0.9338 (EGLL)—a degradation of only ~3–5% across sites separated by thousands of kilometres and spanning radiative, advective, and hybrid fog mechanisms.
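The AUC scores quoted above follow the standard rank-statistic definition: the probability that a random positive outranks a random negative. A minimal numpy sketch of that definition (not the paper's evaluation code):

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney U formulation; ties get half credit.
    Pairwise comparison is O(n_pos * n_neg), fine for small arrays."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Perfectly separated scores give AUC = 1.0.
assert roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]) == 1.0
```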
SHAP analysis across sites reveals visibility persistence, solar elevation, and seasonal cycle as dominant factors, with secondary features (pressure trend, thermal inversion strength) adapting in importance according to local fog physics (Castillo, 21 Oct 2025).
6. Implementation and Robustness Considerations
CuDA-Net FIFO for images utilizes:
- Backbone: Deeplab-v2 (ResNet-101 + ASPP) for the content encoder,
- Dedicated style, fog, and dual encoders: 3 conv blocks each, 256-D latent output,
- Decoder: 4 up-sampling convs plus a final output layer,
- Adversarial segmentation discriminator: 4 conv layers,
- Optimization: Adam with a decaying learning rate,
- Cycle rounds T = 2, batch size 1 per domain pair,
- Training sets: Cityscapes clear (498), Clear Zurich (248), Foggy Zurich (1498, medium fog), optionally augmented with 498 Dense Foggy Cityscapes images.
Ablation studies confirm robustness to hyperparameter choices (the cumulative-loss weight is optimal near 0.25), and that the L1 distance used in the cumulative loss outperforms L2 and cosine alternatives. Each private encoder is necessary to reach full performance.
For the XGBoost-based airport fog forecasting FIFO, class imbalance is handled by positive-class weighting, and strict temporal hold-out guarantees no information leakage. The exclusion of geographic identifiers is central to transferability (Castillo, 21 Oct 2025).
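Strict temporal hold-out, as opposed to random splitting, can be sketched as follows (the cutoff date is illustrative):

```python
from datetime import datetime

def temporal_split(records, cutoff):
    """Split time-stamped records so that no training sample postdates
    any test sample: rows at or after `cutoff` form the test set.
    This prevents leakage of future weather into training."""
    train = [r for r in records if r["time"] < cutoff]
    test  = [r for r in records if r["time"] >= cutoff]
    return train, test

rows = [{"time": datetime(2020, m, 1), "vis": 1000 * m} for m in range(1, 13)]
train, test = temporal_split(rows, datetime(2020, 10, 1))
# Every training timestamp strictly precedes every test timestamp.
assert max(r["time"] for r in train) < min(r["time"] for r in test)
```

A random split would interleave past and future observations of the same fog events, inflating apparent skill; the temporal split is what makes the zero-shot AUCs credible.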
7. Limitations and Future Directions
FIFO for computer vision has so far been demonstrated on three European urban road datasets and their synthetic fog augmentations, with strong results across fog densities; extension to other adverse conditions (rain, snow) remains a natural next step.
Geoscientific FIFO, while robust across hemispheres and marine/radiative fog regimes, has been validated on only four airports; extension to tropical, polar, or highly complex microclimates remains open. The spatial resolution of ERA5 limits microphysical granularity. XGBoost, while interpretable, does not quantify predictive uncertainty; proposed future avenues include probabilistic models and site-specific calibration that preserves feature universality.
Further research may consider:
- Scaling to more domains and finer spatial granularity,
- High-resolution physical variable integration,
- Quantifying forecast uncertainty via Bayesian ensembles,
- Systematic automation of threshold calibration for deployment.
These directions aim to consolidate fog-invariant feature learning as a best practice in both scene understanding and physical-process prediction (Ma et al., 2021, Castillo, 21 Oct 2025).