FAED: Fréchet Autoencoder Distance Explained
- FAED is a synthetic image quality metric that replaces InceptionV3 with a lightweight convolutional autoencoder, employing Monte Carlo dropout for uncertainty quantification.
- The approach computes multiple FAED scores by generating empirical Gaussians from stochastic CAE encodings and comparing them to a deterministic reference via squared Fréchet distance.
- Uncertainty measures—predictive variance and FAED standard deviation—serve to identify out-of-distribution samples and assess the reliability of synthetic image quality evaluations.
The Fréchet Autoencoder Distance (FAED) is a synthetic image quality metric structurally analogous to the well-known Fréchet Inception Distance (FID), but replaces the InceptionV3 feature model with a lightweight convolutional autoencoder (CAE). FAED further introduces uncertainty quantification (UQ) by employing Monte Carlo dropout throughout the encoder, producing not only a distribution of metric values but also explicit, interpretable uncertainty measures—namely the predictive variance of latent embeddings (“pVar”) and the standard deviation of the FAED scores (σ_FAED). These uncertainty scores offer heuristic indicators of the degree to which the samples under evaluation are out-of-distribution relative to the embedding model’s training domain, directly informing the trustworthiness of FAED as a synthetic image quality assessment tool (Bench et al., 4 Apr 2025).
1. Formal Definition and Methodology
Given a “test” image set $\hat{X} = \{\hat{x}_i\}_{i=1}^{N}$ and a “reference” set $Y = \{y_i\}_{i=1}^{N}$, both consisting of normalized RGB images, the process employs a trained CAE with encoder $E$ producing $d$-dimensional latent codes $z = E(x)$. Monte Carlo dropout is activated in every encoder layer with a 10% rate during both training and (crucially) evaluation, modeling epistemic uncertainty. For each test image $\hat{x}_i$, $M$ stochastic forward passes yield encodings $l_{i,j} = E(\hat{x}_i)$, $j = 1, \dots, M$.
For each Monte Carlo sample $j$, the embeddings across the test set define an empirical Gaussian $\mathcal{N}\big(\mu_{\hat{x}}^{(j)}, \Sigma_{\hat{x}}^{(j)}\big)$, where:

$$\mu_{\hat{x}}^{(j)} = \frac{1}{N}\sum_{i=1}^{N} l_{i,j}, \qquad \Sigma_{\hat{x}}^{(j)} = \frac{1}{N-1}\sum_{i=1}^{N}\big(l_{i,j}-\mu_{\hat{x}}^{(j)}\big)\big(l_{i,j}-\mu_{\hat{x}}^{(j)}\big)^{\top}$$
The reference statistics $(\mu_y, \Sigma_y)$ are computed deterministically from $Y$ with dropout disabled. The $j$-th FAED score is the squared Fréchet distance between the two Gaussians:

$$\mathrm{FAED}^{(j)} = \big\lVert \mu_{\hat{x}}^{(j)} - \mu_y \big\rVert^{2} + \operatorname{Tr}\!\Big(\Sigma_{\hat{x}}^{(j)} + \Sigma_y - 2\big(\Sigma_{\hat{x}}^{(j)}\Sigma_y\big)^{1/2}\Big)$$
The procedure results in $M$ FAED scores per evaluation.
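For concreteness, here is a minimal numpy/scipy sketch of the squared Fréchet distance between two Gaussians; the function name and the use of `scipy.linalg.sqrtm` are implementation choices for illustration, not details prescribed by the paper:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    covmean = sqrtm(sigma1 @ sigma2)
    # sqrtm can return small imaginary components from numerical error
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```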
2. CAE Architecture and Training Protocol
The CAE adopts a standard design:
- Encoder $E$: three convolution layers (stride 2; channel progression 3 → 128 → 256 → 512), each followed by ReLU and 10% dropout, then flattened and mapped to a 256-dimensional latent by a linear layer (see the sketch after this list).
- Decoder $D$: linear up-projection from the latent back to the encoder’s final feature shape, then three transposed convolutions (mirroring encoder channels in reverse) with ReLU, reconstructing to three output channels.
- Loss: pixelwise MSE.
- Training data: ImageWoof (9,035 train / 3,929 val), 25 epochs, batch size 16, Adam optimizer.
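A minimal PyTorch sketch consistent with this description; the kernel size of 3, padding of 1, and 64×64 input resolution are assumptions for illustration, since the text above specifies only strides, channels, dropout rate, latent size, loss, and training schedule:

```python
import torch
import torch.nn as nn

class CAE(nn.Module):
    """Convolutional autoencoder sketch: encoder with dropout, mirrored decoder."""
    def __init__(self, latent_dim=256, p_drop=0.1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 128, 3, stride=2, padding=1), nn.ReLU(), nn.Dropout(p_drop),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(), nn.Dropout(p_drop),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(), nn.Dropout(p_drop),
            nn.Flatten(),
            nn.Linear(512 * 8 * 8, latent_dim),  # assumes 64x64 inputs -> 8x8 feature maps
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512 * 8 * 8),
            nn.Unflatten(1, (512, 8, 8)),
            nn.ConvTranspose2d(512, 256, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 3, 3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```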
Dropout is active throughout the encoder during all stages, ensuring the CAE’s latent space exposes epistemic uncertainty in downstream metrics.
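One simple way to keep dropout stochastic at evaluation time in PyTorch (a sketch; the paper does not prescribe an implementation) is to leave the model in training mode when encoding, which is safe here because the CAE contains no batch-norm layers:

```python
import torch

def encode_mc(model, x):
    """One stochastic encoding pass with dropout active (MC dropout)."""
    model.train()          # keeps nn.Dropout sampling at inference
    with torch.no_grad():  # no gradients needed for evaluation
        return model.encoder(x)
```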
3. Uncertainty Quantification in FAED
Monte Carlo dropout, maintained at inference time, produces a distribution of CAE encodings for each input. Uncertainty quantification is achieved via two principal measures:
- Predictive variance (pVar): captures the average variance across latent dimensions and input images, with the variance taken over the $M$ Monte Carlo samples:

$$\mathrm{pVar} = \frac{1}{Nd}\sum_{i=1}^{N}\sum_{k=1}^{d}\operatorname{Var}_{j=1,\dots,M}\big[\,l_{i,j,k}\,\big]$$
This reflects epistemic uncertainty inherent in the feature mapping.
- Standard deviation of FAED (σ_FAED): quantifies how embedding variance propagates through the Fréchet distance computation:

$$\sigma_{\mathrm{FAED}} = \sqrt{\frac{1}{M}\sum_{j=1}^{M}\Big(\mathrm{FAED}^{(j)} - \overline{\mathrm{FAED}}\Big)^{2}}, \qquad \overline{\mathrm{FAED}} = \frac{1}{M}\sum_{j=1}^{M}\mathrm{FAED}^{(j)}$$
The magnitude of both metrics empirically correlates with the degree of input domain shift. High values flag increasing unreliability of the FAED itself. These measures are to be reported alongside the mean FAED for comprehensive interpretability (Bench et al., 4 Apr 2025).
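A minimal numpy sketch of both measures; here `L_all` is assumed to hold all stacked stochastic encodings with shape (M, N, d), and `faed` the list of $M$ scores produced by the procedure in the next section (both variable names are illustrative):

```python
import numpy as np

# L_all: stacked MC encodings, shape (M, N, d); faed: the M FAED scores
pvar = L_all.var(axis=0).mean()   # variance over MC passes, averaged over images and latent dims
sigma_faed = float(np.std(faed))  # spread of FAED across the M passes
```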
4. Reference Implementation and Pseudocode
The computation proceeds as follows:
- Deterministically compute $\mu_y, \Sigma_y$ from $Y$ (without dropout).
- For $j = 1, \dots, M$:
  - Draw a stochastic encoding $l_{i,j} = E(\hat{x}_i)$ for every $\hat{x}_i$ in $\hat{X}$ using dropout.
  - Aggregate to $\mu_{\hat{x}}^{(j)}$, $\Sigma_{\hat{x}}^{(j)}$.
  - Compute $\mathrm{FAED}^{(j)}$ via the Fréchet distance between the empirical and reference Gaussians.
- Return $\{\mathrm{FAED}^{(j)}\}_{j=1}^{M}$.
```python
import numpy as np
from scipy.linalg import sqrtm

# Reference statistics from deterministic encodings of Y (dropout disabled)
Z = np.stack([E(y) for y in Y])                  # shape (N, d)
mu_y, Sigma_y = Z.mean(axis=0), np.cov(Z, rowvar=False)

faed = []
for j in range(M):
    # Stochastic encodings of the test set (dropout active)
    L = np.stack([E_dropout(x) for x in X_hat])  # shape (N, d)
    mu_x, Sigma_x = L.mean(axis=0), np.cov(L, rowvar=False)
    covmean = sqrtm(Sigma_x @ Sigma_y).real      # drop tiny imaginary parts from sqrtm
    faed.append(np.sum((mu_x - mu_y) ** 2)
                + np.trace(Sigma_x + Sigma_y - 2.0 * covmean))
```
5. Empirical Validation and Domain Sensitivity
FAED’s sensitivity to domain shift and OOD effects is established via controlled perturbations of the evaluation data. Using the ImageWoof dataset for CAE training, five test sets were constructed, ordered by increasing “domain gap”:
| Test set | Mean FAED | σ_FAED | pVar |
|---|---|---|---|
| ImageWoof (baseline) | 25.923 | 0.019 | 0.0051 |
| ImageWoof + 2% Gaussian noise | 29.183 | 0.021 | 0.0042 |
| ImageWoof + 5 self-overlays | 50.276 | 0.030 | 0.0062 |
| ImageWoof + 5 random Imagenette overlays | 62.225 | 0.042 | 0.0070 |
| Imagenette (no dog) | 143.932 | 0.085 | 0.0082 |
Mean FAED and σ_FAED increase monotonically with domain shift, and pVar follows the same overall trend (apart from a small dip on the Gaussian-noise set), validating the metric’s effectiveness as a quality indicator and the utility of its uncertainty scores for evaluating trustworthiness. The nearly linear dependence on perturbation severity (as in Figure 1) confirms fine-grained sensitivity (Bench et al., 4 Apr 2025).
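As an illustration, a sketch of the mildest perturbation, assuming “2% Gaussian noise” means zero-mean noise with σ = 0.02 applied to images normalized to [0, 1] (this interpretation is an assumption, not stated above):

```python
import numpy as np

def add_gaussian_noise(images, sigma=0.02):
    """Perturb normalized images with zero-mean Gaussian noise (assumed sigma = 0.02)."""
    noisy = images + np.random.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)
```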
6. Interpretation and Practical Guidance
Practical use of FAED mandates co-reporting of the mean FAED, pVar, and σ_FAED. Key insights:
- Diagnostic role of uncertainty metrics: Large pVar or σ_FAED indicates that latent encodings are sampled from regions not well supported by the CAE's training data, flagging possible unreliability of the FAED. pVar isolates epistemic uncertainty at the embedding level; σ_FAED captures the compounded effect of that uncertainty in the FAED formula.
- Domain specificity: For evaluation outside natural image domains (e.g., medical imaging), the CAE must be trained on in-domain data. Only trust FAED where uncertainties remain low by empirical thresholds.
- Dropout tuning: The dropout rate impacts uncertainty calibration and must be tuned to the task, or augmented with additional UQ techniques as needed.
- Summary: FAED extends FID, combining the conciseness of a single-score metric with explicit UQ, enabling robust application in settings where the reliability of synthetic image quality judgments is essential. A simplistic reporting helper reflecting this guidance follows below.
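As a concrete (and deliberately simple) expression of this guidance, a hypothetical reporting helper; the function name, threshold values, and flagging logic are illustrative assumptions, not from the paper:

```python
import statistics

def faed_report(faed_scores, pvar, pvar_max=0.006, sigma_max=0.05):
    """Summarize FAED with its uncertainty measures and flag possible OOD inputs."""
    mean_faed = statistics.fmean(faed_scores)
    sigma_faed = statistics.stdev(faed_scores)
    # Thresholds are hypothetical placeholders; calibrate them on in-domain data.
    trusted = pvar <= pvar_max and sigma_faed <= sigma_max
    return {"FAED": mean_faed, "sigma_FAED": sigma_faed, "pVar": pvar, "trusted": trusted}
```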
7. Relation to FID and Broader Implications
FAED redirects the dependency from an ImageNet-pretrained classifier (InceptionV3) to an unsupervised, domain-adaptive CAE. This replacement is particularly significant for fields where pretrained discriminative models may lack epistemic coverage—most notably in scientific or medical image synthesis. Uncertainty quantification via Monte Carlo dropout provides explicit trustworthiness signals, which are critical for high-stakes applications. A plausible implication is that similar UQ procedures could be integrated with other synthetic data metrics by analogously equipping embedding models with dropout or Bayesian techniques, although this is not explicit in the current results (Bench et al., 4 Apr 2025).