
Skew Inception Distance (SID) Explained

Updated 27 January 2026
  • Skew Inception Distance (SID) is a metric that extends FID by incorporating third-moment (skewness) information to capture non-Gaussian features in image synthesis.
  • It computes discrepancies between real and generated distributions using means, covariances, and coskewness tensors, with PCA-based dimensionality reduction keeping the calculation efficient.
  • SID aligns more closely with human perception by detecting perceptually meaningful distortions, making it a robust tool for assessing generative models.

Skew Inception Distance (SID) is a statistical metric designed to evaluate the quality of feature distributions produced by generative models, notably Generative Adversarial Networks (GANs). SID explicitly incorporates third-moment (skewness) information in feature space, thereby extending the well-established Fréchet Inception Distance (FID)—which considers only first and second moments. Originating in the context of image synthesis, SID is motivated by the observation that FID’s Gaussian assumption misses non-Gaussian structure present in real-world data. SID is rigorously defined, admits an efficient practical implementation via dimensionality reduction, and exhibits empirical properties distinct from FID, sometimes aligning more closely with human perceptual judgments (Luzi et al., 2023).

1. Mathematical Definition

Let $X_1, \ldots, X_n \in \mathbb{R}^d$ and $Y_1, \ldots, Y_m \in \mathbb{R}^d$ be feature vectors extracted (typically from the penultimate layer of an Inception-v3 network) from real and generated images, respectively. SID compares the empirical distributions of these sets through their first three moments:

  • Means: $\mu_p = \frac{1}{n}\sum_{i=1}^n X_i$, $\mu_q = \frac{1}{m}\sum_{j=1}^m Y_j$
  • Covariances: $\Sigma_p = \frac{1}{n-1}\sum_{i=1}^n (X_i - \mu_p)(X_i - \mu_p)^\top$, and similarly for $\Sigma_q$
  • Coskewness tensors: $s_p \in \mathbb{R}^{d\times d\times d}$, with entries $(s_p)_{ijk} = \frac{1}{n}\sum_{\ell=1}^n X^*_{\ell,i} X^*_{\ell,j} X^*_{\ell,k}$, where $X^* = \Sigma_p^{-1/2}(X - \mu_p)$, and analogously for $s_q$
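The three moments listed above can be sketched in NumPy as follows. This is a minimal illustration rather than the reference implementation: the function name `empirical_moments` and the `eps` floor on the covariance eigenvalues are assumptions introduced here to keep the whitening step numerically safe.

```python
import numpy as np

def empirical_moments(feats, eps=1e-8):
    """Mean, covariance, and whitened coskewness tensor of an (n, d) feature array.

    `eps` is a hypothetical safeguard (not from the source) that floors
    the covariance eigenvalues before taking the inverse square root.
    """
    n = feats.shape[0]
    mu = feats.mean(axis=0)
    centered = feats - mu
    # Unbiased covariance with the 1/(n-1) factor used in the definition.
    cov = centered.T @ centered / (n - 1)
    # Sigma^{-1/2} = V diag(lambda^{-1/2}) V^T from the symmetric eigendecomposition.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = (evecs / np.sqrt(np.maximum(evals, eps))) @ evecs.T
    whitened = centered @ inv_sqrt  # rows are X* = Sigma^{-1/2}(X - mu)
    # (s)_{ijk} = (1/n) * sum_l X*_{l,i} X*_{l,j} X*_{l,k}
    skew = np.einsum("li,lj,lk->ijk", whitened, whitened, whitened) / n
    return mu, cov, skew

# Usage on synthetic, deliberately skewed features.
rng = np.random.default_rng(0)
features = rng.exponential(size=(1000, 4))
mu, cov, skew = empirical_moments(features)
print(mu.shape, cov.shape, skew.shape)  # (4,) (4, 4) (4, 4, 4)
```

The dense `einsum` allocates all $d^3$ entries, so this form is only practical for small $d$.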

The full Skew Inception Distance is then:

$$\mathrm{SID}(P, Q) = \sqrt{ \| \mu_p - \mu_q \|_2^2 + \mathrm{Tr}\left(\Sigma_p + \Sigma_q - 2(\Sigma_p \Sigma_q)^{1/2}\right) + \| \alpha(s_p) - \alpha(s_q) \|_F^2 }$$

where $\alpha(x) = x^{1/3}$ is applied elementwise to normalize units ("cube-root normalization") (Luzi et al., 2023). The third term uses the Frobenius norm.

SID is thus FID augmented by a non-Gaussian skewness component, allowing it to detect discrepancies in higher-order moments between real and generated distributions.
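As a rough sketch of how the three terms combine, the function below assembles SID from precomputed moments. Several choices are assumptions made here, not details confirmed by the source: the name `sid_from_moments`, the use of the real (signed) cube root `np.cbrt` for negative coskewness entries, and computing $\mathrm{Tr}((\Sigma_p \Sigma_q)^{1/2})$ as the sum of square roots of the eigenvalues of $\Sigma_p \Sigma_q$.

```python
import numpy as np

def sid_from_moments(mu_p, cov_p, s_p, mu_q, cov_q, s_q):
    """Assemble SID from means, covariances, and coskewness tensors."""
    mean_term = np.sum((mu_p - mu_q) ** 2)
    # For PSD covariances the eigenvalues of cov_p @ cov_q are real and
    # nonnegative, so Tr((cov_p cov_q)^{1/2}) is the sum of their square roots.
    eig = np.linalg.eigvals(cov_p @ cov_q)
    tr_sqrt = np.sum(np.sqrt(np.clip(eig.real, 0.0, None)))
    cov_term = np.trace(cov_p) + np.trace(cov_q) - 2.0 * tr_sqrt
    # Elementwise cube root; np.cbrt returns the real root for negative
    # entries (an assumption about how alpha treats signs).
    skew_term = np.sum((np.cbrt(s_p) - np.cbrt(s_q)) ** 2)
    # Clamp at zero to absorb small negative rounding errors.
    return float(np.sqrt(max(mean_term + cov_term + skew_term, 0.0)))

# Hand-built 2-D example: identical covariances, no skewness,
# means differing by (3, 4), so only the mean term contributes.
eye = np.eye(2)
zeros = np.zeros((2, 2, 2))
print(sid_from_moments(np.zeros(2), eye, zeros, np.array([3.0, 4.0]), eye, zeros))  # 5.0
```

Setting a single coskewness entry of the second distribution to 8 adds $(0 - 2)^2 = 4$ under the square root, which shows how the cube root puts the third-moment term on the same scale as the other two.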

2. Metricity and Pseudometric Properties

SID defines a metric on the space of distributions determined by their first three moments. If the mapping $P \mapsto (\mu_p, \Sigma_p, s_p)$ is injective (that is, the moments characterize the distribution), SID is a true metric; otherwise it is a pseudometric, meaning $\mathrm{SID}(P, Q) = 0$ is possible for distinct $P$ and $Q$ that share the first three moments (Luzi et al., 2023). This distinction supports SID’s validity as a quantitative tool but highlights that $\mathrm{SID} = 0$ does not guarantee full distributional equality unless higher moments match or are irrelevant in the application domain.
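A one-dimensional example, chosen here for illustration and not taken from the cited paper, makes the pseudometric case concrete: a standard normal and a symmetric two-point distribution agree on all three moments used by SID but differ in the fourth.

```latex
% Illustrative example (an assumption of this write-up, not from the source).
% P = N(0,1);  Q = uniform on \{-1, +1\}.
\begin{align*}
\mathbb{E}_P[X] &= 0, & \mathbb{E}_Q[X] &= 0, \\
\mathrm{Var}_P(X) &= 1, & \mathrm{Var}_Q(X) &= 1, \\
\mathbb{E}_P[X^3] &= 0, & \mathbb{E}_Q[X^3] &= 0, \\
\mathbb{E}_P[X^4] &= 3, & \mathbb{E}_Q[X^4] &= 1.
\end{align*}
% All three SID terms vanish at the population level, so SID(P, Q) = 0
% although P \neq Q.
```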

3. Practical Computation and PCA Acceleration

Direct computation of the coskewness term in high dimensions is prohibitive, requiring $O(d^3)$ time and memory. For a typical feature space with $d = 2048$, a single tensor has $2048^3 \approx 8.6 \times 10^9$ entries, which amounts to up to 64 GB of RAM in double precision. To address this, dimensionality reduction via principal component analysis (PCA) is performed:

  • Fit PCA on the real feature vectors $X$, obtaining the top $k \ll d$ principal axes.
  • Project $X$ and $Y$ onto these axes, reducing both to $k$ dimensions.
  • Optionally, rescale covariance to match the full-trace energy.
  • Compute mean, covariance, and coskewness in $k$-dimensional space.
  • Assemble SID using the projected moments (Luzi et al., 2023).

This allows SID, including the skewness term, to be computed in seconds on standard hardware when $k \leq 256$. PCA also leaves FID’s trace term tractable. Robustness checks indicate that the skewness deviations persist after PCA, even down to $k = 16$ (Luzi et al., 2023).
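The fit-and-project steps of the PCA procedure above might look like the following NumPy sketch. The helper names `fit_pca_axes` and `project` are hypothetical, the synthetic arrays stand in for real Inception features, and the optional trace rescaling is left out because the text does not specify its exact form.

```python
import numpy as np

def fit_pca_axes(real, k):
    """Fit PCA on real features only; return their mean and the top-k axes."""
    mu = real.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(real - mu, full_matrices=False)
    return mu, vt[:k].T  # shapes (d,) and (d, k)

def project(feats, mu, axes):
    """Center with the real-data mean and project onto the fitted axes."""
    return (feats - mu) @ axes

# Synthetic stand-ins for real and generated feature sets.
rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 64))
fake = rng.normal(size=(800, 64)) + 0.1

mu, axes = fit_pca_axes(real, k=16)
real_k = project(real, mu, axes)
fake_k = project(fake, mu, axes)
print(real_k.shape, fake_k.shape)  # (1000, 16) (800, 16)
```

Both sets are centered with the same real-data mean and projected with the same axes, so differences between them are compared in a common $k$-dimensional space before the moments are computed.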

4. Behavior Under Distortions and Empirical Performance

SID has been empirically analyzed on Inception-v3 features of ImageNet data, as well as other datasets and architectures. Key findings include:

  • FID increases linearly with added Gaussian noise, even where perturbations are imperceptible to humans.
  • The skewness component of SID remains near zero for small, imperceptible noise levels, increasing only when distortions become visible (Luzi et al., 2023).
  • This suggests that SID’s skewness term responds to perceptually meaningful differences rather than purely statistical ones, sometimes aligning more closely with observer judgments than FID.
  • SID is empirically stable across different feature extractors and robust to PCA-based dimensionality reduction.

5. Applications Beyond GAN Evaluation

Although motivated by the limitations of Gaussian-based metrics in GAN assessment, SID is generally applicable to any learning scenario where feature distributions are compared:

  • Evaluation of other generative models (e.g., diffusion models)
  • Out-of-distribution detection
  • Few-shot learning, where feature normality is often (implicitly or explicitly) assumed

Extensions include replacing PCA with random projections for further computational gains and substituting alternative skewness metrics (e.g., Mardia’s, Kollo’s) provided an appropriate embedding into a metric space is possible (Luzi et al., 2023).
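A random-projection variant could be sketched as below. This is a generic Gaussian (Johnson-Lindenstrauss style) projection chosen for illustration; the source does not specify which random projection would be used, and the function name `gaussian_random_projection` and the scaling convention are assumptions.

```python
import numpy as np

def gaussian_random_projection(d, k, seed=0):
    """Draw a (d, k) matrix with i.i.d. N(0, 1/k) entries.

    With this scaling, squared Euclidean norms are preserved in expectation.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))

# Synthetic stand-ins for real and generated feature sets.
rng = np.random.default_rng(1)
real = rng.normal(size=(200, 2048))
fake = rng.normal(size=(200, 2048))

# The same matrix must be applied to both sets so they share one subspace.
proj = gaussian_random_projection(d=2048, k=256)
real_k = real @ proj
fake_k = fake @ proj
print(real_k.shape, fake_k.shape)  # (200, 256) (200, 256)
```

Unlike PCA, this needs no fitting pass over the real features, which is where the computational saving would come from; the trade-off is that the retained directions are not chosen by variance.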

6. Computational Complexity, Limitations, and Extensions

The main computational hurdle in SID is the coskewness calculation. Without dimensionality reduction, memory and computation are prohibitive. With PCA to $k = 256$, the entire process requires approximately 4.7 seconds on CPU and 0.02 seconds on GPU, using roughly 128 MB RAM (Luzi et al., 2023). Limitations include:

  • SID inherits FID’s bias when sample sizes are small ($n, m < 50\,000$).
  • SID may disregard non-moment-based differences; distinct distributions sharing the first three moments yield $\mathrm{SID} = 0$.
  • The cube-root normalization and optional scaling introduce tunable hyperparameters.
  • Projecting via PCA may discard information in lower-variance modes.
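The memory figures quoted for the coskewness tensor (about 64 GB at $d = 2048$ and roughly 128 MB at $k = 256$) are consistent with a dense tensor stored in 8-byte floats, as this short check shows. The helper name `tensor_bytes` and the 8-byte default are assumptions made for the arithmetic.

```python
def tensor_bytes(dim, bytes_per_entry=8):
    """Bytes needed for a dense dim x dim x dim tensor."""
    return dim ** 3 * bytes_per_entry

GIB = 2 ** 30
MIB = 2 ** 20

print(tensor_bytes(2048) / GIB)     # 64.0  (float64, full feature space)
print(tensor_bytes(2048, 4) / GIB)  # 32.0  (float32, full feature space)
print(tensor_bytes(256) / MIB)      # 128.0 (float64, after PCA to k = 256)
```

Going from 2048 to 256 dimensions cuts the tensor size by a factor of $8^3 = 512$.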

Potential extensions include using alternative third-moment distances and extending SID to new use-cases or domains with non-Gaussian feature distributions (Luzi et al., 2023).

7. Relationship to Other Moment-Based Metrics

SID generalizes FID by addressing its main limitation: the latter’s reduction of all discrepancy to mean and covariance (two moments), justified only if features are Gaussian. Empirically, many real-data features are strongly non-Gaussian, as evidenced by statistical tests for skewness post-PCA (Luzi et al., 2023). By incorporating skewness, SID provides a more discriminating and nuanced measure for matching real and generated distributions, especially when evaluating visual fidelity and diversity under practical and perceptually relevant distortions.
