In “The Deepfake Detective” we show that a deepfake‐detector’s internal representations at layer i can be viewed as a collection of low‐dimensional manifolds—each manifold corresponding to varying levels of a single forensic artifact. By analyzing the geometry of these manifolds (their intrinsic dimension, curvature, and how sharply individual features respond) we gain a clear, quantitative picture of “what the network looks for” in order to flag an image as fake. Below we sketch the full framework in an integrated, end‐to‐end manner.
- Definition of the Forensic Feature Manifold
Let I be an input image (real or fake), let a index one of our canonical artifact types (geometric warp, lighting inconsistency, boundary blur, color mismatch), and let p∈[0,1] be a continuous “severity” parameter for that artifact (p=0 means no artifact, p=1 means maximal distortion). Denote by Aₐ(I,p) the augmented image formed by applying artifact a at level p to image I. If ϕᵢ: Image→ℝ^{Dᵢ} extracts the activation vector at layer i of the pre‐trained deepfake detector, then for fixed (a,I) we define the forensic feature manifold
M_{i,a,I} = { xᵢ(p) ≡ ϕᵢ( Aₐ(I,p) ) : p∈[0,1] } ⊂ ℝ^{Dᵢ}.
In practice we discretize p into T levels p₀,…,p_{T−1}, collect xᵢ(p_t) for t=0…T−1, and aggregate either per‐image or over a small set of N images to study the geometry of this curve (trajectory) through ℝ^{Dᵢ}.
- Estimating Intrinsic Dimensionality
We treat the T points { xᵢ(p_t) } as samples from a T‐point trajectory in ℝ^{Dᵢ} and compute their covariance matrix Σᵢ = Cov( { xᵢ(p_t) }_{t=0…T−1} ). Let λ₁ ≥ λ₂ ≥ … ≥ λ_{Dᵢ} be the eigenvalues of Σᵢ. Two common measures of intrinsic dimension (ID) are:
a) PCA‐energy threshold: d_{int}(τ) = min { k : (∑_{j=1}^{k} λ_j) / (∑_{j=1}^{Dᵢ} λ_j) ≥ τ }, with τ=0.95 typically. (Equation 1)
b) Participation Ratio: PR = (∑_{j=1}^{Dᵢ} λ_j)² / (∑_{j=1}^{Dᵢ} λ_j²).
In our experiments we report d_{int} as in (1). A low d_{int} (≈1–2) means the artifact moves features along essentially one axis; a higher d_{int} (3–5) means a more complex, multi‐directional response.
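Both ID estimators reduce to a few lines of numpy once the T trajectory points are stacked into a matrix. The function below is an illustrative sketch (the name and interface are ours, not from the paper):

```python
import numpy as np

def intrinsic_dimension(X, tau=0.95):
    """PCA-energy intrinsic dimension (Equation 1) and participation ratio.

    X: array of shape (T, D_i) holding the trajectory points x_i(p_t).
    Returns (d_int, pr).
    """
    # Eigenvalues of the covariance matrix, sorted largest first;
    # clip tiny negative values caused by floating-point round-off.
    lam = np.clip(np.linalg.eigvalsh(np.cov(X, rowvar=False)), 0.0, None)[::-1]
    energy = np.cumsum(lam) / lam.sum()
    d_int = int(np.searchsorted(energy, tau) + 1)   # smallest k with energy >= tau
    pr = lam.sum() ** 2 / (lam ** 2).sum()          # participation ratio
    return d_int, pr
```

For a trajectory that moves along a single direction, both measures return 1, matching the "low d_int (≈1–2)" reading above.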
- Computing Manifold Curvature
Because the path p↦xᵢ(p) is one‐dimensional in parameter space but generally curved in ℝ^{Dᵢ}, we define a discrete curvature measure that captures its nonlinearity. Let
Δ² xᵢ(p_t) = xᵢ(p_{t+2}) − 2 xᵢ(p_{t+1}) + xᵢ(p_t), t=0…T−3.
Then the average curvature Cᵢ is
Cᵢ = (1/(T−2)) ∑_{t=0}^{T−3} ‖ Δ² xᵢ(p_t) ‖₂. (Equation 2)
If Cᵢ≈0 the trajectory is nearly affine (straight‐line); larger Cᵢ indicates a more bent manifold and hence a nonlinear encoding of the artifact.
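The second-difference curvature of Equation 2 can be sketched as follows (function name ours; note that vectorized slicing yields exactly the T−2 terms of the sum):

```python
import numpy as np

def mean_curvature(X):
    """Discrete curvature C_i (Equation 2) of a trajectory X of shape (T, D_i)."""
    d2 = X[2:] - 2.0 * X[1:-1] + X[:-2]         # second differences, shape (T-2, D_i)
    return np.linalg.norm(d2, axis=1).mean()    # average over t = 0 ... T-3
```

A straight-line trajectory gives C ≈ 0, while any bend in the feature path contributes positively.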
- Feature Selectivity
We also ask: how many individual units within layer i respond in a monotonic way to p? Denote by xᵢ^{(j)}(p_t) the jᵗʰ coordinate of xᵢ(p_t). Compute the Pearson correlation
ρ_j = corr( { xᵢ^{(j)}(p_t) }_{t}, { p_t }_{t} ), for j=1…Dᵢ.
Then define the selectivity score Sᵢ for layer i and artifact a as
Sᵢ = (1/Dᵢ) ∑_{j=1}^{Dᵢ} |ρ_j|. (Equation 3)
Sᵢ∈[0,1]; high Sᵢ means many neurons track the artifact strength. We also examine the distribution of |ρ_j| across j to find which coordinates are most selective.
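Equation 3 amounts to one Pearson correlation per coordinate, averaged in absolute value. A minimal numpy sketch (names ours; constant coordinates are assigned ρ = 0 by convention, since their correlation is undefined):

```python
import numpy as np

def selectivity(X, p):
    """Selectivity score S_i (Equation 3) and per-coordinate correlations rho_j.

    X: (T, D_i) activations x_i(p_t); p: (T,) severity levels p_t."""
    Xc = X - X.mean(axis=0)
    pc = p - p.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(pc)
    safe = np.where(denom == 0, 1.0, denom)   # avoid division by zero
    rho = (Xc.T @ pc) / safe                  # Pearson rho_j per coordinate
    rho[denom == 0] = 0.0                     # constant coordinates: rho = 0
    return np.abs(rho).mean(), rho
```

The returned `rho` vector is what we histogram to find the most selective coordinates.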
- The Sparse Autoencoder (SAE) Architecture
To compress the raw Dᵢ‐dimensional activation vectors xᵢ into a small, interpretable set of latent features, we train for each layer i an undercomplete SAE with encoder fᵢ: ℝ^{Dᵢ}→ℝ^{dᵢ} and decoder gᵢ: ℝ^{dᵢ}→ℝ^{Dᵢ}, where dᵢ = Dᵢ/8 (capped at 16 384). The training loss for a minibatch {x} is
L_{SAE} = ‖ x − gᵢ(fᵢ(x)) ‖₂² + λ ‖ fᵢ(x)‖₁, (Equation 4)
with λ=10⁻³. Both fᵢ and gᵢ are single linear layers (plus optional nonlinearity on the decoder). We train with Adam (lr=10⁻⁴), early‐stop after 3 val‐epochs without improvement, and observe final latent sparsity 85–95% (i.e. per‐sample only 5–15% of the dᵢ units are nonzero). After training we analyze which latent dimensions are “active” (ever above a small threshold), their activation frequency across images, and their selectivity to forensic artifacts as per (3).
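A minimal numpy sketch of the SAE forward pass and the Equation 4 loss, under the stated architecture (single linear encoder and decoder); this is illustrative only, not the paper's training code, which uses Adam at lr=10⁻⁴ with early stopping:

```python
import numpy as np

class SparseAE:
    """Illustrative undercomplete SAE: single linear encoder f_i and
    decoder g_i, with the Equation 4 loss and lambda = 1e-3."""
    def __init__(self, D, d, lam=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.We = rng.normal(0.0, 1.0 / np.sqrt(D), (d, D))   # encoder weights
        self.Wd = rng.normal(0.0, 1.0 / np.sqrt(d), (D, d))   # decoder weights
        self.lam = lam

    def encode(self, x):
        return self.We @ x                              # f_i(x): latent code

    def loss(self, x):
        z = self.encode(x)
        recon = np.sum((x - self.Wd @ z) ** 2)          # ||x - g_i(f_i(x))||_2^2
        return recon + self.lam * np.sum(np.abs(z))     # + lambda ||f_i(x)||_1
```

With D=64 this gives dᵢ = D/8 = 8 latents, mirroring the paper's compression ratio; the L₁ term is what drives the reported 85–95% latent sparsity during real training.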
- Controlled Artifact Manipulation Protocol
a) Select N real/fake images (we use N=10 per artifact).
b) For artifact a, generate T=8 levels p_t= t/(T−1), t=0…7.
c) For each image I and each p_t compute I_{t}=Aₐ(I,p_t).
d) Extract xᵢ(p_t)=ϕᵢ(I_t) for each target layer i.
e) Aggregate over the N images (e.g. average at each level) if desired.
f) On { xᵢ(p_t) }_{t=0…7} compute:
- Intrinsic dimension d_{int} by (1)
- Curvature Cᵢ by (2)
- Selectivity Sᵢ by (3)
g) Repeat for each artifact type and layer.
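The whole protocol can be sketched end to end. Everything below is a toy stand-in: `apply_artifact` and `extract_features` are hypothetical placeholders for the paper's Aₐ(I,p) and ϕᵢ (a real run would apply the forensic augmentation and evaluate the detector up to layer i), and all sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
D_img, D_feat = 32, 128                  # toy image and layer-i feature sizes
W = rng.normal(size=(D_img, D_feat))     # toy stand-in for the detector's layer i

def apply_artifact(image, p):            # hypothetical A_a(I, p)
    return image * (1.0 - 0.5 * p)       # toy "severity" transform

def extract_features(image):             # hypothetical phi_i
    return np.tanh(image @ W)            # toy layer-i embedding

# Steps a)-e): N images, T severity levels p_t = t/(T-1), average over images
T, N = 8, 10
p_levels = np.arange(T) / (T - 1)
images = rng.normal(size=(N, D_img))
X = np.stack([
    np.mean([extract_features(apply_artifact(I, p)) for I in images], axis=0)
    for p in p_levels
])                                       # averaged trajectory, shape (T, D_feat)

# Step f): the three geometry metrics on the averaged trajectory
lam = np.clip(np.linalg.eigvalsh(np.cov(X, rowvar=False)), 0.0, None)[::-1]
d_int = int(np.searchsorted(np.cumsum(lam) / lam.sum(), 0.95) + 1)    # Eq. 1
C = np.linalg.norm(X[2:] - 2.0 * X[1:-1] + X[:-2], axis=1).mean()     # Eq. 2
Xc, pc = X - X.mean(axis=0), p_levels - p_levels.mean()
rho = (Xc.T @ pc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(pc) + 1e-12)
S = np.abs(rho).mean()                                                # Eq. 3
```

Step g) is then a loop over this script for each artifact type and probe layer.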
- Key Empirical Trends
We summarize the main geometry metrics (all averaged over the four artifact types) across our five probe layers L₁…L₅:
- Intrinsic Dimensionality d_{int}:
- L₁ (early): 1–2
- L₃/L₄ (mid): 3–4
- L₅ (penultimate): collapses back toward 1
- Curvature Cᵢ:
- L₁: low (artifacts cause near‐linear feature change)
- L₃/L₄: moderate (C≈15–25), showing nonlinear regimes as artifacts intensify
- L₅: small again—final layer “flattens” manifold into logit
- Selectivity Sᵢ:
- L₁: ≈0.1–0.2 (generic edge detectors do not track artifacts)
- L₃/L₄: up to ≈0.5 for warp & blur, ≈0.3 for lighting, ≈0.2 for color
- L₅: one or two features with |ρ_j|∼0.8–0.9, others near zero (information concentrated)
Figure 1 in the paper shows the ensemble statistics: mean d_{int}≈3.75, mean C≈19.15, mean S≈0.495 across artifacts in mid‐layers. Moreover, the SAE‐latent‐space selectivity plot (Figure 2) underscores that only a tiny fraction (<20%) of latent axes have |ρ|>0.2, and only a handful achieve |ρ|>0.5.
- Interpreting Geometry → Mechanistic Insights
- Mid‐level manifold axes: The fact that for geometric warp the manifold in L₄ has d_{int}≈4 and curvature C>0 implies the detector encodes warp in multiple nonlinear “modes” (e.g. low‐level shape distortions, texture displacement, shadow misalignment, feature‐pair mismatches).
- Sharp selectivity peaks: SAE dimensions with ρ_j>0.7 can be visualized (by projecting images that maximize them) and consistently correspond to familiar forensics cues (e.g. a “blur detector” that places negative weights on high-frequency filters, a “lighting offset” that responds to shadow edges).
- Layer‐wise pipeline: Early layers register that some degradation is happening but do not differentiate artifact type; mid‐layers disentangle distinct artifact factors in multi‐dimensional curved manifolds; final layers collapse these into a binary real/fake decision, concentrating each artifact signal into one or two logit‐like features.
- Guidance for Building Interpretable Detectors
- Architectures can explicitly allocate small “bottleneck” sub‐modules per artifact, trained to align with these manifold axes.
- Data‐augmentation regimes should sweep artifact parameters to encourage clean, low‐dimensional manifolds in mid‐layers (improving robustness).
- Post hoc, one can attach lightweight probes (e.g. linear regressors) to the known SAE‐latent axes to produce human‐readable artifact scores (“warping = 0.8, blur = 0.3”).
- Finally, causal‐steering experiments (Figure 3) confirm that nudging these latent axes in the positive artifact direction raises fake‐classification confidence, thereby offering not only correlational but causal interpretability.
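The lightweight-probe idea above can be sketched as a least-squares regressor from SAE latents to artifact severity. The latents here are synthetic (with one coordinate arbitrarily chosen to track severity, plus noise); in practice `Z` would come from the trained encoder fᵢ:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 80, 16                          # toy sample count and SAE latent width
p = rng.uniform(0.0, 1.0, n)           # artifact severity labels
Z = rng.normal(0.0, 0.05, (n, d))      # synthetic latents; really z = f_i(x)
Z[:, 3] += p                           # pretend latent 3 tracks severity

A = np.hstack([Z, np.ones((n, 1))])    # design matrix with a bias column
w, *_ = np.linalg.lstsq(A, p, rcond=None)   # linear probe, least squares
score = A @ w                          # human-readable artifact severity score
```

The fitted `score` is the kind of readout ("warping = 0.8") that a detector could expose alongside its real/fake decision.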
In sum, our forensic manifold analysis unifies a formal geometric vocabulary (manifolds, dimension, curvature, selectivity) with a practical probing toolkit (controlled augmentations, SAE compression, PCA, discrete curvature) to open the “black box” of deepfake detectors. This framework reveals exactly which internal feature modes correspond to specific artifact types and how they evolve across depth—paving the way for detectors that are both accurate and inherently transparent.