
Doubly Stochastic Adversarial Autoencoder

Updated 6 March 2026
  • The paper introduces DS-AAE, a novel generative autoencoder that replaces deterministic discriminators with a stochastic function space to mitigate adversarial overfitting.
  • It employs random feature mappings to smooth gradients and enforce a robust matching between the latent aggregated posterior and the prior distribution.
  • Empirical results on binary MNIST indicate improved mode coverage and sample diversity compared to standard AAEs, despite modest Parzen-window likelihood scores.

The Doubly Stochastic Adversarial Autoencoder (DS-AAE) is a generative autoencoder architecture that replaces the deterministic adversary of Adversarial Autoencoders (AAEs) with a space of stochastic functions parameterized via random feature mappings. This innovation, leveraging kernel–random process duality, introduces a controlled source of auxiliary randomness to the adversarial regularization. DS-AAE targets key limitations in traditional adversarial autoencoders, particularly overfitting of the adversary and inadequate mode coverage, by promoting exploration and sample diversity while preserving computational efficiency (Azarafrooz, 2018).

1. Architecture and Key Components

DS-AAE consists of three principal modules:

  • Encoder $e_\theta:\mathcal X\rightarrow\mathcal Z$: A deterministic, feed-forward network mapping data samples $x\in\mathcal X$ to latent codes $z$.
  • Decoder (Generator) $g_\psi:\mathcal Z\rightarrow\mathcal X$: A mirrored feed-forward network reconstructing the data from the latent code.
  • Stochastic Adversary $f_\alpha$: In contrast to the standard parameterized discriminator of AAEs, the adversarial function is drawn from a continuum of stochastic functions, constructed through random feature maps $\phi_{\mathcal W}(\cdot)$ parameterized by auxiliary randomness $\mathcal W$ and linear weights $\alpha$.

Graphically, data, prior samples, and aggregated posterior codes flow through distinct branches. The adversary judges codes originating from the imposed prior $\mathcal P$ and the aggregated posterior $Q_\theta(Z)$, maximizing their discrepancy, while the encoder and decoder networks are updated to minimize it.
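A minimal NumPy sketch of the three modules follows. Layer sizes, weight scales, and the use of random Fourier features for $\phi_{\mathcal W}$ are illustrative assumptions here, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical reduced layer sizes for illustration.
D_IN, D_HID, D_LAT = 784, 256, 6

# Encoder e_theta: X -> Z (deterministic feed-forward map).
W1 = rng.normal(0, 0.01, (D_IN, D_HID))
W2 = rng.normal(0, 0.01, (D_HID, D_LAT))

def encode(x):
    return relu(x @ W1) @ W2  # latent code z

# Decoder g_psi: Z -> X (mirrored feed-forward map, sigmoid output).
V1 = rng.normal(0, 0.01, (D_LAT, D_HID))
V2 = rng.normal(0, 0.01, (D_HID, D_IN))

def decode(z):
    return sigmoid(relu(z @ V1) @ V2)

# Stochastic adversary: a linear functional on random features phi_W(z);
# the feature parameters W = (omega, b) are resampled each step, which is
# the source of the auxiliary randomness.
N_FEAT = 500

def random_features(z, omega, b):
    # Random Fourier features approximating an RBF kernel.
    return np.sqrt(2.0 / omega.shape[1]) * np.cos(z @ omega + b)

x = rng.random((32, D_IN))                   # a minibatch of flattened images
z = encode(x)
omega = rng.normal(size=(D_LAT, N_FEAT))     # fresh randomness W
b = rng.uniform(0, 2 * np.pi, N_FEAT)
phi = random_features(z, omega, b)
x_rec = decode(z)
```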

2. Mathematical Formulation

The DS-AAE objective combines a reconstruction criterion with a doubly stochastic adversarial penalty:

  • Reconstruction Loss: For data $X$ and code $z=e_\theta(X)$, the loss is

$$L_{\mathrm{rec}}(\theta,\psi) = -\mathbb{E}_{X\sim\mathrm{data}} \left[ X\log g_\psi(z) + (1-X)\log\bigl(1-g_\psi(z)\bigr) \right]$$

for cross-entropy, or alternatively mean-squared error.
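For the cross-entropy case, a direct NumPy transcription of the loss above (the clipping constant is a standard numerical-stability safeguard, not from the paper):

```python
import numpy as np

def reconstruction_loss(x, x_rec, eps=1e-7):
    """Binary cross-entropy L_rec between data x and reconstruction g_psi(e_theta(x)).

    Both arrays have shape (batch, dim); x is binary, x_rec lies in (0, 1).
    """
    x_rec = np.clip(x_rec, eps, 1.0 - eps)  # guard against log(0)
    per_sample = (x * np.log(x_rec) + (1 - x) * np.log(1 - x_rec)).sum(axis=1)
    return -per_sample.mean()
```

A perfect reconstruction drives the loss toward zero, while predicting the complement of a binary input makes it large.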

  • Adversarial Regularizer: To impose prior $\mathcal P$ on the latent space, a discrepancy minimization is formulated. Standard approaches use an explicit kernel $k$ in Maximum Mean Discrepancy (MMD):

$$\delta_{\mathrm{MMD}}(\mathcal P, Q_\theta) = \left\| \mathbb{E}_{Y\sim\mathcal P}[k(Y,\cdot)] - \mathbb{E}_{Z\sim Q_\theta}[k(Z,\cdot)] \right\|_{\mathcal H}$$
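For reference, this explicit-kernel discrepancy can be estimated directly from samples. A short NumPy sketch (the bandwidth and sample sizes here are arbitrary illustrative choices):

```python
import numpy as np

def mmd_rbf(y, z, sigma=1.0):
    """Biased empirical MMD with explicit RBF kernel k(u,v) = exp(-||u-v||^2 / (2 sigma^2))."""
    def gram_mean(a, b):
        # Mean of the Gram matrix between sample sets a and b.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2)).mean()
    val = gram_mean(y, y) + gram_mean(z, z) - 2.0 * gram_mean(y, z)
    return np.sqrt(max(val, 0.0))  # clamp tiny negative rounding error

rng = np.random.default_rng(0)
y = rng.normal(size=(500, 6))               # samples from the prior P
z_match = rng.normal(size=(500, 6))         # posterior codes matching P
z_off = rng.normal(loc=2.0, size=(500, 6))  # mismatched posterior codes
```

Note the quadratic cost of the explicit Gram computation in the batch size, which the random-feature construction avoids.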

DS-AAE improves upon this by defining a doubly stochastic gradient via random features:

$$\zeta(\cdot) = \mathbb{E}_{\mathcal W} \left[ \bigl(\phi_{\mathcal W}(Y) - \phi_{\mathcal W}(Z)\bigr)\,\phi_{\mathcal W}(\cdot) \right]$$

Any admissible adversary $f$ is approximated by a linear form $f_\alpha(\cdot) = \langle\alpha, \zeta(\cdot)\rangle$, and the regularizer becomes

$$\delta_{\mathrm{DS}}(\mathcal P, Q_\theta) = \max_\alpha \left\{ \mathbb{E}_Y[f_\alpha(Y)] - \mathbb{E}_Z[f_\alpha(Z)] \right\}$$
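As a simplified sketch of this inner maximization: if one takes the adversary linear in the random features with a unit-norm weight constraint (an assumption made here for tractability, not stated in the paper), the maximizing $\alpha$ is the normalized feature-mean difference, and $\delta_{\mathrm{DS}}$ reduces to the norm of that difference — a random-feature MMD estimate in which $\mathcal W$ is resampled on every evaluation:

```python
import numpy as np

rng = np.random.default_rng(1)

def rff(u, omega, b):
    # Random Fourier features phi_W(u) approximating an RBF kernel (sigma = 1).
    return np.sqrt(2.0 / omega.shape[1]) * np.cos(u @ omega + b)

def ds_discrepancy(y, z, n_feat=500):
    """Estimate max_alpha E[f_alpha(Y)] - E[f_alpha(Z)] over unit-norm alpha.

    With f_alpha linear in the features, the optimum is attained at the
    normalized difference of feature means, so the value is its norm.
    The feature parameters W = (omega, b) are freshly drawn on every
    call -- the second source of stochasticity.
    """
    d = y.shape[1]
    omega = rng.normal(size=(d, n_feat))
    b = rng.uniform(0, 2 * np.pi, n_feat)
    diff = rff(y, omega, b).mean(axis=0) - rff(z, omega, b).mean(axis=0)
    return np.linalg.norm(diff)

# Matched distributions give a small discrepancy; mismatched, a larger one.
y = rng.normal(size=(2000, 6))
z_same = rng.normal(size=(2000, 6))
z_far = rng.normal(loc=3.0, size=(2000, 6))
```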

The overall optimization is

$$\min_{\theta,\psi}\; L_{\mathrm{rec}}(\theta,\psi) + \lambda \max_\alpha \left\{ \mathbb{E}_Y[f_\alpha(Y)] - \mathbb{E}_Z[f_\alpha(Z)] \right\}$$

The stochastic feature mapping smooths gradients and prevents the adversary from tightly overfitting to the generator.

3. Optimization and Training Procedure

The DS-AAE is trained by minibatch-based alternating updates:

  1. Sample Data and Prior: Draw a minibatch of data points and prior samples.
  2. Latent Encoding: Map data through the encoder to obtain latent codes.
  3. Random Feature Sampling: Draw random feature parameters $\mathcal W$ and compute corresponding random features.
  4. Build Doubly Stochastic Features: For both data and prior, compute doubly stochastic features via aggregation of random feature contributions.
  5. Adversary Step: Update adversary parameters $\alpha$ via gradient ascent to maximize the discrepancy between prior and aggregated posterior features.
  6. Generator/Encoder Step: Jointly update $\theta, \psi$ to minimize the reconstruction loss and reduce the adversarially measured discrepancy.
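The alternating updates above can be sketched end-to-end on a toy problem. Everything here is a deliberate simplification for illustration: the encoder is a pure shift $z = x + t$ rather than an MLP, the decoder/reconstruction step is omitted, and the adversary weights are obtained from a single normalized ascent step each iteration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: 2-D codes, standard-normal prior, fresh random Fourier
# features every step (the doubly stochastic part).
N, D_LAT, D_FEAT = 256, 2, 200
data = rng.normal(loc=1.5, size=(N, D_LAT))  # codes start away from the prior
t = np.zeros(D_LAT)                           # encoder parameter theta
lr = 0.1

def phi(u, omega, b):
    return np.sqrt(2.0 / omega.shape[1]) * np.cos(u @ omega + b)

for step in range(400):
    y = rng.normal(size=(N, D_LAT))                       # 1. prior minibatch
    z = data + t                                          # 2. encode data
    omega = rng.normal(scale=0.5, size=(D_LAT, D_FEAT))   # 3. fresh features W
    b = rng.uniform(0, 2 * np.pi, D_FEAT)
    # 4-5. Adversary step: for a linear f_alpha, the ascent direction is the
    # feature-mean difference; normalizing keeps the adversary bounded.
    alpha = phi(y, omega, b).mean(0) - phi(z, omega, b).mean(0)
    alpha /= np.linalg.norm(alpha) + 1e-12
    # 6. Encoder step: increase E_Z[f_alpha], i.e. decrease the measured
    # discrepancy. df_alpha/dz = -sqrt(2/D) (sin(z @ omega + b) * alpha) @ omega.T
    dfdz = -np.sqrt(2.0 / D_FEAT) * (np.sin(z @ omega + b) * alpha) @ omega.T
    t += lr * dfdz.mean(0)
```

After training, the mean of the encoded codes should have been pulled from its initial offset toward the prior mean at the origin.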

Learning rates for all modules are set via Adam optimization ($\eta_\theta, \eta_\psi, \eta_\alpha = 10^{-3}$), with dropout applied only to the encoder's input. Empirical findings underscore sensitivity to batch size, and small learning rates are needed to ensure convergence when adversarial functions temporarily drift outside the RKHS.

4. Empirical Performance and Comparisons

Experiments were conducted on binary-thresholded MNIST using the following architecture (for DS-AAE): an encoder with three fully connected layers (1024→512→256→6), and a decoder symmetric in structure (256→512→1024→784), with ReLU activations except the final sigmoid layer.

The imposed prior is a six-dimensional isotropic Gaussian. Random features were instantiated using an RBF kernel ($\sigma=1$), drawing approximately 500 features per batch.

Performance was evaluated via Parzen-window test log-likelihood (on 10K samples), compared to GAN, GMMN+AE, AAE, and MMD-AE:

| Model   | Parzen LL ($\pm$ std) |
|---------|-----------------------|
| GAN     | $225 \pm 2$           |
| GMMN+AE | $282 \pm 2$           |
| AAE     | $340 \pm 2$           |
| MMD-AE  | $228 \pm 1.6$         |
| DS-AAE  | $243.2 \pm 1.7$       |
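The evaluation metric itself is easy to reproduce. A sketch of a Gaussian Parzen-window (kernel density estimate) log-likelihood estimator — the bandwidth $\sigma$ would normally be tuned on a validation set, and the values below are illustrative:

```python
import numpy as np

def parzen_log_likelihood(samples, test_points, sigma):
    """Mean log-likelihood of test_points under a Gaussian Parzen window
    (kernel density estimate) fitted to model samples."""
    # Pairwise squared distances, shape (n_test, n_samples).
    d2 = ((test_points[:, None, :] - samples[None, :, :]) ** 2).sum(-1)
    dim = samples.shape[1]
    a = -d2 / (2.0 * sigma ** 2)
    m = a.max(axis=1, keepdims=True)                 # log-sum-exp trick
    lse = m[:, 0] + np.log(np.exp(a - m).sum(axis=1))
    log_p = lse - np.log(len(samples)) - 0.5 * dim * np.log(2 * np.pi * sigma ** 2)
    return log_p.mean()

rng = np.random.default_rng(3)
model_samples = rng.normal(size=(500, 2))      # stand-in for generated samples
held_out = rng.normal(size=(100, 2))           # test points near the samples
outliers = rng.normal(loc=10.0, size=(100, 2)) # test points far away
```

Held-out points drawn from the same distribution as the model samples score much higher than distant outliers.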

Qualitatively, DS-AAE samples demonstrated greater diversity in handwriting styles compared to AAE and MMD-AE and maintained sharp, hole-free latent traversals across classes, with increased multimodality.

5. Advantages, Limitations, and Distinctive Features

Advantages over AAE and VAE:

  • Enhanced mode exploration due to auxiliary randomness $\mathcal W$, reducing adversarial overfitting and generator collapse.
  • Smoother adversarial gradients contribute to improved training stability.
  • Increased capacity to match multimodal priors owing to the continuum of admissible stochastic adversaries.

Limitations:

  • Elevated sensitivity to batch size; insufficiently large batches degrade gradient approximation quality.
  • Temporary excursions of adversarial functions outside the RKHS require conservative learning rates, slowing training.
  • Parzen-window test likelihoods indicate DS-AAE underperforms AAE quantitatively, though diversity is improved.

6. Extensions and Applications

Potential future directions include:

  • Applying alternate positive-definite kernels (polynomial, Laplacian) by adapting corresponding random-feature sketches.
  • Incorporating convolutional architectures to scale to natural image datasets (e.g., CIFAR, CelebA).
  • Developing semi-supervised or conditional variants by conditioning the prior $\mathcal P$ on auxiliary information such as labels.
  • Stacking DS-AAE modules hierarchically for enriched latent structure.
  • Exploring applications in domain adaptation, anomaly detection, and across modalities (including text and time series).

DS-AAE represents a refinement of autoencoding adversarial frameworks, leveraging stochastic function spaces for the adversary to foster diversity and mitigate mode collapse, at the cost of increased sensitivity to batch size and potentially slower convergence dynamics (Azarafrooz, 2018).

