Contractive Denoising Autoencoders

Updated 11 March 2026
  • Contractive Denoising Autoencoders (CDAE) are neural models that integrate a denoising criterion with a contractive penalty to achieve robust and invariant feature representations.
  • The model employs a symmetric encoder–decoder architecture optimized via reconstruction loss and the Frobenius norm of the encoder’s Jacobian to mitigate noise and local perturbations.
  • Empirical results on MNIST demonstrate that CDAEs modestly improve classification accuracy compared to traditional autoencoders by effectively combining denoising and contraction.

Contractive Denoising Autoencoders (CDAEs) are a variant of autoencoder neural networks that combine the denoising criterion of Denoising Autoencoders (DAEs) and the local invariance of Contractive Autoencoders (CAEs). A CDAE is designed to be robust to both large input corruptions and infinitesimal perturbations, yielding feature representations that are simultaneously insensitive to structured noise and small input variations. In its canonical form, the model is realized as a symmetric encoder–decoder architecture and trained by optimizing a sum of denoising reconstruction loss and the Frobenius norm of the encoder’s Jacobian, subject to a stochastic corruption process on the input (Chen et al., 2013).

1. Architecture and Model Specification

A single-layer CDAE consists of an encoder–decoder pair operating on an input $\mathbf{x} \in \mathbb{R}^d$. The process begins by sampling a corrupted version $\tilde{\mathbf{x}} \sim p_n(\tilde{\mathbf{x}}|\mathbf{x})$. The encoder function $f$ maps $\tilde{\mathbf{x}}$ into a hidden representation $\mathbf{h} = f(\tilde{\mathbf{x}})$, and the decoder $g$ reconstructs the input as $\hat{\mathbf{x}} \approx \mathbf{x}$.

  • Encoder: $f(\tilde{\mathbf{x}}) = \sigma ( W \tilde{\mathbf{x}} + \mathbf{b} )$, with weight matrix $W \in \mathbb{R}^{h \times d}$, bias $\mathbf{b} \in \mathbb{R}^h$, and nonlinearity $\sigma \in \{\text{sigmoid}, \tanh\}$.
  • Decoder: $g(\mathbf{h}) = \sigma ( W^\top \mathbf{h} + \mathbf{c} )$, with bias $\mathbf{c} \in \mathbb{R}^d$ and tied weights for architectural symmetry.

The typical training loop per minibatch proceeds as:

  1. Corrupt: $\tilde{\mathbf{x}}^{(i)} = \text{corrupt}(\mathbf{x}^{(i)})$
  2. Encode: $\mathbf{h}^{(i)} = \sigma(W \tilde{\mathbf{x}}^{(i)} + \mathbf{b})$
  3. Decode: $\hat{\mathbf{x}}^{(i)} = \sigma(W^\top \mathbf{h}^{(i)} + \mathbf{c})$
  4. Compute loss and gradients
  5. Update parameters via stochastic gradient descent (SGD) or variants
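
Steps 1 to 3 of the loop above, plus the reconstruction error that feeds step 4, can be sketched for a single example as follows. This is a minimal illustration in plain Python, not the authors' implementation: the helper names (`matvec`, `transpose`, `encode`, `decode`, `corrupt`), the toy dimensions, the random weights, and the masking probability `drop_prob` are all assumptions made for the example.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]

def corrupt(x, rng, drop_prob=0.25):
    """Masking noise: zero each entry independently (drop_prob is a toy value)."""
    return [0.0 if rng.random() < drop_prob else x_j for x_j in x]

def encode(W, b, x_tilde):
    """h = tanh(W x_tilde + b)."""
    return [math.tanh(z + b_i) for z, b_i in zip(matvec(W, x_tilde), b)]

def decode(W, c, h):
    """x_hat = tanh(W^T h + c); the decoder reuses W (tied weights)."""
    return [math.tanh(z + c_j) for z, c_j in zip(matvec(transpose(W), h), c)]

rng = random.Random(0)
d, n_hidden = 6, 3  # toy sizes, far smaller than the 784-dimensional MNIST input
W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_hidden)]
b = [0.0] * n_hidden
c = [0.0] * d
x = [rng.uniform(-1, 1) for _ in range(d)]

x_tilde = corrupt(x, rng)       # step 1: corrupt
h = encode(W, b, x_tilde)       # step 2: encode
x_hat = decode(W, c, h)         # step 3: decode
# Squared error is measured against the clean input x, not x_tilde.
rec_loss = sum((x_j - xh_j) ** 2 for x_j, xh_j in zip(x, x_hat))
print(len(h), len(x_hat), rec_loss)
```

Note that the reconstruction target is the uncorrupted `x`; comparing against `x_tilde` instead would reduce the model to a plain autoencoder on noisy data.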

2. Objective Function and Regularization

The CDAE objective function is the sum of a denoising reconstruction term and a contractive penalty, imposed for each data point $\mathbf{x}$:

$$J_{\rm CDAE}(x) = \mathbb{E}_{\tilde{x} \sim p_n(\tilde{x}|x)} \left[ \| x - g(f(\tilde{x})) \|_2^2 \right] + \lambda \| J_f(\tilde{x}) \|^2_F$$

  • Denoising Loss: Enforces robustness to large corruptions by reconstructing the original input from its noisy version: $L_{\rm rec} = \mathbb{E}_{x \sim p_{\rm data}(x),\ \tilde{x} \sim p_n(\tilde{x}|x)} \| x - g(f(\tilde{x})) \|_2^2$
  • Contractive Penalty: The Frobenius norm of the encoder’s Jacobian $J_f(\tilde{x}) = \frac{\partial f(\tilde{x})}{\partial \tilde{x}}$ penalizes local sensitivity. For elementwise activations, this is: $\| J_f(\tilde{x}) \|_F^2 = \sum_{i=1}^h [\sigma'(z_i)]^2 \| W_{i,\cdot} \|_2^2$, where $W_{i,\cdot}$ is the $i$-th row of $W$, $z_i = W_{i,\cdot}\, \tilde{x} + b_i$, and $\sigma'(\cdot)$ depends on the activation.

For $\sigma = \tanh$ (used in experiments): $\| J_f(\tilde{x}) \|_F^2 = \sum_{i=1}^h [1 - h_i^2]^2 \|W_{i,\cdot}\|_2^2$
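
The closed form follows because each Jacobian entry of an elementwise encoder is the activation derivative at unit i times the weight W[i][j], so squaring and summing over j yields the squared row norm. The sketch below checks the tanh formula against a central finite-difference estimate of the Jacobian. It is an illustrative check only: the function names, toy dimensions, and random parameters are assumptions, not values from the paper.

```python
import math
import random

def encode(W, b, x):
    """tanh encoder: h_i = tanh(W_i . x + b_i)."""
    return [math.tanh(sum(w * x_j for w, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

def contractive_penalty(W, b, x):
    """Closed form for tanh: sum_i (1 - h_i^2)^2 * ||W_i||_2^2."""
    h = encode(W, b, x)
    return sum((1 - h_i ** 2) ** 2 * sum(w * w for w in row)
               for h_i, row in zip(h, W))

def numeric_penalty(W, b, x, eps=1e-5):
    """Estimate ||J_f||_F^2 by central differences, one input coordinate at a time."""
    total = 0.0
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        h_plus = encode(W, b, x_plus)
        h_minus = encode(W, b, x_minus)
        # Column j of the Jacobian, squared and summed over hidden units.
        total += sum(((hp - hm) / (2 * eps)) ** 2
                     for hp, hm in zip(h_plus, h_minus))
    return total

rng = random.Random(0)
d, n_hidden = 5, 3  # toy sizes chosen for the example
W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_hidden)]
b = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
x_tilde = [rng.uniform(-1, 1) for _ in range(d)]

closed = contractive_penalty(W, b, x_tilde)
numeric = numeric_penalty(W, b, x_tilde)
print(closed, numeric)  # the two values should agree to several decimal places
```

The closed form needs one pass over the weight matrix, whereas the finite-difference version needs two encoder evaluations per input dimension, which is why the analytic expression is the one used in practice.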

3. Stochastic Corruption and Noise Models

CDAEs employ a stochastic corruption process $p_n$ to push the learned mapping toward the data manifold's structure.

  • Masking noise: A fraction $\nu$ of input units is set to zero. In experiments, approximately every 80th pixel in the 784-dimensional MNIST inputs is masked (effectively $\nu \approx 1.25\%$).
  • Gaussian noise: Additive perturbation $\tilde{x} = x + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$.

The principal mechanism is to drive the model to reconstruct uncorrupted inputs from their noisy versions, fostering robustness to input-level noise.
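
Both noise models can be sketched in a few lines. The source does not say whether the masked positions are fixed or resampled for each example, so the version below assumes they are drawn at random per call; the function names, the all-ones stand-in image, and the Gaussian standard deviation of 0.1 are likewise assumptions for illustration.

```python
import random

def masking_noise(x, nu, rng):
    """Set a randomly chosen fraction nu of the entries of x to zero."""
    n_masked = round(nu * len(x))
    masked = set(rng.sample(range(len(x)), n_masked))
    return [0.0 if j in masked else x_j for j, x_j in enumerate(x)]

def gaussian_noise(x, sigma, rng):
    """Add independent N(0, sigma^2) noise to every entry of x."""
    return [x_j + rng.gauss(0.0, sigma) for x_j in x]

rng = random.Random(0)
# Stand-in for a flattened 28x28 image; all ones so that masked units are easy to count.
x = [1.0] * 784

x_masked = masking_noise(x, nu=1 / 80, rng=rng)   # about every 80th pixel, nu = 1.25%
x_noisy = gaussian_noise(x, sigma=0.1, rng=rng)   # sigma = 0.1 is an arbitrary choice

n_zeroed = x_masked.count(0.0)
print(n_zeroed)  # 784 / 80 = 9.8, which rounds to 10 masked pixels
```

At this corruption level only about 10 of the 784 pixels are zeroed per image, which is far lighter than the masking rates often used with denoising autoencoders.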

4. Stacking, Training Methodology, and Optimization

CDAEs admit stacking to form deep architectures. The standard pretraining sequence is as follows:

  1. Train the first CDAE layer on $\mathbf{x}$ to obtain $\mathbf{h}^1$.
  2. Use $\mathbf{h}^1$ as input, apply corruption, and train a second CDAE to yield $\mathbf{h}^2$.
  3. Repeat stacking as desired.

Layer-wise unsupervised pretraining is performed independently for each module to minimize $J_{\rm CDAE}$ for its layer. After this stage, the network can act as a fixed feature extractor. In the primary experimental protocol, the learned codes from the middle layer are input to an SVM with RBF kernel; no supervised backpropagation-based fine-tuning is reported.
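
The greedy layer-wise procedure can be illustrated with a deliberately tiny, dependency-free sketch. Several things here are assumptions rather than details from the source: gradients are estimated by finite differences as a stand-in for backpropagation (only practical at toy sizes), the data are random vectors, the codes passed to the next layer are computed from clean inputs, and every name and hyperparameter other than the penalty weight 0.1 is hypothetical.

```python
import math
import random

def forward(theta, d, k, x_tilde):
    """Unpack flat parameters [W, b, c] and run the tied-weight tanh encoder/decoder."""
    W = [theta[i * d:(i + 1) * d] for i in range(k)]
    b = theta[k * d:k * d + k]
    c = theta[k * d + k:]
    h = [math.tanh(sum(w * v for w, v in zip(W[i], x_tilde)) + b[i]) for i in range(k)]
    x_hat = [math.tanh(sum(W[i][j] * h[i] for i in range(k)) + c[j]) for j in range(d)]
    return W, h, x_hat

def cdae_loss(theta, d, k, batch, lam):
    """Mean over (x, x_tilde) pairs of squared error plus lam * ||J_f||_F^2."""
    total = 0.0
    for x, x_tilde in batch:
        W, h, x_hat = forward(theta, d, k, x_tilde)
        rec = sum((a - r) ** 2 for a, r in zip(x, x_hat))
        pen = sum((1 - h[i] ** 2) ** 2 * sum(w * w for w in W[i]) for i in range(k))
        total += rec + lam * pen
    return total / len(batch)

def train_layer(X, k, rng, lam=0.1, drop_prob=0.2, lr=0.1, steps=30, eps=1e-5):
    """Fit one CDAE layer by gradient descent with finite-difference gradients."""
    d = len(X[0])
    bound = math.sqrt(6.0 / (d + k))
    theta = [rng.uniform(-bound, bound) for _ in range(k * d)] + [0.0] * (k + d)
    for _ in range(steps):
        # Resample the masking corruption at every step.
        batch = [(x, [0.0 if rng.random() < drop_prob else v for v in x]) for x in X]
        grad = []
        for p in range(len(theta)):
            old = theta[p]
            theta[p] = old + eps
            up = cdae_loss(theta, d, k, batch, lam)
            theta[p] = old - eps
            down = cdae_loss(theta, d, k, batch, lam)
            theta[p] = old
            grad.append((up - down) / (2 * eps))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

def encode_clean(theta, X, k):
    """Codes for the next layer, computed from uncorrupted inputs (an assumption)."""
    d = len(X[0])
    return [forward(theta, d, k, x)[1] for x in X]

rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(8)]  # toy data, not MNIST

theta1 = train_layer(X, k=4, rng=rng)    # step 1: first layer on the raw inputs
H1 = encode_clean(theta1, X, k=4)
theta2 = train_layer(H1, k=2, rng=rng)   # step 2: second layer on the first-layer codes
H2 = encode_clean(theta2, H1, k=2)       # middle codes, the features a classifier would use
print(len(H2), len(H2[0]))
```

Each layer is trained with its own parameters and its own objective; nothing from the second layer's loss flows back into the first, which is what distinguishes this pretraining scheme from end-to-end training.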

Parameter initialization follows the “Xavier” scheme: $W_{ij} \sim \text{Uniform}\left( -\sqrt{\frac{6}{d + h}}, +\sqrt{\frac{6}{d + h}} \right)$. Biases are initialized to zero. Optimization is conducted using SGD or variants such as momentum and Adam.
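
As a concrete instance of the initialization rule, the sketch below draws the first-layer weights for a 784-to-200 encoder, the outer layer size of the architectures reported in the experiments. The function name `xavier_uniform` and the seed are assumptions for the example.

```python
import math
import random

def xavier_uniform(n_hidden, n_in, rng):
    """Draw an n_hidden x n_in matrix from Uniform(-bound, +bound), bound = sqrt(6 / (n_in + n_hidden))."""
    bound = math.sqrt(6.0 / (n_in + n_hidden))
    W = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_hidden)]
    return W, bound

rng = random.Random(0)
W, bound = xavier_uniform(200, 784, rng)
b = [0.0] * 200   # encoder bias, initialized to zero
c = [0.0] * 784   # decoder bias, initialized to zero
print(bound)      # sqrt(6 / 984), about 0.078
```

With tied weights only this one matrix is needed per layer, since the decoder uses its transpose.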

5. Hyperparameters and Implementation Specifics

Empirical studies were conducted on MNIST using two symmetric autoencoder architectures:

  • 784–200–100–200–784
  • 784–200–50–200–784

Key hyperparameter settings:

  • Hidden units $h$: 100 or 50 (bottleneck layer).
  • Contractive penalty weight $\lambda$: 0.1.
  • Corruption level: every 80th pixel masked ($\approx 1.25\%$).
  • Activation: $\tanh$ for both encoder and decoder layers.
  • Weight initialization: Xavier uniform.
  • Bias initialization: zeros.
  • Batch size, learning rate, and number of epochs are not specified; typical values in related literature are batch size 100, learning rate 0.01–0.1, epochs 50–200.

6. Experimental Evaluation and Comparative Results

The CDAE was evaluated on a subset of 18,000 MNIST digit images (1,800 per class), split equally into training and test sets (9,000 images each). After pretraining two stacked CDAE layers, the middle code (size 100 or 50) was used as input to a radial basis function SVM for classification.

Observed test accuracy:

| Architecture | AE | DAE | CAE | CDAE |
|---|---|---|---|---|
| 784–200–100–200–784 | 92.42% | 92.51% | 93.11% | 93.31% |
| 784–200–50–200–784 | 93.12% | 93.28% | 93.31% | 93.77% |

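CDAE achieves the highest accuracy of the four models in both configurations, exceeding the next-best model (CAE) by 0.20 and 0.46 percentage points respectively. No ablation studies varying $\lambda$ or the noise level are provided.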

7. Theoretical Properties, Empirical Insights, and Limitations

CDAE’s dual regularization yields features that are robust both to large stochastic corruptions (denoising) and to infinitesimal input changes (contraction). The contractive penalty specifically encourages encoder invariance to local perturbations, producing smoother low-dimensional embeddings. Empirically, combining denoising and contractive penalties delivers a consistent, though modest, improvement in classification accuracy compared to either regularizer alone.

Identified limitations:

  • All reported results are restricted to MNIST; extension to other domains remains unverified.
  • The contractive penalty introduces $O(hd)$ additional computation per sample, with increasing cost for large hidden or input layers.
  • No experiments addressing supervised end-to-end fine-tuning are reported; integration of CDAE pretraining into a full supervised pipeline is untested.
  • Systematic study of hyperparameter impact (penalty weight $\lambda$, fraction of corrupted inputs, number of layers) is not conducted.

A plausible implication is that the approach is straightforward to implement and scale, but broader empirical validation and efficiency improvements remain open research directions (Chen et al., 2013).
