Stacked Denoising Autoencoder (SDAE)
- Stacked Denoising Autoencoder (SDAE) is a deep learning model that learns noise-resistant representations by reconstructing clean inputs from corrupted ones.
- It uses a two-stage training procedure: greedy layer-wise pre-training with denoising autoencoders, followed by global fine-tuning to refine the learned features.
- SDAEs are widely applied in fields like vision, speech, biomedical imaging, and financial forecasting to handle noisy or incomplete datasets.
A Stacked Denoising Autoencoder (SDAE) is a deep neural architecture designed for learning robust, hierarchically structured representations from corrupted data. SDAEs leverage layer-wise unsupervised pre-training, where each layer is trained as a denoising autoencoder to reconstruct clean inputs from noise-perturbed versions, followed by global fine-tuning for either supervised or unsupervised objectives. They have been influential in tasks where generalization from noisy, incomplete, or limited labeled data is critical, and are widely adapted across vision, speech, natural language, biomedical imaging, financial forecasting, and more.
1. Mathematical Formulation of the Stacked Denoising Autoencoder
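A single denoising autoencoder (DA) consists of an encoder and decoder pair. Let $x \in \mathbb{R}^d$ denote a clean input. The corruption process $q_D(\tilde{x} \mid x)$ generates a stochastically corrupted input $\tilde{x}$, often via masking noise (independently setting each $x_i$ to zero with probability $\nu$):
$$\tilde{x}_i = \begin{cases} 0 & \text{with probability } \nu \\ x_i & \text{with probability } 1 - \nu \end{cases}$$
The encoder maps the corrupted input to a hidden representation:
$$h = f_\theta(\tilde{x}) = s(W \tilde{x} + b)$$
where $W$, $b$ are layer parameters and $s$ is typically the sigmoid nonlinearity $s(z) = 1/(1 + e^{-z})$.
The decoder reconstructs the clean input from $h$:
$$\hat{x} = g_{\theta'}(h) = s(W' h + b')$$
with weights $W'$, $b'$. The reconstruction loss $\mathcal{L}(x, \hat{x})$, either squared error or cross-entropy, is minimized over the parameters:
$$\min_{\theta,\, \theta'} \; \mathbb{E}_{x,\; \tilde{x} \sim q_D(\tilde{x} \mid x)} \left[ \mathcal{L}\big(x,\, g_{\theta'}(f_\theta(\tilde{x}))\big) \right]$$
Greedily stacking such DAs forms the SDAE: each layer is trained to reconstruct its clean input from corrupted versions, and the output of each encoder becomes the input to the next layer:
$$h^{(\ell)} = f_{\theta^{(\ell)}}\big(h^{(\ell-1)}\big), \qquad h^{(0)} = x, \quad \ell = 1, \dots, L$$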
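To make the formulation concrete, the following is a minimal NumPy sketch of one denoising autoencoder with masking noise, plus a greedy stacking loop. It is an illustration under stated assumptions, not a reference implementation: the function names (`mask_corrupt`, `train_da`, `pretrain_stack`), the untied decoder weights, the squared-error loss, full-batch gradient descent, and all hyperparameter values are choices made here for brevity. Inputs are stored as rows, so weight matrices are applied on the right.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mask_corrupt(x, nu, rng):
    """Masking noise: zero each entry independently with probability nu."""
    keep = rng.random(x.shape) >= nu
    return x * keep

def train_da(x, n_hidden, nu=0.3, lr=0.5, epochs=200, seed=0):
    """Train one denoising autoencoder with full-batch gradient descent.

    Returns (W, b, W_dec, b_dec) and the list of per-epoch losses.
    Hyperparameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        x_tilde = mask_corrupt(x, nu, rng)    # fresh corruption every epoch
        h = sigmoid(x_tilde @ W + b)          # encoder sees the corrupted input
        x_hat = sigmoid(h @ W_dec + b_dec)    # decoder
        err = x_hat - x                       # target is the CLEAN input
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Backpropagate the mean squared-error loss through both sigmoid layers.
        d_out = 2.0 * err * x_hat * (1.0 - x_hat) / n
        d_hid = (d_out @ W_dec.T) * h * (1.0 - h)
        W_dec -= lr * (h.T @ d_out)
        b_dec -= lr * d_out.sum(axis=0)
        W -= lr * (x_tilde.T @ d_hid)
        b -= lr * d_hid.sum(axis=0)
    return (W, b, W_dec, b_dec), losses

def pretrain_stack(x, layer_sizes, **kwargs):
    """Greedy stacking: train each DA on the clean code of the previous encoder."""
    encoders, h = [], x
    for n_hidden in layer_sizes:
        (W, b, _, _), _ = train_da(h, n_hidden, **kwargs)
        encoders.append((W, b))
        h = sigmoid(h @ W + b)   # uncorrupted code becomes the next layer's input
    return encoders
```

Calling `pretrain_stack(x, [4, 2])` on an array `x` with values in $[0, 1]$ would return two `(W, b)` encoder pairs. Corruption is applied only while a layer is being trained; the representation passed upward is computed from the uncorrupted input, matching the description above.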
After unsupervised pre-training, all layers are “unfolded” into a single network and optionally fine-tuned together with a supervised loss (e.g., softmax cross-entropy for classification).
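The fine-tuning step can be sketched in NumPy as follows. This is a hedged illustration rather than a canonical recipe: `fine_tune` and `predict` are hypothetical names, `encoders` is assumed to be a list of `(W, b)` pairs obtained from some layer-wise pre-training routine, and the learning rate, epoch count, and use of plain full-batch gradient descent without regularization are assumptions made here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fine_tune(x, y, encoders, n_classes, lr=0.5, epochs=300, seed=0):
    """Jointly update all encoder layers and a newly added softmax head.

    encoders: list of (W, b) pairs (assumed to come from pre-training).
    y: integer class labels of shape (n,).
    """
    rng = np.random.default_rng(seed)
    params = [(W.copy(), b.copy()) for W, b in encoders]
    V = rng.normal(0.0, 0.1, size=(params[-1][0].shape[1], n_classes))
    c = np.zeros(n_classes)
    n = x.shape[0]
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(epochs):
        # Forward pass on clean inputs (no corruption in this sketch).
        acts = [x]
        for W, b in params:
            acts.append(sigmoid(acts[-1] @ W + b))
        p = softmax(acts[-1] @ V + c)
        losses.append(float(-np.mean(np.log(p[np.arange(n), y] + 1e-12))))
        # Gradient of the mean cross-entropy with respect to the logits.
        delta = (p - onehot) / n
        grad_V, grad_c = acts[-1].T @ delta, delta.sum(axis=0)
        # Propagate into the top encoder before the head is updated.
        delta = (delta @ V.T) * acts[-1] * (1.0 - acts[-1])
        V -= lr * grad_V
        c -= lr * grad_c
        for i in reversed(range(len(params))):
            W, b = params[i]
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:
                delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
            params[i] = (W - lr * grad_W, b - lr * grad_b)
    return params, (V, c), losses

def predict(x, params, head):
    """Return the most probable class for each row of x."""
    h = x
    for W, b in params:
        h = sigmoid(h @ W + b)
    V, c = head
    return np.argmax(h @ V + c, axis=1)
```

The decoders are not used here: only the encoder weights are carried over, and the gradient of the supervised loss flows through every layer, so the pre-trained features are adjusted rather than kept fixed.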
2. Pre-training and Training Algorithms
The canonical SDAE training protocol consists of two stages:
1. Greedy Layer-wise Pre-training:
- Each DA layer is trained in sequence while the previously trained layers are kept frozen.
- Corruption is applied independently to each layer’s input (masking probabilities up to about $0.3$ are common) (Chowdhury et al., 2018, Liang et al., 2021, Kalmanovich et al., 2014).
2. Fine-tuning:
- After stacking, the encoders are combined (decoders can be discarded or used for autoencoding tasks).
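- For supervised contexts, a classifier (typically logistic regression for binary problems or softmax for multi-class problems) is appended to the output $h^{(L)}$ of the last encoder:
$$p(y = k \mid x) = \mathrm{softmax}\big(W^{(c)} h^{(L)} + b^{(c)}\big)_k$$
where $W^{(c)}$ and $b^{(c)}$ denote the classifier's weights and bias.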
- The entire network is fine-tuned using backpropagation and stochastic gradient descent (SGD), minimizing the negative log-likelihood or another task-appropriate loss, potentially with regularization such as L2 weight decay or dropout (Chowdhury et al., 2018, Kalmanovich et al., 2014, Liang et al., 2021).
- Fine-tuning is critical for adapting layerwise-learned features to the end task and consistently yields improvements in generalization [1412