- The paper presents a novel architecture that bifurcates pooling outputs into 'what' and 'where' variables, restoring symmetry between the many-to-one feed-forward mapping and the one-to-many feedback mapping.
- It combines convolutional and deconvolutional networks with an integrated loss comprising discriminative and reconstruction terms for robust feature learning.
- Empirical results on MNIST, SVHN, and STL-10 demonstrate improved scalability, convergence, and classification performance across varying proportions of labeled and unlabeled data.
An Examination of Stacked What-Where Auto-encoders
The paper presents a novel architecture known as Stacked What-Where Auto-encoders (SWWAE). This architecture integrates discriminative and generative pathways in a unified framework applicable to supervised, semi-supervised, and unsupervised learning, without relying on sampling during training. SWWAE deploys a convolutional network (Convnet) to encode inputs and a deconvolutional network (Deconvnet) to reconstruct them. The primary methodological innovation is the bifurcation of each pooling layer's output into "what" and "where" variables, explicitly separating content from positional information.
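To make the "what"/"where" split concrete, the sketch below is a minimal illustration (not the authors' code), assuming PyTorch: max pooling with index tracking yields pooled activations (the "what", passed up the Convnet) and argmax switch locations (the "where", handed laterally to the Deconvnet).

```python
import torch
import torch.nn as nn

# "What": pooled activations, fed upward through the Convnet encoder.
# "Where": argmax switch locations, fed laterally to the Deconvnet decoder.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 16, 8, 8)      # a feature map from some conv layer
what, where = pool(x)             # what: (1, 16, 4, 4); where: switch indices

# The decoder uses the "where" switches to place each "what" value back at
# its original location, making the one-to-many unpooling well-defined.
x_up = unpool(what, where)        # (1, 16, 8, 8), nonzero only at switches
```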
Model Architecture
SWWAE's architecture addresses the asymmetric mapping typical of auto-encoder structures. Traditional auto-encoders suffer from a disparity between the many-to-one mapping of the feed-forward pathway and the one-to-many mapping of the feedback pathway. SWWAE counteracts this by computing complementary "where" variables at each pooling layer, which guide reconstruction alongside the propagation of the traditional "what" variables. The model optimizes a loss function comprising a discriminative term and reconstruction terms at both the input and intermediate levels, thereby enforcing symmetry between the hidden states of the feed-forward and feedback pathways.
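A minimal sketch of that combined objective follows, again assuming PyTorch; the weighting coefficients `lam_rec` and `lam_mid` and the function names are illustrative placeholders, not values or identifiers from the paper.

```python
import torch.nn.functional as F

def swwae_loss(logits, labels, x, x_recon, enc_feats, dec_feats,
               lam_rec=1.0, lam_mid=0.1):
    """Combined SWWAE-style objective (sketch).

    - discriminative term: classification loss on the encoder output
    - input-level term: L2 reconstruction of the input itself
    - intermediate terms: L2 penalties pairing each encoder feature map
      with its Deconvnet counterpart, enforcing pathway symmetry
    """
    l_disc = F.cross_entropy(logits, labels)
    l_rec = F.mse_loss(x_recon, x)
    l_mid = sum(F.mse_loss(d, e) for e, d in zip(enc_feats, dec_feats))
    return l_disc + lam_rec * l_rec + lam_mid * l_mid
```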
SWWAE's dual "what" and "where" roles are akin to the mechanisms found in transforming auto-encoders, but the "where" information is read directly from the pooling switches rather than from learned latent states. The architecture also extends beyond approaches such as RBMs and DBMs by circumventing their reliance on sampling, yielding improved scalability and convergence properties. Additionally, the integration of supervised feedback into SWWAE training contrasts with purely unsupervised feature-learning methods, emphasizing a unified framework across the supervised, semi-supervised, and unsupervised regimes.
Empirical Evaluations
Empirical analysis demonstrates SWWAE's effectiveness on tasks with different proportions of labeled and unlabeled data, showcasing its adaptability across the MNIST, SVHN, and STL-10 datasets. In semi-supervised settings on these image datasets in particular, SWWAE leverages large volumes of unlabeled data to improve accuracy and generalization.
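Read as code, the semi-supervised regime amounts to a training loop in which unlabeled batches contribute only the reconstruction terms while labeled batches add the discriminative term. The sketch below is hypothetical; it assumes a model returning the quantities used in the loss sketch above.

```python
import torch.nn.functional as F

def train_step(model, optimizer, x, labels=None, lam_rec=1.0, lam_mid=0.1):
    """One update. Unlabeled batches (labels is None) use reconstruction
    losses only; labeled batches additionally use the classification loss."""
    logits, x_recon, enc_feats, dec_feats = model(x)

    loss = lam_rec * F.mse_loss(x_recon, x)
    loss = loss + lam_mid * sum(F.mse_loss(d, e)
                                for e, d in zip(enc_feats, dec_feats))
    if labels is not None:           # only the labeled fraction of the data
        loss = loss + F.cross_entropy(logits, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```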
A notable finding is that SWWAE improves classification performance under realistic data conditions while requiring little dropout or other regularization. This is substantiated by strong test performance on datasets such as STL-10, where SWWAE achieved results competitive with state-of-the-art models, indicating that the gains stem from the architecture itself.
Implications and Future Directions
The implications of SWWAE extend to enhancing existing convolutional architectures for broader learning frameworks. The model's ability to combine unsupervised and supervised signals harmoniously suggests practical applications in scenarios where labels are scarce. Furthermore, SWWAE's architecture represents a step toward more efficient neural network design, capable of scaling to larger datasets while remaining computationally feasible.
Looking forward, SWWAE's adaptability suggests its extension to more complex datasets and tasks, particularly those involving high-dimensional feature spaces such as video. Extending its underlying principles may also open new avenues for architectures that synergize discriminative and generative pathways, potentially yielding more efficient models in both theory and applied AI systems.