- The paper introduces MLPconv layers, which replace linear convolution filters with small multi-layer perceptrons to increase local non-linearity in neural networks.
- It details the NIN model's evaluation on CIFAR-10, CIFAR-100, SVHN, and MNIST, where it achieves superior classification performance compared to traditional CNNs.
- The study shows that embedding micro fully connected networks within convolutional layers yields both practical gains and theoretical insight.
Network In Network
The paper "Network In Network", authored by Min Lin, Qiang Chen, and Shuicheng Yan, presents an innovative approach to convolutional neural networks (CNNs) by introducing the concept of Multi-Layer Perceptron Convolutional (MLPconv) layers. This modification is designed to enhance the representational capacity of traditional convolutional layers by embedding fully connected layers within them, thus performing local linear transformations and enhancing non-linearity.
Key Contributions
The core contribution of the Network In Network (NIN) architecture lies in replacing traditional linear convolutional layers with MLPconv layers, each consisting of a conventional convolution followed by several layers of 1x1 convolutions. These 1x1 convolution layers are equivalent to cascaded cross-channel parametric pooling (cccp) layers: they mix information across feature maps at every spatial location, and the micro-network they form is shared across all local receptive fields, sliding over the input just like an ordinary filter. The result is a more flexible and powerful structure for capturing complex patterns in the data; a minimal code sketch of one such block follows.
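To make the idea concrete, here is a minimal sketch of one MLPconv block in PyTorch (the paper itself does not use PyTorch, and the `MLPConv` class name and its arguments are illustrative): a spatial convolution followed by two 1x1 convolutions, each with a ReLU non-linearity, matching the "convolution + two cccp layers" pattern in the architectures listed below.

```python
import torch.nn as nn

class MLPConv(nn.Sequential):
    """One MLPconv block: a spatial convolution followed by two 1x1
    convolutions (the cccp layers), each with a ReLU non-linearity."""
    def __init__(self, in_ch, conv_out, cccp1_out, cccp2_out,
                 kernel_size, padding):
        super().__init__(
            nn.Conv2d(in_ch, conv_out, kernel_size, padding=padding),
            nn.ReLU(inplace=True),
            nn.Conv2d(conv_out, cccp1_out, kernel_size=1),  # cccp layer 1
            nn.ReLU(inplace=True),
            nn.Conv2d(cccp1_out, cccp2_out, kernel_size=1),  # cccp layer 2
            nn.ReLU(inplace=True),
        )
```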
Architecture and Implementation
The NIN model was evaluated on four benchmark datasets: CIFAR-10, CIFAR-100, SVHN, and MNIST. The layer configuration for each dataset is listed below; a code sketch assembling the CIFAR-10 variant follows the lists.
CIFAR-10 and CIFAR-100:
- Convolution (kernel: 5, output: 192, pad: 2)
- Two cccp layers (outputs: 160, 96)
- Max pooling (kernel: 3, stride: 2)
- Dropout (ratio: 0.5)
- Convolution (kernel: 5, output: 192, pad: 2)
- Two cccp layers (outputs: 192, 192)
- Max pooling (kernel: 3, stride: 2)
- Dropout (ratio: 0.5)
- Convolution (kernel: 3, output: 192, pad: 1)
- Two cccp layers (outputs: 192, 10 for CIFAR-10, 100 for CIFAR-100)
- Global average pooling (gap)
SVHN:
- Convolution (kernel: 5, output: 128, pad: 2)
- Two cccp layers (outputs: 128, 96)
- Max pooling (kernel: 3, stride: 2)
- Dropout (ratio: 0.5)
- Convolution (kernel: 5, output: 320, pad: 2)
- Two cccp layers (outputs: 256, 128)
- Max pooling (kernel: 3, stride: 2)
- Dropout (ratio: 0.5)
- Convolution (kernel: 5, output: 384, pad: 2)
- Two cccp layers (outputs: 256, 10)
- Global average pooling (gap)
MNIST:
- Convolution (kernel: 5, output: 96, pad: 2)
- Two cccp layers (outputs: 64, 48)
- Max pooling (kernel: 3, stride: 2)
- Dropout (ratio: 0.5)
- Convolution (kernel: 5, output: 128, pad: 2)
- Two cccp layers (outputs: 96, 48)
- Max pooling (kernel: 3, stride: 2)
- Dropout (ratio: 0.5)
- Convolution (kernel: 5, output: 128, pad: 2)
- Two cccp layers (outputs: 96, 10)
- Global average pooling (gap)
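For concreteness, here is a minimal PyTorch sketch assembling the CIFAR-10 configuration listed above, reusing the `MLPConv` block from the earlier sketch; the SVHN and MNIST variants differ only in kernel sizes and channel counts. This is an illustrative reconstruction rather than the authors' released code, and details such as dropout placement simply follow the lists above.

```python
import torch
import torch.nn as nn

class NINCifar10(nn.Module):
    """NIN for CIFAR-10: three MLPconv blocks with max pooling and dropout
    between them, ending in global average pooling over 10 feature maps."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            MLPConv(3, 192, 160, 96, kernel_size=5, padding=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Dropout(0.5),
            MLPConv(96, 192, 192, 192, kernel_size=5, padding=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Dropout(0.5),
            MLPConv(192, 192, 192, num_classes, kernel_size=3, padding=1),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling

    def forward(self, x):
        x = self.features(x)   # (N, num_classes, H, W)
        x = self.gap(x)        # (N, num_classes, 1, 1)
        return x.flatten(1)    # (N, num_classes) class scores

# Quick shape check on a batch of CIFAR-sized images.
logits = NINCifar10()(torch.randn(2, 3, 32, 32))
assert logits.shape == (2, 10)
```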
Results and Implications
The empirical results reported for these architectures show that the NIN model outperforms traditional CNNs in classification accuracy on the benchmark datasets. By replacing linear filters with MLPconv layers, the model captures more complex patterns and relationships within each local receptive field, and replacing the usual fully connected classifier with global average pooling removes a large number of parameters and acts as a structural regularizer, both of which improve generalization performance.
The implications of this research are twofold: practical and theoretical. Practically, MLPconv layers can be dropped into existing neural network frameworks to potentially improve performance without major changes to the surrounding architecture. Theoretically, this work demonstrates the value of applying more complex, non-linear transformations at the level of the local receptive field within CNNs.
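That theoretical point can be checked directly: a 1x1 convolution (one cccp layer) computes exactly the same function as a fully connected layer shared across all spatial positions, which is why stacking them amounts to running a small MLP over every local receptive field. The snippet below (illustrative, using PyTorch) verifies this numerically.

```python
import torch
import torch.nn as nn

# A 1x1 convolution and a per-pixel fully connected layer with shared weights.
conv1x1 = nn.Conv2d(16, 8, kernel_size=1)
fc = nn.Linear(16, 8)
with torch.no_grad():
    fc.weight.copy_(conv1x1.weight.view(8, 16))  # (out, in, 1, 1) -> (out, in)
    fc.bias.copy_(conv1x1.bias)

x = torch.randn(2, 16, 10, 10)                   # (N, C, H, W) feature maps
out_conv = conv1x1(x)
# Apply the FC layer independently at each spatial position.
out_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
print(torch.allclose(out_conv, out_fc, atol=1e-6))  # True
```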
Future Developments
Future developments in this area may explore further optimization of the NIN architecture, such as experimenting with different depths and configurations of MLPconv layers. Additionally, this approach could be extended to other domains beyond image classification, such as natural language processing or signal processing, where capturing intricate patterns is crucial. The intersection of NIN with other emerging techniques, such as attention mechanisms or graph neural networks, also represents a promising research direction.
In conclusion, the Network In Network framework presents a robust improvement over traditional convolutional architectures by introducing the MLPconv layer, thereby enhancing the capacity of neural networks to model complex data patterns.