- The paper introduces a fractional max-pooling method that uses non-integer downscaling factors to preserve more spatial information in CNNs.
- It compares random and pseudorandom strategies for generating the pooling regions, both overlapping and disjoint, to enable finer feature extraction at multiple scales.
- Experimental results on datasets like CIFAR-100 demonstrate reduced error rates and improved network generalization without relying on dropout.
Fractional Max-Pooling
Fractional Max-Pooling (FMP) is a modification to the architecture of Convolutional Neural Networks (CNNs), a fundamental tool in image recognition. Traditional max-pooling, typically 2×2, halves each spatial dimension of its input, so only a few pooling layers fit into a network before the feature maps shrink away, and the disjoint pooling regions discard spatial information abruptly. This paper introduces a fractional alternative that permits non-integer downscaling factors, thereby preserving more spatial information per layer.
Key Concepts and Methodology
Fractional Max-Pooling introduces stochastic pooling regions governed by a fractional scaling factor α with 1 < α < 2. This permits a gentler reduction of spatial dimensions: for example, with α = √2 two consecutive FMP layers halve the spatial size, so roughly twice as many pooling layers fit into a network, facilitating feature extraction at more scales. The paper explores two methodologies for FMP:
- Random and Pseudorandom Pooling Regions: The region boundaries are generated either randomly, by shuffling an appropriate mix of increments of one and two, or pseudorandomly, from sequences of the form a_i = ⌈α(i + u)⌉ for a single random u ∈ (0, 1); both approximate the desired fractional scale reduction (see the sketch after this list).
- Overlapping vs. Disjoint Regions: The paper also examines configurations in which adjacent pooling regions share an edge, finding that overlapping regions tend to outperform strictly disjoint ones.
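A minimal sketch of the two boundary generators, following the formulas above; the function names, the zero-indexed boundary convention, and the edge clamping are our own choices, not the paper's reference code:

```python
import math
import random

def pseudorandom_boundaries(n_in, n_out):
    # Pseudorandom sequence: a_i = ceil(alpha * (i + u)) with
    # alpha = n_in / n_out in (1, 2) and a single u drawn from (0, 1).
    alpha = n_in / n_out
    u = random.random()
    a = [math.ceil(alpha * (i + u)) for i in range(n_out + 1)]
    return [x - a[0] for x in a]  # shift so boundaries run exactly 0..n_in

def random_boundaries(n_in, n_out):
    # Random sequence: shuffle the right mix of increments of one
    # and two (n_in - n_out twos and 2*n_out - n_in ones).
    increments = [2] * (n_in - n_out) + [1] * (2 * n_out - n_in)
    random.shuffle(increments)
    bounds = [0]
    for step in increments:
        bounds.append(bounds[-1] + step)
    return bounds

def pooling_regions(bounds, overlapping=False):
    # Disjoint regions cover columns [a_{i-1}, a_i - 1]; overlapping
    # regions extend one column further, clamped at the input edge.
    extra = 1 if overlapping else 0
    last_col = bounds[-1] - 1
    return [(bounds[i], min(bounds[i + 1] - 1 + extra, last_col))
            for i in range(len(bounds) - 1)]

# Example: divide 25 input columns into 18 pooling intervals (alpha ≈ 1.39).
bounds = pseudorandom_boundaries(25, 18)
print(pooling_regions(bounds, overlapping=True))
```

In two dimensions, one such sequence is drawn per axis and the pooling regions are the Cartesian products of the resulting intervals.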
Implementation and Training
The authors designed CNN architectures featuring fractional max-pooling layers and evaluated them on several datasets: MNIST, CIFAR-10, CIFAR-100, Assamese handwritten characters, and online Chinese handwriting (CASIA-OLHWDB1.1). The networks were built as sparse convolutional networks with alternating convolutional and FMP layers, and were evaluated with model averaging, running multiple configurations of pooling regions at test time to improve performance.
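For readers who want to try the layer directly, PyTorch ships nn.FractionalMaxPool2d, which implements fractional max-pooling as described in this paper. Below is a minimal sketch of an alternating conv/FMP stack with α = √2 per pooling step; the channel counts, depth, and input size are illustrative rather than the paper's exact architectures:

```python
import torch
import torch.nn as nn

RATIO = 1 / 2 ** 0.5  # each FMP layer scales spatial size by 1/alpha = 1/sqrt(2)

# Alternating convolution and FMP layers, in the spirit of the paper's
# conv-then-FMP building block; layer sizes here are placeholders.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=2), nn.ReLU(),
    nn.FractionalMaxPool2d(kernel_size=2, output_ratio=RATIO),
    nn.Conv2d(64, 128, kernel_size=2), nn.ReLU(),
    nn.FractionalMaxPool2d(kernel_size=2, output_ratio=RATIO),
    nn.Conv2d(128, 256, kernel_size=2), nn.ReLU(),
    nn.FractionalMaxPool2d(kernel_size=2, output_ratio=RATIO),
)

x = torch.randn(1, 3, 94, 94)  # stand-in for a CIFAR-style input, padded up
print(model(x).shape)          # spatial size shrinks by ~sqrt(2) per FMP layer
```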
Results
The results demonstrate that fractional max-pooling reduces overfitting and improves test accuracy without requiring popular regularization techniques such as dropout. Notably, FMP achieved state-of-the-art results on CIFAR-100 at the time of publication. The paper suggests that overlapping regions and pseudorandom sequences offer superior results, particularly when combined with test-time model averaging.
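Because the pooling regions are stochastic, each forward pass through an FMP network computes a slightly different function, and averaging the predictions from several passes behaves like an inexpensive ensemble. A minimal sketch of that test-time averaging, assuming model maps a batch to class logits (the helper name and the default repeat count are our own):

```python
import torch

@torch.no_grad()
def fmp_model_average(model, x, n_repeats=12):
    # Each pass samples fresh pooling regions, so repeated passes over
    # the same input form an implicit ensemble; average their softmax
    # outputs to get the final prediction.
    probs = torch.stack([torch.softmax(model(x), dim=1)
                         for _ in range(n_repeats)])
    return probs.mean(dim=0)
```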
Numerical Highlights:
- On CIFAR-100, FMP reduced the error rate by roughly 12% relative to previously reported results, without any data augmentation, indicating the efficacy of the approach.
- FMP networks showed consistent improvements across datasets; on CIFAR-100 they achieved test errors of 27.89% with random and 26.39% with overlapping pooling regions.
Implications and Future Directions
The implications of this research extend to improving the generalization and performance of convolutional neural networks. Fractional max-pooling increases the robustness of feature extraction by balancing spatial size reduction against invariance to distortion. Future work could fine-tune pooling strategies, integrate FMP with other architectural innovations, or examine pooling arrangements that correspond to a richer class of elastic distortions.
Moreover, this research encourages exploration into fractional approaches within other neural network components, potentially paving the way for more adaptive, efficient architectures. As the field of deep learning advances, incorporating concepts like fractional max-pooling could further optimize computational efficiency and accuracy in various applications.