- The paper demonstrates that adding convolution layers and ensemble averaging to capsule networks improves CIFAR-10 validation accuracy by 2.57% over the baseline.
- The authors show that reconstruction regularization, which is effective on simpler datasets like MNIST, is less impactful on complex data because it does not account for the multiple viewpoints of real-world objects.
- The study suggests that incorporating matrix capsules with EM routing could better handle viewpoint invariance and enhance performance on challenging datasets.
Capsule Network Performance on Complex Data
In this paper, the authors investigate the performance of capsule networks on complex datasets, building on recent advancements that have demonstrated their promise in addressing some limitations inherent in conventional convolutional neural networks (CNNs). The motivation stems from the fact that CNNs, while successful in many classification tasks, struggle to represent pose: they are not invariant to rotation and do not explicitly capture the spatial hierarchies between object parts. Capsule networks, whose architecture was introduced by Hinton and colleagues, aim to address these deficiencies through dynamic routing and reconstruction regularization, achieving notable accuracy on datasets like MNIST without data augmentation.
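The dynamic routing mentioned above can be made concrete with a minimal NumPy sketch of routing-by-agreement in the style of Sabour et al.: lower-level capsules vote for higher-level capsules, and coupling coefficients are iteratively sharpened toward the capsules they agree with. Shapes, the iteration count, and the random input here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash nonlinearity: shrinks short vectors toward 0, long vectors toward unit length."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement between one capsule layer and the next.

    u_hat: prediction vectors, shape (num_in, num_out, dim_out).
    Returns the output capsule vectors, shape (num_out, dim_out).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, start uniform
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients (softmax over outputs)
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum of votes per output capsule
        v = squash(s)                                         # squashed output capsules
        b += (u_hat * v[None]).sum(axis=-1)                   # agreement (dot product) updates the logits
    return v

# Toy run: 8 input capsules voting for 10 output capsules of dimension 16.
rng = np.random.default_rng(0)
v = dynamic_routing(rng.normal(size=(8, 10, 16)))
```

Because of the squash nonlinearity, every output capsule's length lies in [0, 1), which is what allows the length to be read as the probability that the entity it represents is present.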
The paper presents an empirical evaluation of capsule networks on the CIFAR-10 dataset, whose natural color images are substantially more complex than MNIST digits. The research assesses various architectural modifications and hyperparameter tuning to optimize test accuracy. Several approaches are explored, including stacking additional capsule layers, increasing the number of primary capsules, employing ensemble averaging, modifying reconstruction loss scaling, and introducing customized activation functions. Despite these explorations, only certain modifications, such as adding convolution layers and ensemble averaging, yielded promising increases in validation accuracy, demonstrating the potential but also the current limitations of capsule networks on more complex datasets.
Experimentation with the CIFAR-10 dataset provided valuable insights, albeit with modest successes. The best-performing configuration achieved a validation accuracy of 71.55% over 50 epochs using a 4-model ensemble with additional convolution layers, a 2.57% improvement over the baseline. However, the authors note that due to computational constraints, larger ensembles and further configurations could not be explored fully, leaving a gap between the achieved performance and the state of the art.
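Ensemble averaging of the kind used for the 4-model result is straightforward: average the per-class probabilities produced by each independently trained model, then take the argmax. A minimal sketch (the models, class counts, and probabilities below are hypothetical):

```python
import numpy as np

def ensemble_predict(probs_per_model):
    """Average per-class probabilities across models, then pick the argmax class.

    probs_per_model: array of shape (num_models, num_samples, num_classes).
    Returns predicted class indices, shape (num_samples,).
    """
    avg = np.mean(probs_per_model, axis=0)
    return np.argmax(avg, axis=-1)

# Toy example: 4 hypothetical models, 2 samples, 3 classes.
probs = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.5, 0.4, 0.1], [0.1, 0.6, 0.3]],
    [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]],
    [[0.4, 0.4, 0.2], [0.2, 0.5, 0.3]],
])
preds = ensemble_predict(probs)  # → array([0, 1])
```

Averaging probabilities rather than hard votes lets a confident minority model outweigh uncertain majority models, which is one common reason probability averaging is preferred for small ensembles.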
A key aspect of the paper is its discussion of the reconstruction loss strategy. While successful on MNIST, reconstruction-based regularization seemed less effective for CIFAR-10, possibly because it does not account for the multiple viewpoints inherent in complex, real-world objects. This highlights a potential avenue for further innovation in capsule networks: developing methods that improve viewpoint invariance, potentially through pose matrices that explicitly model viewpoint changes.
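For context, the reconstruction regularizer under discussion is typically a down-weighted sum-of-squares reconstruction error added to a margin loss on capsule lengths, as in Sabour et al.'s formulation. A minimal sketch, with the standard margin values (0.9, 0.1, λ = 0.5) and the conventional 0.0005 reconstruction scale as assumed defaults:

```python
import numpy as np

def margin_loss(v_norms, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss on capsule lengths: the target class capsule should be long,
    all other class capsules short.

    v_norms: capsule lengths, shape (batch, num_classes); targets: int labels.
    """
    t = np.eye(v_norms.shape[-1])[targets]  # one-hot targets
    pos = t * np.maximum(0.0, m_pos - v_norms) ** 2
    neg = lam * (1 - t) * np.maximum(0.0, v_norms - m_neg) ** 2
    return np.sum(pos + neg, axis=-1).mean()

def total_loss(v_norms, targets, x, x_recon, recon_scale=0.0005):
    """Margin loss plus a down-weighted sum-of-squares reconstruction regularizer."""
    recon = np.sum((x - x_recon) ** 2, axis=tuple(range(1, x.ndim))).mean()
    return margin_loss(v_norms, targets) + recon_scale * recon

# Sanity check: a confident correct prediction with perfect reconstruction gives zero loss.
loss = total_loss(np.array([[0.95, 0.05]]), np.array([0]),
                  np.zeros((1, 4)), np.zeros((1, 4)))
```

The small `recon_scale` is what keeps the reconstruction term from dominating the margin loss; the paper's observation is that even so, pixel-level reconstruction is a weak regularizer when the same object can appear under many viewpoints.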
A speculative avenue for future research is noted, namely employing matrix capsules with EM routing, which promises more robust handling of complex data by modeling the transformation between an object and the viewer with pose matrices. This direction could address the inadequacy of current capsule configurations in translating their MNIST successes to more realistic datasets.
In conclusion, while the capsule network architecture as tested does not outperform established CNN methods on complex datasets like CIFAR-10, the research underscores the potential of capsule networks given appropriate adaptations. Further work on deeper architectures, improved routing mechanisms, and novel regularization methods could bring capsule networks closer to the state of the art, promising enhanced capabilities in machine learning and computer vision tasks.