The Relative Performance of Ensemble Methods with Deep Convolutional Neural Networks for Image Classification (1704.01664v1)

Published 5 Apr 2017 in stat.ML, cs.CV, cs.LG, and stat.ME

Abstract: Artificial neural networks have been successfully applied to a variety of machine learning tasks, including image recognition, semantic segmentation, and machine translation. However, few studies have fully investigated ensembles of artificial neural networks. In this work, we investigated multiple widely used ensemble methods, including unweighted averaging, majority voting, the Bayes Optimal Classifier, and the (discrete) Super Learner, for image recognition tasks, with deep neural networks as candidate algorithms. We designed several experiments, with the candidate algorithms being the same network structure at different model checkpoints within a single training process, networks with the same structure but trained multiple times stochastically, and networks with different structures. In addition, we further studied the over-confidence phenomenon of the neural networks, as well as its impact on the ensemble methods. Across all of our experiments, the Super Learner achieved the best performance among all the ensemble methods in this study.

Citations (343)

Summary

  • The paper demonstrates that the Super Learner method optimally aggregates predictions, outperforming unweighted averaging and majority voting techniques.
  • It employs diverse CNN architectures such as ResNet, VGG, and GoogLeNet, effectively managing over-confident and weak base learners.
  • The study provides actionable insights for deploying ensemble methods in critical fields like precision medicine and spatial prediction.

Evaluation of Ensemble Methods with Convolutional Neural Networks in Image Classification

Introduction

In machine learning, ensemble methods combine multiple learning algorithms to achieve predictive performance superior to that of any individual model. This paper explores ensemble techniques built on deep convolutional neural networks (CNNs) and delineates how these methods can be leveraged for image classification tasks. Several ensemble strategies are investigated: unweighted averaging, majority voting, the Bayes Optimal Classifier (BOC), and the Super Learner (SL), a cross-validation-based stacking approach.

Methodology

The paper conducts experiments involving CNNs with different architectures and parameters, which serve as base learners, or "candidates," in the ensemble library. The ensemble methods evaluated are listed below; a brief code sketch of the two simplest combiners follows the list:

  • Unweighted Averaging: Averages the output probabilities (post-softmax scores) of the base models and predicts the class with the highest mean probability.
  • Majority Voting: Assigns the final class label to the class most frequently predicted among the base models.
  • Bayes Optimal Classifier: Forms a weighted sum of predictions, weighting each model by the posterior probability that its hypothesis is correct.
  • Super Learner: Uses cross-validation to fit a linear combination of base-model predictions that minimizes the estimated risk on the validation folds.
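
To make the two simplest combiners concrete, here is a minimal sketch in Python/NumPy. The array shapes and function names are illustrative assumptions, not the paper's implementation; it assumes each base CNN exposes post-softmax class probabilities.

```python
import numpy as np

# probs: softmax outputs stacked across models,
# shape (n_models, n_samples, n_classes); illustrative assumption.

def unweighted_average(probs: np.ndarray) -> np.ndarray:
    """Average the probability vectors, then predict the argmax class."""
    return probs.mean(axis=0).argmax(axis=1)

def majority_vote(probs: np.ndarray) -> np.ndarray:
    """Each model votes for its argmax class; the most frequent class wins."""
    votes = probs.argmax(axis=2)                         # (n_models, n_samples)
    n_classes = probs.shape[2]
    counts = np.stack([np.bincount(votes[:, i], minlength=n_classes)
                       for i in range(votes.shape[1])])  # (n_samples, n_classes)
    return counts.argmax(axis=1)
```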

Experiments span networks such as ResNet, NIN, VGG, and GoogLeNet, testing configurations such as varied training epochs and distinct random initializations. Particular attention is paid to over-confident models, whose near-saturated softmax outputs can skew ensemble outputs significantly, and to whether weak learners can still enhance ensemble performance.
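
To illustrate the Super Learner step described above, the sketch below fits a convex combination of cross-validated base-model predictions by minimizing log loss. The simplex constraint and the SLSQP solver are assumptions chosen for illustration; the paper's exact loss function and optimizer may differ.

```python
import numpy as np
from scipy.optimize import minimize

def fit_super_learner_weights(cv_probs: np.ndarray,
                              labels: np.ndarray) -> np.ndarray:
    """Fit non-negative weights summing to 1 that minimize the
    cross-validated log loss of the blended prediction.

    cv_probs: (n_models, n_samples, n_classes) held-out softmax outputs
    labels:   (n_samples,) integer class labels
    """
    n_models, n_samples, _ = cv_probs.shape

    def neg_log_lik(w: np.ndarray) -> float:
        blended = np.tensordot(w, cv_probs, axes=1)   # (n_samples, n_classes)
        p_true = blended[np.arange(n_samples), labels]
        return -np.mean(np.log(np.clip(p_true, 1e-12, None)))

    w0 = np.full(n_models, 1.0 / n_models)            # start from equal weights
    res = minimize(neg_log_lik, w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * n_models,
                   constraints=[{"type": "eq",
                                 "fun": lambda w: w.sum() - 1.0}])
    return res.x
```

At prediction time, the fitted weights are applied to the base models' probabilities on new data and the argmax class is returned.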

Results

Across all examined settings, the Super Learner consistently exhibited superior performance, attributable to its adaptive weighting of model predictions based on validation-set performance. It effectively managed ensembles containing models of diverse structures and stabilities. Unweighted averaging performed well when the base models were similar in strength, but deteriorated in the presence of weak or over-confident networks.
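
As a toy illustration of this failure mode (the numbers are ours, not the paper's), a single over-confident model that is wrong can dominate the unweighted average of several moderately confident models that are right, even though majority voting would still recover the correct class:

```python
import numpy as np

probs = np.array([
    [[0.05, 0.95]],   # over-confident model: wrong, near-saturated output
    [[0.60, 0.40]],   # three moderately confident models, all correct
    [[0.60, 0.40]],
    [[0.60, 0.40]],
])
print(probs.mean(axis=0).argmax(axis=1))   # [1] -- the average is wrong
print(probs.argmax(axis=2).ravel())        # [1 0 0 0] -- voting picks class 0
```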

Notably, the paper underlines that the Super Learner is particularly robust: including less performant models has no substantial negative impact on its accuracy. Moreover, it achieves this without requiring prior selective filtering of base learners, a significant practical advantage over simple voting and averaging techniques.

Discussion and Implications

The findings articulate the strength of the Super Learner as a flexible and robust ensemble method for image classification with deep CNNs. Its ability to harness a wide array of network structures and maintain performance even with substantial model heterogeneity sets it apart from traditional ensemble strategies. The insights into ensemble behavior, particularly in handling over-confidence or mediocre base learners, provide guidance for practitioners in effectively deploying ensemble systems.

In the broader scope, the implications of this work are substantial for the design of deep learning systems in fields like precision medicine, spatial prediction, and any domain requiring robust image classification. The results also prompt further investigation into extending the Super Learner framework, such as incorporating non-linear activations over stacked base learner predictions, though care must be taken to mitigate overfitting risks on validation datasets.

Overall, this paper contributes meaningful perspectives on the orchestration of ensemble methods and informs future research on methodology optimization in machine learning applications involving CNNs.