Searching for Efficient Multi-Scale Architectures for Dense Image Prediction (1809.04184v1)

Published 11 Sep 2018 in cs.CV, cs.LG, and stat.ML

Abstract: The design of neural network architectures is an important component for achieving state-of-the-art performance with machine learning systems across a broad array of tasks. Much work has endeavored to design and build architectures automatically through clever construction of a search space paired with simple learning algorithms. Recent progress has demonstrated that such meta-learning methods may exceed scalable human-invented architectures on image classification tasks. An open question is the degree to which such methods may generalize to new domains. In this work we explore the construction of meta-learning techniques for dense image prediction focused on the tasks of scene parsing, person-part segmentation, and semantic image segmentation. Constructing viable search spaces in this domain is challenging because of the multi-scale representation of visual information and the necessity to operate on high resolution imagery. Based on a survey of techniques in dense image prediction, we construct a recursive search space and demonstrate that even with efficient random search, we can identify architectures that outperform human-invented architectures and achieve state-of-the-art performance on three dense prediction tasks including 82.7% on Cityscapes (street scene parsing), 71.3% on PASCAL-Person-Part (person-part segmentation), and 87.9% on PASCAL VOC 2012 (semantic image segmentation). Additionally, the resulting architecture is more computationally efficient, requiring half the parameters and half the computational cost of previous state-of-the-art systems.

Citations (398)


Summary

The paper introduces a meta-learning framework that searches a recursive space of Dense Prediction Cells (DPCs), multi-scale modules placed on top of a backbone network, to find efficient architectures for dense image prediction tasks.
It uses a cheaper proxy task to reduce search cost while keeping proxy scores predictive of full-scale accuracy, and the architectures it finds outperform human-designed ones.
The approach achieves notable improvements across benchmarks, such as 82.7% mIOU on Cityscapes and 87.9% mIOU on PASCAL VOC 2012.

Efficient Architecture Search for Dense Image Prediction: A Review

The paper "Searching for Efficient Multi-Scale Architectures for Dense Image Prediction" addresses the challenge of optimizing neural network architectures for dense image prediction tasks, including scene parsing, person-part segmentation, and semantic image segmentation. The research focuses on improving the efficiency and performance of neural networks through automated architecture design, utilizing meta-learning techniques. The authors succeed in creating architectures that outperform human-designed counterparts while reducing computational complexity.

Overview of Methodology

The authors propose a meta-learning framework for automatically identifying effective architectures for dense image prediction. The approach constructs a recursive search space whose elements, called Dense Prediction Cells (DPCs), serve as building blocks for the multi-scale representations that dense image prediction requires. The search space is expressive enough to encode existing multi-scale modules such as atrous spatial pyramid pooling, while remaining computationally tractable.
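To make the recursive search space concrete, here is a minimal Python sketch of sampling one cell. It assumes, as a hypothetical simplification, that a cell has a fixed number of branches, that each branch applies one operation to either the cell input or the output of an earlier branch, and that the operation vocabulary contains a 1x1 convolution, 3x3 atrous convolutions with separate height and width rates, and spatial pyramid pooling with separate grid sizes. The rate and grid values, `OPERATIONS`, `Branch`, `sample_dpc`, and `search_space_size` are illustrative assumptions, not the authors' code.

```python
import itertools
import math
import random
from dataclasses import dataclass

# Hypothetical operation vocabulary; the exact rate and grid values are assumptions.
ATROUS_RATES = [1, 3, 6, 9, 12, 15, 18, 21]
POOL_GRIDS = [1, 2, 4, 8]

OPERATIONS = (
    [("conv1x1",)]
    + [("atrous3x3", rh, rw) for rh, rw in itertools.product(ATROUS_RATES, repeat=2)]
    + [("pyramid_pool", gh, gw) for gh, gw in itertools.product(POOL_GRIDS, repeat=2)]
)


@dataclass(frozen=True)
class Branch:
    input_index: int  # 0 = cell input, k >= 1 = output of branch k
    operation: tuple  # one entry of OPERATIONS


def sample_dpc(num_branches: int, rng: random.Random) -> list[Branch]:
    """Sample one cell uniformly: branch b (1-indexed) reads the cell input or an earlier branch."""
    cell = []
    for b in range(1, num_branches + 1):
        cell.append(Branch(input_index=rng.randrange(b), operation=rng.choice(OPERATIONS)))
    return cell


def search_space_size(num_branches: int) -> int:
    """Branch b has b input choices and len(OPERATIONS) operation choices."""
    return math.prod(b * len(OPERATIONS) for b in range(1, num_branches + 1))


rng = random.Random(0)
for branch in sample_dpc(5, rng):
    print(branch)
print(len(OPERATIONS))       # 81
print(search_space_size(5))  # 418414128120
```

With five branches, this hypothetical vocabulary gives 81 operations and 5! x 81^5, roughly 4.2 x 10^11, possible cells, which is why each sampled candidate must be cheap to evaluate. How branch outputs are combined into the cell output (for example, by concatenation) is left out of the sketch.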

The search process is adapted to the dense image prediction domain, where multi-scale representations, high-resolution processing, and context capture are essential. A key contribution of the paper is a proxy task that reduces the cost of training each candidate architecture on high-resolution images. The proxy task uses a smaller backbone network and cached backbone feature maps to speed up each evaluation, while its scores remain predictive enough to identify which architectures perform well on the full task.
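One way to check that a proxy task remains predictive is to score the same candidate architectures on both the proxy and the full task and measure the rank correlation between the two score lists, since a search only needs the proxy to order candidates correctly. The sketch below computes Spearman's rank correlation in plain Python; the choice of Spearman's rho as the measure and the five score pairs are assumptions for illustration, not numbers from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical mIOU scores for five candidate cells (not from the paper).
proxy_scores = [0.61, 0.58, 0.66, 0.55, 0.63]
full_scores = [0.79, 0.77, 0.81, 0.76, 0.78]
print(spearman_rho(proxy_scores, full_scores))  # 0.9
```

A rho close to 1 means ranking candidates by cheap proxy scores would pick nearly the same winners as full training; the absolute proxy scores can be much lower without harming the search.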

Numerical Results

The architectures identified through the search process achieve state-of-the-art performance across multiple benchmarks:

On the Cityscapes dataset for street scene parsing, the proposed model achieves an mIOU of 82.7%, surpassing the best human-designed architectures by 0.7 percentage points.
For the PASCAL-Person-Part dataset, it records an mIOU of 71.3%, improving upon the previous best result by 3.7 percentage points.
In the PASCAL VOC 2012 semantic image segmentation task, it reaches an mIOU of 87.9%, 1.7 percentage points above the previous state-of-the-art result.
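All three results are reported as mean intersection-over-union (mIOU): for each class, IoU = TP / (TP + FP + FN) over all evaluated pixels, and mIOU averages the per-class IoU. A minimal sketch computing it from a confusion matrix follows; the 3-class matrix is made up, and official benchmark scripts also handle details such as ignored pixels that this sketch omits.

```python
def mean_iou(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[t][p] counts pixels whose ground-truth class is t and predicted class is p.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # class-c pixels predicted as another class
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp  # other pixels predicted as c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both ground truth and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Made-up 3-class confusion matrix (rows: ground truth, columns: prediction).
cm = [
    [50, 2, 3],
    [4, 30, 1],
    [0, 5, 20],
]
print(round(mean_iou(cm), 4))  # 0.7505
```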

These results are notable not only for their absolute performance but also for their lower computational cost: with the Xception network backbone, the identified architectures require only half the trainable parameters and half the computational cost of prior leading solutions.

Implications and Future Directions

The outcomes of this paper indicate substantial progress in the automated design of neural network architectures for complex dense prediction tasks. By efficiently leveraging the architecture search space and the constructed proxy task, the authors demonstrate that meta-learning techniques can extend beyond traditional image classification tasks to complex image domains requiring high-resolution and context-aware processing.

Several future opportunities emerge from this research. Replacing random search with more sophisticated search algorithms, such as reinforcement learning, evolutionary algorithms, or model-based optimization, and expanding the search space itself could yield further gains in architecture quality and efficiency. The same approach may also extend to related areas such as depth prediction and object detection, potentially with similar gains in accuracy and computational cost.

The work sets a precedent for addressing dense image prediction problems with increased efficiency, providing a foundation for future innovations in automated neural network design.
