- The paper introduces a template-based NAS method that reduces reliance on large pre-trained classifiers to design efficient segmentation architectures.
- It uses a recurrent neural network controller with reinforcement learning to select and optimize template configurations.
- Results show competitive performance (67.8% IoU on Cityscapes) with only 270K parameters, outperforming many existing compact models.
Overview of "Template-Based Automatic Search of Compact Semantic Segmentation Architectures"
The paper "Template-Based Automatic Search of Compact Semantic Segmentation Architectures" by Vladimir Nekrasov, Chunhua Shen, and Ian Reid explores an innovative approach in the field of Neural Architecture Search (NAS) specifically for the task of semantic segmentation. This domain, characterized by dense per-pixel classification challenges, requires architectures that are not only accurate but also computationally efficient. The presented work aims to address the limitations of current NAS methodologies, which tend to depend heavily on large pre-trained image classifiers, consequently leading to architectures that are suboptimal in terms of parameter efficiency when adapted for segmentation tasks.
Methodology
The authors propose a NAS strategy that reduces the reliance on extensive pre-trained networks and encourages the discovery of lightweight yet accurate architectures. The search starts from only a few blocks of a pre-trained image classifier, MobileNet-v2, amounting to roughly 60K parameters. On top of this stem, the method searches over templates: reusable structures assembled from a fixed set of candidate operations. For each template, the search predicts the network configuration, including layer connectivity and downsampling factors, which is what keeps the resulting architectures compact.
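A minimal sketch of the idea, assuming nothing about the authors' actual implementation: `Template`, `OP_SET`, and `build_config` below are hypothetical names, and the candidate operations are only illustrative. The point is that a template bundles the decisions the search has to make (operations, connectivity, downsampling), and a layout of template indices lets one template be stamped into the network several times.

```python
# Illustrative sketch only; names and the operation set are assumptions,
# not taken from the paper's code.
from dataclasses import dataclass
from typing import List, Tuple

# Assumed pool of candidate operations the search could choose from.
OP_SET = ["conv3x3", "sep_conv3x3", "sep_conv5x5", "dil_conv3x3_r3", "skip"]

@dataclass
class Template:
    ops: List[str]                       # operations applied inside the block
    connections: List[Tuple[int, int]]   # (from_layer, to_layer) edges
    downsample: int                      # downsampling factor for the block

def build_config(templates: List[Template], layout: List[int]) -> List[Template]:
    """Stamp templates into a full network description.

    `layout[i]` is the index of the template used at position i, so a single
    template can be reused several times, keeping the search space small.
    """
    return [templates[idx] for idx in layout]

# Example: two templates reused across a four-block network.
t0 = Template(ops=["sep_conv3x3", "skip"], connections=[(0, 1)], downsample=2)
t1 = Template(ops=["dil_conv3x3_r3", "sep_conv5x5"], connections=[(0, 1)], downsample=1)
config = build_config([t0, t1], layout=[0, 1, 1, 0])
```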
A key component of the methodology is a recurrent neural network that acts as the controller within the NAS framework. The controller is trained with reinforcement learning: it iteratively selects templates, predicts their configurations, and is updated from a reward signal computed on the validation set. Because templates can be reused multiple times within a network, the search space stays small, which improves search efficiency and helps the controller converge on strong architectures.
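To make the controller's training loop concrete, here is a hedged REINFORCE-style sketch in PyTorch. The `Controller` class, the `evaluate` stub, and all hyperparameters are illustrative assumptions rather than the authors' implementation; in the paper the reward would come from briefly training the sampled architecture and measuring its validation performance.

```python
# Sketch of an RNN controller trained with policy gradients (REINFORCE).
import torch
import torch.nn as nn

class Controller(nn.Module):
    """LSTM that emits a sequence of categorical choices (ops, connections, strides)."""
    def __init__(self, num_choices: int, seq_len: int, hidden: int = 64):
        super().__init__()
        self.seq_len = seq_len
        self.embed = nn.Embedding(num_choices, hidden)
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.head = nn.Linear(hidden, num_choices)

    def sample(self):
        h = torch.zeros(1, self.lstm.hidden_size)
        c = torch.zeros(1, self.lstm.hidden_size)
        inp = torch.zeros(1, self.embed.embedding_dim)
        actions, log_probs = [], []
        for _ in range(self.seq_len):
            h, c = self.lstm(inp, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            a = dist.sample()
            actions.append(a.item())
            log_probs.append(dist.log_prob(a))
            inp = self.embed(a)              # feed the choice back in
        return actions, torch.stack(log_probs).sum()

def evaluate(actions):
    """Stub: decode `actions` into an architecture, train briefly, return validation IoU."""
    return torch.rand(1).item()

controller = Controller(num_choices=5, seq_len=8)
opt = torch.optim.Adam(controller.parameters(), lr=1e-3)
baseline = 0.0
for step in range(100):
    actions, log_prob = controller.sample()
    reward = evaluate(actions)
    baseline = 0.9 * baseline + 0.1 * reward   # moving-average baseline
    loss = -(reward - baseline) * log_prob     # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```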
Results
Several of the discovered architectures perform competitively on standard semantic segmentation benchmarks, namely Cityscapes and CamVid. Notably, one architecture achieves a mean Intersection over Union (IoU) of 67.8% on Cityscapes and 63.2% on CamVid while using only 270K parameters, a fraction of the budget of many conventional models. These results underscore the method's ability to generate segmentation networks that are both resource-efficient and accurate.
The experiments further show that the discovered architectures rival or exceed existing compact methods such as ENet, ESPNet, and ERFNet despite significantly smaller parameter budgets. The authors highlight that such architectures are well-suited to settings with limited computational resources, such as robotics or mobile devices.
Implications and Future Directions
The approach presented by Nekrasov et al. marks a shift in how NAS is applied to segmentation, emphasizing minimal reliance on large pre-trained classifiers. This could reshape strategies for developing neural architectures where computational efficiency is paramount. The findings also point to further improvements, such as integrating real-time constraints directly into the search objective, which could make the approach more useful in latency-sensitive scenarios.
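As one illustration of folding a real-time constraint into the search objective, the function below scales validation IoU by a soft latency penalty. This follows common practice in hardware-aware NAS rather than anything specified in the paper; the target latency and exponent are placeholder values.

```python
# Hypothetical latency-aware reward; constants are illustrative only.
def latency_aware_reward(val_iou: float, latency_ms: float,
                         target_ms: float = 30.0, weight: float = 0.07) -> float:
    """Scale validation IoU down when measured latency exceeds the target."""
    penalty = (target_ms / max(latency_ms, 1e-6)) ** weight
    return val_iou * min(penalty, 1.0)
```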
Future work could build on this by integrating more sophisticated operations into the templates or by extending the search to other vision tasks. Addressing current limitations, such as the manual selection of candidate operations and the fixed initial architecture stem, could further advance the automatic discovery of lightweight architectures across diverse computer vision tasks.