Evolving Normalization-Activation Layers (2004.02967v5)

Published 6 Apr 2020 in cs.LG, cs.CV, cs.NE, and stat.ML

Abstract: Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other. Here we propose to design them using an automated approach. Instead of designing them separately, we unify them into a single tensor-to-tensor computation graph, and evolve its structure starting from basic mathematical functions. Examples of such mathematical functions are addition, multiplication and statistical moments. The use of low-level mathematical functions, in contrast to the use of high-level modules in mainstream NAS, leads to a highly sparse and large search space which can be challenging for search methods. To address the challenge, we develop efficient rejection protocols to quickly filter out candidate layers that do not work well. We also use multi-objective evolution to optimize each layer's performance across many architectures to prevent overfitting. Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures that go beyond existing design patterns. For example, some EvoNorms do not assume that normalization and activation functions must be applied sequentially, nor need to center the feature maps, nor require explicit activation functions. Our experiments show that EvoNorms not only work well on image classification models including ResNets, MobileNets and EfficientNets but also transfer well to Mask R-CNN with FPN/SpineNet for instance segmentation and to BigGAN for image synthesis, outperforming BatchNorm and GroupNorm based layers in many cases.

Citations (77)

Summary

  • The paper introduces EvoNorms, unified layers that automate the design of normalization and activation functions.
  • It employs multi-objective evolution and efficient rejection protocols to optimize performance across diverse architectures.
  • Experimental results show that EvoNorm layers match or outperform traditional layers across image classification, segmentation, and synthesis tasks.

Evolving Normalization-Activation Layers: An Expert Overview

The paper "Evolving Normalization-Activation Layers" presents an innovative methodology for designing normalization layers and activation functions in deep networks. Traditionally, these components are treated separately, guided by well-accepted heuristics. This research explores the potential of automating their design by unifying them into a single structure, identified as a normalization-activation layer. This approach led to the development of EvoNorms, a set of new layers distinguished by unconventional architectures and enhanced capabilities.

Methodology and Design

The paper departs from conventional design by adopting an automated strategy that evolves the architecture of normalization and activation components from basic mathematical primitives such as addition, multiplication, and statistical moments. This choice creates a large, sparse search space that is difficult to navigate with standard search methods. The authors introduce efficient rejection protocols to quickly discard ineffective candidate configurations, and use multi-objective evolution to optimize each layer's performance across several architectures, which guards against overfitting to any single model.
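
To give a concrete feel for how such filtering might operate, here is a minimal, hedged sketch of a quality-rejection loop in Python. The evaluation callback, threshold, and proxy-training budget are illustrative assumptions made for this example, not the authors' actual implementation.

```python
from typing import Callable, Iterable, List


def filter_candidates(
    candidates: Iterable[object],
    evaluate_proxy: Callable[[object], float],
    threshold: float = 0.2,
) -> List[object]:
    """Sketch of a quality-rejection protocol (assumed interface).

    Each candidate normalization-activation graph is scored with a cheap
    proxy evaluation (e.g. a short training run on a small model) and is
    discarded if it fails to clear a minimum accuracy bar, so that only
    promising candidates receive a full evaluation.
    """
    survivors = []
    for candidate in candidates:
        score = evaluate_proxy(candidate)  # cheap proxy score, e.g. accuracy in [0, 1]
        if score >= threshold:
            survivors.append(candidate)
    return survivors
```

Because most randomly assembled graphs fail to learn at all, even a very short proxy run is typically enough to reject them cheaply, which is what makes a search over low-level primitives tractable.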

The unified design is modeled as a tensor-to-tensor computation graph, a marked departure from mainstream NAS practices that assemble high-level, pre-defined modules. Because the search operates on this unified graph rather than on a fixed normalization-then-activation pipeline, some of the discovered EvoNorms challenge traditional assumptions, such as applying normalization and activation sequentially or centering the feature maps.
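
To make the tensor-to-tensor formulation concrete, the sketch below implements a layer in the spirit of EvoNorm-B0 in PyTorch: batch and instance variances are mixed in a single expression, and there is no separate activation function. The parameter names (v1, gamma, beta), the running-variance update, and other implementation details are illustrative assumptions rather than the authors' reference code.

```python
import torch
import torch.nn as nn


class EvoNormB0(nn.Module):
    """Sketch of an EvoNorm-B0-style layer: one tensor-to-tensor graph that
    combines batch and instance statistics with a learnable term, with no
    explicit activation function."""

    def __init__(self, num_channels, eps=1e-5, momentum=0.9):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.v1 = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.register_buffer("running_var", torch.ones(1, num_channels, 1, 1))

    def forward(self, x):  # x: (N, C, H, W)
        if self.training:
            # Batch variance, aggregated over N, H, W per channel.
            batch_var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
            self.running_var.mul_(self.momentum).add_(
                batch_var.detach(), alpha=1 - self.momentum)
        else:
            batch_var = self.running_var
        # Instance variance, computed over H, W per sample and channel.
        inst_var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        # Denominator mixes both statistics with a learnable linear term.
        denom = torch.maximum((batch_var + self.eps).sqrt(),
                              self.v1 * x + (inst_var + self.eps).sqrt())
        return x / denom * self.gamma + self.beta
```

In this sketch, `EvoNormB0(64)` would stand in for a `BatchNorm2d(64)` followed by a ReLU in a convolutional block, illustrating how a single learned graph can replace the usual normalization-plus-activation pair.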

Experimental Evaluation

The performance of EvoNorms was validated across multiple domains, including image classification (ResNets, MobileNets, EfficientNets), instance segmentation (Mask R-CNN with FPN/SpineNet), and image synthesis (BigGAN). EvoNorm layers matched or outperformed standard BatchNorm- and GroupNorm-based layers in many cases, demonstrating that they generalize across diverse architectures and tasks.

Empirical results in image classification show that EvoNorms maintain or exceed the performance of traditional layers across typical architectures. For instance, EvoNorm-B0 performs favorably, outperforming baseline BatchNorm-ReLU configurations under a variety of training conditions.

Key Findings and Implications

The discovery of EvoNorms not only introduces novel layers with distinct structural properties but also challenges existing heuristics. For example, EvoNorm-B0 incorporates both instance and batch variances, supporting complex normalization processes without explicit activation functions. The EvoNorm-S series provides robust performance without relying on batch statistics, a valuable trait for small-batch applications.
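
For the batch-independent case, the following is a minimal PyTorch-style sketch in the spirit of EvoNorm-S0, which combines a sigmoid-gated term with per-example group variance; the group count, epsilon, and parameter names here are assumptions made for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class EvoNormS0(nn.Module):
    """Sketch of an EvoNorm-S0-style layer: no batch statistics, only
    per-example group variance plus a sigmoid-gated term in place of an
    explicit activation function."""

    def __init__(self, num_channels, groups=32, eps=1e-5):
        super().__init__()
        assert num_channels % groups == 0
        self.groups = groups
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.v1 = nn.Parameter(torch.ones(1, num_channels, 1, 1))

    def forward(self, x):  # x: (N, C, H, W)
        n, c, h, w = x.shape
        # Group variance over each channel group and all spatial positions.
        grouped = x.view(n, self.groups, c // self.groups, h, w)
        group_var = grouped.var(dim=(2, 3, 4), keepdim=True, unbiased=False)
        group_std = (group_var + self.eps).sqrt().expand_as(grouped).reshape(n, c, h, w)
        # Sigmoid-gated numerator replaces an explicit activation function.
        return x * torch.sigmoid(self.v1 * x) / group_std * self.gamma + self.beta
```

Because every statistic here is computed per example, a layer of this form behaves identically at any batch size, which is what makes the S series attractive for small-batch training.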

These observations suggest directions for future design: non-centered normalization schemes, mixing batch and instance variances, and tensor-to-tensor rather than scalar-to-scalar activation functions could be pathways to more effective deep learning models.

Future Prospects

The research represents a significant step in automated machine learning via the unified search of normalization-activation components. Future work could build on these findings with more refined NAS protocols, potentially automating a larger share of the model design process. Additionally, the scale-invariant properties exhibited by some EvoNorms might inspire new optimization strategies and improve convergence in deep learning systems.

In conclusion, this work provides a compelling case for re-imagining foundational components of neural networks through the lens of automated design, bolstering both their theoretical understanding and practical efficacy.
