- The paper introduces IBN-Net, a novel architecture combining instance and batch normalization to enhance both learning and generalization in CNNs.
- It strategically applies instance normalization in shallow layers and batch normalization in deeper layers to improve feature discrimination and robustness against appearance variations.
- Empirical results show IBN-Net achieves lower error rates and superior cross-domain performance, outperforming traditional CNNs and domain adaptation methods.
Enhancing CNNs with IBN-Net: A Comprehensive Review
The paper, "Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net," introduces IBN-Net, a deep learning architecture that combines Instance Normalization (IN) and Batch Normalization (BN) to enhance both the learning and generalization capacities of Convolutional Neural Networks (CNNs). The work, from the CUHK-SenseTime Joint Lab, addresses the challenge of domain shift in computer vision tasks.
Core Contributions
The paper is structured around three principal contributions:
- Normalization Insight: It analyses the distinct roles of IN and BN: IN provides robustness to appearance changes by removing variance tied to color, style, and virtuality (synthetic vs. real imagery), while BN is crucial for preserving discriminative content information (see the sketch after this list).
- Architecture Agnosticism: The IBN scheme drops into advanced architectures such as DenseNet, ResNet, ResNeXt, and SENet, improving their performance without additional computational cost.
- Empirical Performance: IBN-Net significantly improves generalization across domains. For example, it outperforms domain adaptation techniques by generalizing from synthetic to real domains without using target domain data.
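To make the IN/BN distinction concrete, here is a minimal PyTorch sketch (not from the paper) showing which axes each normalization averages over: BN shares one set of statistics per channel across the whole batch, while IN computes statistics per sample and per channel, so each image's own appearance statistics (its "style") are normalized away individually.

```python
import torch

x = torch.randn(8, 64, 32, 32)  # (batch, channels, height, width)

# Batch Normalization: statistics shared across the batch, one pair per channel.
bn_mean = x.mean(dim=(0, 2, 3), keepdim=True)               # shape (1, 64, 1, 1)
bn_var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
x_bn = (x - bn_mean) / torch.sqrt(bn_var + 1e-5)

# Instance Normalization: statistics per sample *and* per channel, removing
# each image's individual appearance statistics.
in_mean = x.mean(dim=(2, 3), keepdim=True)                  # shape (8, 64, 1, 1)
in_var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
x_in = (x - in_mean) / torch.sqrt(in_var + 1e-5)
```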
Methodology and Results
The core design places IN in the shallow layers, where appearance variations (color, contrast, style) are mainly encoded, while keeping BN in the deeper layers, which carry the content-related features needed for discrimination. This placement improves robustness to appearance shifts without sacrificing discriminative power, and it also acts as a mild regularizer against overfitting. A minimal sketch of such an IBN block follows.
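This PyTorch sketch follows the spirit of the paper's IBN-a variant, which applies IN to half of a layer's channels and BN to the other half; it is a standalone simplification under those assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class IBN(nn.Module):
    """Split channels: IN on the first half, BN on the second (IBN-a style)."""
    def __init__(self, planes: int):
        super().__init__()
        self.half = planes // 2
        self.IN = nn.InstanceNorm2d(self.half, affine=True)
        self.BN = nn.BatchNorm2d(planes - self.half)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        first, rest = torch.split(x, [self.half, x.size(1) - self.half], dim=1)
        return torch.cat([self.IN(first), self.BN(rest)], dim=1)

# Usage idea: in shallow residual blocks, replace the first BN with IBN,
# e.g. conv -> IBN(64) -> ReLU, while deeper blocks keep plain BN.
```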
- Numeric Performance: IBN-Net50 achieves top-1/top-5 error rates of 22.54%/6.32% on the original ImageNet validation set, clearly outperforming the ResNet50 baseline. Its robustness shows under appearance transformations and style perturbations, where it degrades far less than the baseline models.
- Domain Generalization: IBN-Net yields notable gains in cross-domain segmentation between the synthetic GTA5 dataset and the real Cityscapes dataset (in both directions, including the synthetic-to-real setting highlighted above), improving mIoU significantly over its ResNet counterparts; the evaluation protocol is sketched after this list.
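The cross-domain protocol itself is simple: train on one domain, then evaluate mIoU on the other with no fine-tuning and no target-domain data. A minimal sketch of such an evaluation loop, assuming a hypothetical segmentation `model` and a `target_loader` yielding images with per-pixel class labels:

```python
import torch

@torch.no_grad()
def cross_domain_miou(model, target_loader, num_classes: int, device="cuda"):
    """Evaluate a segmentation model on an unseen target domain (no fine-tuning)."""
    model.eval()
    conf = torch.zeros(num_classes, num_classes, dtype=torch.long, device=device)
    for images, labels in target_loader:          # labels: (B, H, W) class indices
        preds = model(images.to(device)).argmax(dim=1)
        labels = labels.to(device)
        mask = labels < num_classes               # skip void/unlabeled pixels
        idx = labels[mask] * num_classes + preds[mask]
        conf += torch.bincount(idx, minlength=num_classes**2).view(num_classes,
                                                                   num_classes)
    inter = conf.diag().float()
    union = conf.sum(0).float() + conf.sum(1).float() - inter
    # Simplification: classes absent from the target set contribute 0 IoU here;
    # a fuller version would mask them out of the mean.
    return (inter / union.clamp(min=1)).mean().item()
```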
Implications and Future Directions
This research matters for deploying CNNs in real-world applications where appearance variability, such as lighting and style, poses a challenge. IBN-Net offers a practical way to obtain cross-domain robustness without access to target-domain data, in settings that would otherwise require explicit domain adaptation.
Future directions could explore further customization of normalization strategies for application-specific architectures. Adaptive mechanisms that apply IN and BN dynamically, based on input data characteristics, could refine generalization further; a rough sketch of one such mechanism follows.
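As an illustration of that direction (not something proposed in the paper; a similar idea appears in batch-instance normalization work), one could blend IN and BN with a learnable per-channel gate. Making `rho` a function of the input would make the mechanism fully dynamic.

```python
import torch
import torch.nn as nn

class AdaptiveIBN(nn.Module):
    """Blend IN and BN outputs with a learnable per-channel gate rho in [0, 1]."""
    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=False)
        self.inorm = nn.InstanceNorm2d(channels, affine=False)
        self.rho = nn.Parameter(torch.full((1, channels, 1, 1), 0.5))
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rho = self.rho.clamp(0, 1)                # keep the mixing weight in [0, 1]
        y = rho * self.inorm(x) + (1 - rho) * self.bn(x)
        return self.gamma * y + self.beta
```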
IBN-Net represents a significant advancement in bridging the performance gap induced by domain shifts. Its robust design integrating IN and BN elevates both learning and generalization capacities, proving its utility across a range of complex vision tasks. This methodology not only enhances existing architectures but also sets a foundation for future research in domain-agnostic learning improvements.