U$^2$-Net: Going Deeper with Nested U-Structure for Salient Object Detection (2005.09007v3)

Published 18 May 2020 in cs.CV

Abstract: In this paper, we design a simple yet powerful deep network architecture, U$^2$-Net, for salient object detection (SOD). The architecture of our U$^2$-Net is a two-level nested U-structure. The design has the following advantages: (1) it is able to capture more contextual information from different scales thanks to the mixture of receptive fields of different sizes in our proposed ReSidual U-blocks (RSU), (2) it increases the depth of the whole architecture without significantly increasing the computational cost because of the pooling operations used in these RSU blocks. This architecture enables us to train a deep network from scratch without using backbones from image classification tasks. We instantiate two models of the proposed architecture, U$^2$-Net (176.3 MB, 30 FPS on GTX 1080Ti GPU) and U$^2$-Net$^{\dagger}$ (4.7 MB, 40 FPS), to facilitate the usage in different environments. Both models achieve competitive performance on six SOD datasets. The code is available: https://github.com/NathanUA/U-2-Net.

Summary

The paper presents a novel two-level nested U-structure that captures both local and global features for improved salient object detection.
It leverages ReSidual U-blocks to combine residual learning with U-Net architectures, preserving high-resolution details for multi-scale feature extraction.
Experimental results on six benchmark datasets validate both full and compact variants, demonstrating competitive performance and real-time capabilities.

U$^2$-Net: A Comprehensive Summary

Overview

The paper presents U$^2$-Net, a deep network architecture designed for Salient Object Detection (SOD). It introduces a two-level nested U-structure intended to capture both local and global information at multiple scales. The authors instantiate two variants, a full-size model and a much smaller one, aimed at different computational environments. Both models achieve performance competitive with state-of-the-art (SOTA) methods on six SOD benchmark datasets.

Technical Contributions

Two-Level Nested U-Structure: The architectural foundation of U$^2$-Net is its nested U-structure: the outer network is U-shaped, and each of its stages is itself a U-shaped ReSidual U-block (RSU). The RSUs enhance multi-scale feature extraction without sacrificing high-resolution details. The nested U-structure is composed of:

Encoder: Responsible for progressively reducing the spatial dimensions while capturing high-level contextual features.
Decoder: Designed to progressively reconstruct the spatial dimensions while preserving essential feature details.
Fusion Module: Combines outputs from various layers to generate the final saliency maps.
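The three components above can be illustrated with a small sketch of the resolution bookkeeping and the fusion step. This is not the authors' implementation: the stage count of six, the input size of 320, and the function names are assumptions chosen for illustration, and fusion is modelled as a per-pixel weighted sum followed by a sigmoid, which is what a 1x1 convolution over stacked single-channel maps computes.

```python
import math

def encoder_sizes(input_size, num_stages=6):
    """Spatial size seen by each encoder stage, halving between stages.

    num_stages=6 is an assumption for illustration.
    """
    sizes = [input_size]
    for _ in range(num_stages - 1):
        sizes.append(-(-sizes[-1] // 2))  # ceiling division, like ceil-mode pooling
    return sizes

def decoder_sizes(enc_sizes):
    """Decoder stages mirror the encoder, from the second-deepest size back to the input size."""
    return enc_sizes[-2::-1]

def fuse_side_outputs(side_maps, weights, bias=0.0):
    """Fuse side-output logit maps that are already at the input resolution.

    Each pixel of the result is sigmoid(bias + sum_k weights[k] * side_maps[k][r][c]).
    """
    rows, cols = len(side_maps[0]), len(side_maps[0][0])
    fused = []
    for r in range(rows):
        row = []
        for c in range(cols):
            logit = bias + sum(w * m[r][c] for w, m in zip(weights, side_maps))
            row.append(1.0 / (1.0 + math.exp(-logit)))
        fused.append(row)
    return fused

enc = encoder_sizes(320)   # hypothetical 320x320 input
dec = decoder_sizes(enc)
```

With these assumptions the encoder sees sizes 320, 160, 80, 40, 20, 10 and the decoder climbs back through 20, 40, 80, 160, 320. In a trained network the fusion weights would be learned rather than supplied by hand.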

ReSidual U-Blocks (RSU): RSUs merge the principles of residual learning with U-Net architectures, enabling efficient extraction of multi-scale features. Each RSU block extracts local features (fine-grained details) and multi-scale contextual features, then combines the two through a residual connection so that high-resolution detail is preserved. The pooling operations inside each block let the network grow deeper without a significant increase in computational cost.
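The residual idea can be sketched on a 1-D signal, assuming the block output has the form $U(F_1(x)) + F_1(x)$, where $F_1$ is an input convolution producing local features and $U$ is a small U-shaped encoder-decoder. Everything below is a toy stand-in rather than the authors' code: `smooth` replaces a learned convolution with a fixed 3-tap average, and skip connections are merged by averaging instead of concatenation followed by a convolution.

```python
def smooth(x):
    """Stand-in for a learned 3-tap convolution: edge-padded moving average."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]

def pool(x):
    """Max pooling with window 2 and stride 2 (keeps a trailing odd sample)."""
    return [max(x[i:i + 2]) for i in range(0, len(x), 2)]

def upsample(x, length):
    """Nearest-neighbour upsampling by 2, cropped or clamped to `length`."""
    return [x[min(i // 2, len(x) - 1)] for i in range(length)]

def u_block(x, height):
    """U-shaped encoder-decoder of the given height, playing the role of U(.)."""
    feat = smooth(x)
    if height <= 1 or len(x) < 2:
        return feat
    deeper = u_block(pool(feat), height - 1)   # go down one scale
    up = upsample(deeper, len(feat))           # come back up
    # Merge skip connection and upsampled features (averaging is an assumption).
    return smooth([(a + b) / 2.0 for a, b in zip(feat, up)])

def rsu(x, height):
    """Toy residual U-block: multi-scale features plus local features."""
    local = smooth(x)                 # F1(x)
    multi = u_block(local, height)    # U(F1(x))
    return [m + l for m, l in zip(multi, local)]

signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
features = rsu(signal, height=3)
```

The output has the same length as the input regardless of `height`, which is the property that lets blocks of different heights be stacked as stages of an outer U-structure.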

Experimental Results

Model Performance: The paper evaluates U$^2$-Net on six benchmark datasets: DUT-OMRON, DUTS-TE, HKU-IS, ECSSD, PASCAL-S, and SOD. Metrics used include maximal F-measure ($\text{max}F_\beta$), Mean Absolute Error (MAE), weighted F-measure ($F^w_\beta$), structure measure ($S_m$), and relaxed boundary F-measure ($\text{relax}F^b_\beta$). Across these datasets, both models achieve performance competitive with existing SOTA methods.
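Two of these metrics are simple enough to sketch directly. The code below is a minimal illustration, not the paper's evaluation script: it operates on flattened saliency maps with values in [0, 1], and it assumes $\beta^2 = 0.3$, the setting commonly used in the SOD literature, with $F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}$ for precision $P$ and recall $R$. The weighted F-measure, structure measure, and relaxed boundary F-measure are more involved and are not covered here.

```python
def mae(pred, gt):
    """Mean Absolute Error between a predicted map and a binary ground truth."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)

def f_beta(pred, gt, threshold, beta_sq=0.3):
    """F-measure of the prediction binarized at `threshold` (beta_sq=0.3 is an assumption)."""
    binary = [1 if p >= threshold else 0 for p in pred]
    true_pos = sum(b * g for b, g in zip(binary, gt))
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(binary)
    recall = true_pos / sum(gt)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)

def max_f_beta(pred, gt, steps=255):
    """Maximal F-measure over evenly spaced thresholds in [0, 1]."""
    return max(f_beta(pred, gt, t / steps) for t in range(steps + 1))

# Hypothetical 4-pixel example: two salient pixels, two background pixels.
prediction = [0.9, 0.8, 0.2, 0.1]
ground_truth = [1, 1, 0, 0]
error = mae(prediction, ground_truth)
best_f = max_f_beta(prediction, ground_truth)
```

Lower MAE is better and higher $\text{max}F_\beta$ is better. In this toy example some threshold separates the two classes perfectly, so the maximal F-measure reaches 1 even though the MAE is nonzero.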

Variants and Efficiency: The authors introduce two variants of U$^2$-Net:

U$^2$-Net (Full): A larger model with a deeper architecture for detailed multi-scale feature extraction (176.3 MB, 30 FPS on a GTX 1080Ti GPU).
U$^2$-Net$^\dagger$ (Compact): A smaller model aimed at resource-constrained environments (4.7 MB, 40 FPS), still achieving competitive performance.
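To put the reported figures in more familiar units, the short sketch below converts frames per second into per-frame latency and compares the two model sizes. The numbers are the ones quoted above; the dictionary layout and function names are only illustrative.

```python
# Sizes and speeds as reported in the abstract.
MODELS = {
    "full": {"size_mb": 176.3, "fps": 30.0},
    "compact": {"size_mb": 4.7, "fps": 40.0},
}

def latency_ms(fps):
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps

def size_ratio(larger, smaller):
    """How many times larger one model file is than the other."""
    return larger["size_mb"] / smaller["size_mb"]

full_latency = latency_ms(MODELS["full"]["fps"])        # about 33.3 ms per frame
compact_latency = latency_ms(MODELS["compact"]["fps"])  # 25 ms per frame
ratio = size_ratio(MODELS["full"], MODELS["compact"])   # about 37.5x
```

The compact model is roughly 37 times smaller while being only about a third faster, so the main gain from choosing it is in storage and memory footprint rather than speed.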

Implications and Future Work

Practical Applications: The U$^2$-Net architecture’s capacity to capture and integrate multi-scale features without relying on pre-trained backbones makes it appealing for applications where training from scratch is preferable or necessary. Additionally, the compact variant's small size (4.7 MB) and 40 FPS throughput suggest potential for integration into mobile and embedded systems.

Theoretical Insights: The two-level U-structure and RSUs are theoretically significant, offering a new perspective on enhancing U-Net architectures. The work shows how hierarchical nested structures can be leveraged to balance depth and computational efficiency, addressing a common trade-off in deep learning.

Future Directions: Potential future developments include:

Further Model Optimization: Exploring techniques to further reduce model size and improve inference speed without significant performance loss.
Larger and Diverse Datasets: Utilizing more extensive and diverse datasets to enhance the model's robustness and generalization capabilities across varied real-world scenarios.
Broader Applications: Investigating the application of the U$^2$-Net architecture beyond SOD to other computer vision tasks such as instance segmentation, scene parsing, and medical image analysis.

Conclusion

The U$^2$-Net paper introduces a robust architecture for SOD, demonstrating that high-resolution feature extraction and multi-scale context integration can be achieved through a nested U-structure trained from scratch. With competitive benchmark performance at two very different model sizes, U$^2$-Net is a notable contribution to the field of SOD and provides a solid foundation for future architectural work.

GitHub

GitHub - xuebinqin/U-2-Net: The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection." (8,230 stars)
