Efficiently Enhancing Long-Range Dependency in CNNs

Develop efficient methods to enhance long-range dependency modeling in convolutional neural networks, addressing their inherent locality while maintaining practical computational complexity.

Background

The paper highlights that convolutional neural networks, owing to the local receptive field of convolution, excel at extracting local features but struggle to model long-range dependencies. Transformers, in contrast, capture global context but incur complexity quadratic in sequence length, which becomes prohibitive for high-resolution biomedical images.
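To make the scaling gap concrete, the back-of-the-envelope comparison below uses illustrative numbers (a 512-voxel image patchified into 16-voxel patches; these figures are not from the paper) to contrast the quadratic pairwise cost of full self-attention with the linear cost of a sequence scan.

```python
# Illustrative back-of-the-envelope comparison (assumed numbers, not from the paper):
# full self-attention grows quadratically with the number of tokens,
# while a state-space / recurrent scan grows linearly.

def attention_pairs(num_tokens: int) -> int:
    """Pairwise interactions computed by full self-attention."""
    return num_tokens * num_tokens

def linear_steps(num_tokens: int) -> int:
    """Steps for a linear-time sequence model over the same tokens."""
    return num_tokens

# A 512x512 slice patchified into 16x16 patches -> 1,024 tokens.
tokens_2d = (512 // 16) ** 2
# A 512x512x512 volume patchified into 16^3 patches -> 32,768 tokens.
tokens_3d = (512 // 16) ** 3

print(attention_pairs(tokens_2d), linear_steps(tokens_2d))  # 1,048,576 vs 1,024
print(attention_pairs(tokens_3d), linear_steps(tokens_3d))  # ~1.07e9   vs 32,768
```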

Motivated by these limitations, the authors propose U-Mamba, which integrates state space sequence models (via Mamba) into a U-Net-like encoder-decoder architecture to capture long-range dependencies with complexity that scales linearly in sequence length. The open question they pose underscores the broader need for fundamentally efficient strategies that endow CNNs with strong long-range reasoning without the computational burden typical of self-attention.
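The following is a minimal sketch of the general pattern rather than the authors' exact U-Mamba block: convolution layers extract local features, the spatial map is flattened into a sequence, a Mamba layer performs linear-time global mixing, and the result is reshaped back. It assumes the open-source mamba_ssm package; the class name ConvMambaBlock and the hyperparameters are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the open-source mamba_ssm package is installed


class ConvMambaBlock(nn.Module):
    """Hypothetical hybrid block: local conv features + linear-time global mixing.

    Follows the general U-Mamba idea (conv layers, then a Mamba layer over the
    flattened spatial sequence), but this is a simplified sketch, not the
    authors' implementation.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.LeakyReLU(inplace=True),
        )
        self.norm = nn.LayerNorm(channels)
        # Hyperparameters below are illustrative defaults, not values from the paper.
        self.mamba = Mamba(d_model=channels, d_state=16, d_conv=4, expand=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the encoder.
        x = self.conv(x)                                 # local feature extraction
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)               # (B, H*W, C): image -> sequence
        seq = self.mamba(self.norm(seq)) + seq           # linear-time global mixing + residual
        return seq.transpose(1, 2).reshape(b, c, h, w)   # sequence -> image


if __name__ == "__main__":
    # mamba_ssm's fused selective-scan kernels generally require a CUDA device.
    block = ConvMambaBlock(channels=64).cuda()
    out = block(torch.randn(2, 64, 32, 32, device="cuda"))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

The design choice to mix at the sequence level (rather than enlarging convolution kernels) is what keeps the global-context cost linear in the number of spatial positions.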

References

"Thus, how to efficiently enhance the long-range dependency in CNNs remains an open question."

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation (Ma et al., arXiv:2401.04722, 9 Jan 2024), Section 1 (Introduction)