Overview of "Convolutional Networks with Dense Connectivity"
The paper "Convolutional Networks with Dense Connectivity" introduces the Dense Convolutional Network (DenseNet), a novel architecture that significantly advances the design of deep convolutional neural networks (CNNs). This architecture is characterized by its dense connectivity pattern, where each layer is connected to every other layer in a feed-forward fashion. This approach results in a number of connections in a network that grows quadratically with depth, specifically for layers, as opposed to the linear growth found in traditional architectures, such as ResNets, which only connect each layer to its successor.
DenseNet's core innovation lies in giving each layer direct access to the feature maps of all preceding layers, which maximizes information and gradient flow through the network. This improves learning dynamics, alleviating the vanishing-gradient problem and encouraging feature reuse: because every layer can draw on features computed anywhere earlier in the network, each layer only needs to contribute a small number of new feature maps, leading to more compact models that require fewer parameters and less computation to achieve high performance.
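To make the connectivity pattern concrete, the following is a minimal PyTorch-style sketch of a dense block. It is an illustration under assumed settings (the class names, growth-rate value, and layer sizes are chosen here for the example), not the authors' reference implementation, and it omits the bottleneck and transition layers used in the full architecture.

```python
import torch
import torch.nn as nn


class DenseLayer(nn.Module):
    """One layer of a dense block: H_l = BN -> ReLU -> 3x3 conv."""

    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, features):
        # Concatenate the feature maps of ALL preceding layers along the channel axis.
        x = torch.cat(features, dim=1)
        return self.conv(torch.relu(self.norm(x)))


class DenseBlock(nn.Module):
    """A stack of densely connected layers; each adds `growth_rate` channels."""

    def __init__(self, num_layers: int, in_channels: int, growth_rate: int):
        super().__init__()
        self.layers = nn.ModuleList([
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(features))  # every layer sees every earlier output
        return torch.cat(features, dim=1)


# Usage: a 4-layer block with growth rate 12 maps 16 channels to 16 + 4 * 12 = 64.
block = DenseBlock(num_layers=4, in_channels=16, growth_rate=12)
out = block(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Note that features are combined by concatenation rather than the summation used in ResNets, which is what preserves each layer's output for direct reuse by every later layer.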
The DenseNet architecture was rigorously evaluated on several competitive object recognition benchmarks: CIFAR-10, CIFAR-100, SVHN, and ImageNet. Across these tasks, DenseNets consistently demonstrated strong parameter efficiency and yielded notable improvements over the previous state of the art. For instance, DenseNet configurations achieved significant error-rate reductions with a fraction of the parameters required by comparable ResNet architectures.
Key Findings and Numerical Results
- DenseNets alleviate the vanishing-gradient problem and enhance gradient flow by connecting each layer directly to all preceding layers. This implicit form of deep supervision aids in training deeper networks effectively.
- On the CIFAR-10 and CIFAR-100 datasets, DenseNet achieved significant improvements over the state of the art, reducing the error rates to 3.46% and 17.18%, respectively, while using fewer parameters than comparable ResNet designs (a rough per-block parameter count is sketched after this list).
- DenseNets exhibited improved computational efficiency, requiring fewer FLOPs than competing architectures while matching or exceeding their accuracy, as seen in the ImageNet evaluations.
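As a rough illustration of why dense connectivity stays cheap, the short calculation below counts the 3x3 convolution weights in a single dense block. The layer count, input width, and growth rate are assumed values chosen for the example, not figures taken from the paper.

```python
def dense_block_params(num_layers: int, in_channels: int, growth_rate: int) -> int:
    """Count 3x3 conv weights in one dense block (ignoring BN parameters and biases)."""
    total = 0
    channels = in_channels
    for _ in range(num_layers):
        total += 3 * 3 * channels * growth_rate  # this layer's 3x3 conv weights
        channels += growth_rate                  # concatenation widens the next input
    return total


# Example (assumed sizes): 12 layers, 24 input channels, growth rate 12.
print(dense_block_params(12, 24, 12))  # 116640 weights for the whole block
```

Because each layer contributes only `growth_rate` new channels, the per-layer convolutions stay narrow even in very deep blocks, which is the source of the parameter savings noted above.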
Implications and Future Directions
DenseNet represents a major step forward in deep network design by optimizing parameter usage, boosting feature reuse, and improving gradient flow. These properties make DenseNets particularly well suited to tasks with limited training data, where the implicit deep supervision and the regularizing effect of dense connections help reduce overfitting.
Practically, the DenseNet architecture can be particularly beneficial in resource-constrained environments where computational power and memory are limited but high-performing models are still required. The substantial reduction in required parameters and computation also suggests that DenseNets could serve as a foundation for more efficient architectures deployed in real-world applications, ranging from mobile devices to large-scale data centers.
Theoretically, DenseNets prompt further exploration of the balance between depth, width, and connectivity in neural networks. Future research might investigate trade-offs among connectivity strategies, combinations of dense connectivity with other architectural innovations, and memory-efficient implementations across deep learning frameworks.
DenseNet not only presents a robust approach to improving the effectiveness of CNNs but also paves the way for new architectural innovations in deep learning, offering a compelling template for using network resources efficiently while extending network depth.