cuConv: A CUDA Implementation of Convolution for CNN Inference (2103.16234v1)

Published 30 Mar 2021 in cs.DC and cs.LG

Abstract: Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in production for this purpose. State-of-the-art implementations, however, present a lack of efficiency for some commonly used network configurations. In this paper we propose a GPU-based implementation of the convolution operation for CNN inference that favors coalesced accesses, without requiring prior data transformations. Our experiments demonstrate that our proposal yields notable performance improvements in a range of common CNN forward propagation convolution configurations, with speedups of up to 2.29x with respect to the best implementation of convolution in cuDNN, hence covering a relevant region in currently existing approaches.
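The abstract describes direct convolution on the GPU that favors coalesced memory accesses and avoids prior data transformations such as im2col. As a rough illustration of that idea (not the paper's actual cuConv kernel), the sketch below maps consecutive threads of a warp to consecutive output columns, so each thread's reads of an input row are contiguous across the warp and coalesce; the layout (NCHW input, KCRS filters), stride-1/no-padding assumption, and all names are illustrative.

```cuda
// Minimal sketch of a direct (no im2col) 2D convolution kernel for inference.
// Layouts assumed: input NCHW, filters KCRS, output NKPQ. Stride 1, no padding,
// so P = H - R + 1 and Q = W - S + 1. Illustrative only, not the cuConv code.
#include <cuda_runtime.h>

__global__ void direct_conv2d(const float* __restrict__ in,   // N x C x H x W
                              const float* __restrict__ flt,  // K x C x R x S
                              float* __restrict__ out,        // N x K x P x Q
                              int N, int C, int H, int W,
                              int K, int R, int S,
                              int P, int Q)
{
    // Consecutive threads in a warp handle consecutive output columns (q),
    // so their loads from each input row are contiguous and coalesce.
    int q  = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    int p  = blockIdx.y * blockDim.y + threadIdx.y;  // output row
    int nk = blockIdx.z;                             // fused batch/filter index
    int n  = nk / K;
    int k  = nk % K;
    if (q >= Q || p >= P || n >= N) return;

    float acc = 0.0f;
    for (int c = 0; c < C; ++c) {
        for (int r = 0; r < R; ++r) {
            const float* in_row  = in  + ((n * C + c) * H + (p + r)) * W + q;
            const float* flt_row = flt + ((k * C + c) * R + r) * S;
            for (int s = 0; s < S; ++s)
                acc += in_row[s] * flt_row[s];
        }
    }
    out[((n * K + k) * P + p) * Q + q] = acc;
}

// Example launch: 2D thread blocks tile the output plane; one z-slice per (n, k) pair.
// dim3 block(32, 4);
// dim3 grid((Q + 31) / 32, (P + 3) / 4, N * K);
// direct_conv2d<<<grid, block>>>(d_in, d_flt, d_out, N, C, H, W, K, R, S, P, Q);
```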

Authors (3)
  1. Pedro Valero-Lara (8 papers)
  2. Antonio J. Peña (12 papers)
  3. Marc Jordà (1 paper)
Citations (10)
