Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures (1808.05567v2)

Published 16 Aug 2018 in cs.DC

Abstract: Convolution layers are prevalent in many classes of deep neural networks, including Convolutional Neural Networks (CNNs) which provide state-of-the-art results for tasks like image recognition, neural machine translation and speech recognition. The computationally expensive nature of a convolution operation has led to the proliferation of implementations including matrix-matrix multiplication formulation, and direct convolution primarily targeting GPUs. In this paper, we introduce direct convolution kernels for x86 architectures, in particular for Xeon and XeonPhi systems, which are implemented via a dynamic compilation approach. Our JIT-based implementation shows close to theoretical peak performance, depending on the setting and the CPU architecture at hand. We additionally demonstrate how these JIT-optimized kernels can be integrated into a lightweight multi-node graph execution model. This illustrates that single- and multi-node runs yield high efficiencies and high image-throughputs when executing state-of-the-art image recognition tasks on CPUs.
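For readers unfamiliar with the term, "direct convolution" means computing each output element straight from the input and the filter weights with a nested loop, rather than first reshaping the problem into a matrix-matrix multiplication. The sketch below is only the textbook loop nest in plain Python, not the paper's JIT-generated x86 kernels. The function name `direct_conv2d`, the tensor layouts (`[C][H][W]` input, `[K][C][R][S]` weights), and the omission of padding and of a batch dimension are all assumptions made for illustration.

```python
def direct_conv2d(inp, weights, stride=1):
    """Naive direct convolution without padding (illustrative only).

    inp:     nested lists indexed [C][H][W]    (input channels, height, width)
    weights: nested lists indexed [K][C][R][S] (output channels, input channels,
             filter height, filter width)
    returns: nested lists indexed [K][P][Q]    (output channels, height, width)
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    P = (H - R) // stride + 1  # output height
    Q = (W - S) // stride + 1  # output width

    out = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    for k in range(K):                  # output feature maps
        for c in range(C):              # input feature maps
            for p in range(P):          # output rows
                for q in range(Q):      # output columns
                    acc = 0.0
                    for r in range(R):  # filter rows
                        for s in range(S):  # filter columns
                            acc += (inp[c][p * stride + r][q * stride + s]
                                    * weights[k][c][r][s])
                    out[k][p][q] += acc
    return out


# Usage: one 3x3 input channel, one 2x2 all-ones filter.
image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
ones_filter = [[[[1, 1], [1, 1]]]]
result = direct_conv2d(image, ones_filter)
```

Each output value here is the sum of one 2x2 window of the input. A high-performance implementation would reorder, block, and vectorize these loops, which this sketch does not attempt.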

Authors (7)
  1. Evangelos Georganas (18 papers)
  2. Sasikanth Avancha (20 papers)
  3. Kunal Banerjee (12 papers)
  4. Dhiraj Kalamkar (15 papers)
  5. Greg Henry (7 papers)
  6. Hans Pabst (10 papers)
  7. Alexander Heinecke (21 papers)
Citations (100)
