Efficient Accelerator for Dilated and Transposed Convolution with Decomposition (2205.02103v1)

Published 2 May 2022 in cs.AR and cs.LG

Abstract: Hardware acceleration for dilated and transposed convolution enables real-time execution of related tasks such as segmentation, but existing designs either target these convolution types specifically or suffer from complex control in reconfigurable designs. This paper presents a design that decomposes the input (for dilated convolution) or the weights (for transposed convolution) to skip redundant computations, and thus also executes efficiently on existing dense CNN hardware. The proposed architecture cuts cycle counts by 87.8%, an 8.2X speedup over naive execution for the ENet case.
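The input-decomposition idea for the dilated case can be illustrated in a few lines of NumPy: instead of expanding the kernel with inserted zeros and multiplying through them, the input is split into d*d subsampled phases and each phase is processed by an ordinary dense convolution, then the partial outputs are interleaved. The sketch below is an illustrative reconstruction of that principle under assumed conventions (stride 1, valid padding, single channel); the function names and NumPy formulation are not from the paper, and the analogous weight decomposition for transposed convolution is not shown.

```python
import numpy as np

def dilated_conv2d_naive(x, w, d):
    """Naive execution: insert d-1 zeros between kernel taps and run a plain
    dense convolution. Most multiply-accumulates land on the inserted zeros."""
    K = w.shape[0]
    Ke = d * (K - 1) + 1                      # zero-expanded kernel size
    we = np.zeros((Ke, Ke))
    we[::d, ::d] = w
    H, W = x.shape
    Ho, Wo = H - Ke + 1, W - Ke + 1
    out = np.zeros((Ho, Wo))
    for i in range(Ke):
        for j in range(Ke):
            out += we[i, j] * x[i:i + Ho, j:j + Wo]
    return out

def dilated_conv2d_decomposed(x, w, d):
    """Input decomposition: split the input into d*d subsampled phases, run an
    ordinary dense KxK convolution on each phase, and interleave the partial
    outputs. No zero taps are ever multiplied, so each phase maps directly onto
    a dense CNN accelerator."""
    K = w.shape[0]
    H, W = x.shape
    Ho, Wo = H - d * (K - 1), W - d * (K - 1)
    out = np.zeros((Ho, Wo))
    for py in range(d):
        for px in range(d):
            sub = x[py::d, px::d]             # one input phase
            oh, ow = sub.shape[0] - (K - 1), sub.shape[1] - (K - 1)
            if oh <= 0 or ow <= 0:
                continue
            dense = np.zeros((oh, ow))
            for i in range(K):                # plain dense convolution
                for j in range(K):
                    dense += w[i, j] * sub[i:i + oh, j:j + ow]
            out[py::d, px::d] = dense         # scatter back to its output phase
    return out

# The two paths produce identical results on random data.
x = np.random.rand(16, 16)
w = np.random.rand(3, 3)
assert np.allclose(dilated_conv2d_naive(x, w, 2),
                   dilated_conv2d_decomposed(x, w, 2))
```

Because each phase is a standard dense convolution, the decomposed form avoids the zero-tap work that makes the naive zero-insertion approach wasteful, which is the source of the cycle-count reduction reported in the abstract.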

Authors (2)
  1. Kuo-Wei Chang (4 papers)
  2. Tian-Sheuan Chang (33 papers)
Citations (16)
