Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 167 tok/s
Gemini 2.5 Pro 49 tok/s Pro
GPT-5 Medium 24 tok/s Pro
GPT-5 High 29 tok/s Pro
GPT-4o 86 tok/s Pro
Kimi K2 205 tok/s Pro
GPT OSS 120B 448 tok/s Pro
Claude Sonnet 4.5 37 tok/s Pro
2000 character limit reached

TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Dataflow and Analytical Modelling (2408.01254v2)

Published 2 Aug 2024 in cs.AI and cs.AR

Abstract: In order to follow the ever-growing computational complexity and data intensity of state-of-the-art AI models, new computing paradigms are being proposed. These paradigms aim at achieving high energy efficiency, by mitigating the Von Neumann bottleneck that relates to the energy cost of moving data between the processing cores and the memory. Convolutional Neural Networks (CNNs) are susceptible to this bottleneck, given the massive data they have to manage. Systolic Arrays (SAs) are promising architectures to mitigate the data transmission cost, thanks to high data utilization of Processing Elements (PEs). These PEs continuously exchange and process data locally based on specific dataflows (like weight stationary and row stationary), in turn reducing the number of memory accesses to the main memory. In SAs, convolutions are managed either as matrix multiplications or exploiting the raster-order scan of sliding windows. However, data redundancy is a primary concern affecting area, power and energy. In this paper, we propose TrIM: a novel dataflow for SAs based on a Triangular Input Movement and compatible with CNN computing. TrIM maximizes the local input utilization, minimizes the weight data movement and solves the data redundancy problem. Furthermore, TrIM does not incur the significant on-chip memory penalty introduced by the row stationary dataflow. When compared to state-of-the-art SA dataflows the high data utilization offered by TrIM guarantees ~10x less memory access. Furthermore, considering that PEs continuously overlap multiplications and accumulations, TrIM achieves high throughput (up to 81.8% higher than row stationary), other than requiring a limited number of registers (up to 15.6x fewer registers than row stationary).

Citations (1)

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

This paper has been mentioned in 2 tweets and received 4 likes.

Upgrade to Pro to view all of the tweets about this paper: