
A$^3$PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader (2402.18592v1)

Published 23 Feb 2024 in cs.AR and cs.PF

Abstract: The performance gap between memory and processor has grown rapidly. Consequently, the energy and wall-clock time costs of moving data between the CPU and main memory dominate the overall computational cost. The Processing-in-Memory (PIM) paradigm has emerged as a promising architecture that reduces the need for extensive data movement by placing computing units close to the memory. Despite the abundant efforts devoted to building robust and highly available PIM systems, identifying PIM-friendly segments of applications remains a significant challenge because no comprehensive tool exists to evaluate the intrinsic memory access pattern of a segment. To tackle this challenge, we propose A$^3$PIM: an Automated, Analytic and Accurate Processing-in-Memory offloader. We systematically consider the cross-segment data movement and the intrinsic memory access pattern of each code segment via a static code analyzer. We evaluate A$^3$PIM across a wide range of real-world workloads, including the GAP and PrIM benchmarks, and achieve an average speedup of 2.63x and 4.45x (up to 7.14x and 10.64x) compared to CPU-only and PIM-only executions, respectively.
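To make the trade-off in the abstract concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes a program is a linear chain of code segments, that each segment has an estimated cost on the CPU and on PIM, and that switching devices between two adjacent segments incurs a data-movement cost. The function name `place_segments`, the cost lists, and the chain-shaped program model are all hypothetical; the abstract does not specify how A$^3$PIM derives or combines these costs.

```python
from typing import List, Tuple


def place_segments(
    cpu_cost: List[float],
    pim_cost: List[float],
    switch_cost: List[float],
) -> Tuple[float, List[str]]:
    """Choose CPU or PIM for each segment of a linear chain.

    Hypothetical model: switch_cost[i] is the data-movement cost paid when
    segment i and segment i + 1 run on different devices.
    Returns (minimum total cost, per-segment placement).
    """
    n = len(cpu_cost)
    if n == 0:
        return 0.0, []
    costs = (cpu_cost, pim_cost)  # index 0 = CPU, index 1 = PIM

    # best[d] = cheapest cost of the prefix that ends with a segment on device d
    best = [costs[0][0], costs[1][0]]
    back = []  # back[i - 1][d] = device of segment i - 1 when segment i is on d

    for i in range(1, n):
        new_best = [0.0, 0.0]
        choice = [0, 0]
        for d in (0, 1):
            stay = best[d]                           # previous segment on same device
            move = best[1 - d] + switch_cost[i - 1]  # previous on the other device
            if stay <= move:
                new_best[d], choice[d] = stay + costs[d][i], d
            else:
                new_best[d], choice[d] = move + costs[d][i], 1 - d
        best = new_best
        back.append(choice)

    # Walk the back-pointers from the cheaper final state to recover the plan.
    d = 0 if best[0] <= best[1] else 1
    total = best[d]
    plan = [d]
    for choice in reversed(back):
        d = choice[d]
        plan.append(d)
    plan.reverse()
    return total, ["CPU" if dev == 0 else "PIM" for dev in plan]
```

With cheap cross-segment movement, a memory-bound middle segment is worth offloading; when movement is expensive, the sketch keeps the whole chain on one device. This is why a per-segment decision that ignores cross-segment data movement can lose to both CPU-only and PIM-only execution.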

