
A Modern Primer on Processing in Memory (2012.03112v3)

Published 5 Dec 2020 in cs.AR and cs.DC

Abstract: Modern computing systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in computing that cause performance, scalability and energy bottlenecks: (1) data access is a key bottleneck as many important applications are increasingly data-intensive, and memory bandwidth and energy do not scale well, (2) energy consumption is a key limiter in almost all computing platforms, especially server and mobile systems, (3) data movement, especially off-chip to on-chip, is very expensive in terms of bandwidth, energy and latency, much more so than computation. These trends are felt especially severely in the data-intensive server and energy-constrained mobile systems of today. At the same time, conventional memory technology is facing many technology scaling challenges in terms of reliability, energy, and performance. As a result, memory system architects are open to organizing memory in different ways and making it more intelligent, at the expense of higher cost. The emergence of 3D-stacked memory plus logic, the adoption of error correcting codes inside the latest DRAM chips, the proliferation of different main memory standards and chips specialized for different purposes (e.g., graphics, low-power, high bandwidth, low latency), and the necessity of designing new solutions to serious reliability and security issues, such as the RowHammer phenomenon, are evidence of this trend. This chapter discusses recent research that aims to practically enable computation close to data, an approach we call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside the memory chips, in the logic layer of 3D-stacked memory, or in the memory controllers), so that data movement between the computation units and memory is reduced or eliminated.

Authors (4)
  1. Onur Mutlu (279 papers)
  2. Saugata Ghose (59 papers)
  3. Juan Gómez-Luna (57 papers)
  4. Rachata Ausavarungnirun (27 papers)
Citations (163)

Summary

  • The paper presents Processing-in-Memory (PIM) as an approach to reducing data movement and improving energy efficiency in computing systems.
  • It details two main PIM implementations—Processing Using Memory (PUM) and Processing Near Memory (PNM)—with specific examples like RowClone, Ambit, and Tesseract.
  • The study addresses key challenges including programming models, runtime systems, and security, setting the stage for future adoption of PIM technologies.

A Modern Primer on Processing in Memory

The paper "A Modern Primer on Processing-in-Memory" authored by Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, and Rachata Ausavarungnirun presents a comprehensive overview of Processing-in-Memory (PIM) architectures. With the current trends in computing, where data access bottlenecks are prevalent, energy efficiency is paramount, and data movement costs are significant, the paper emphasizes PIM as a crucial evolution in memory system design.

Key Trends Impacting Modern Computing

The authors identify three principal trends driving the need for PIM:

  1. Data Access Bottlenecks: As applications become increasingly data-intensive, traditional memory bandwidth and energy constraints hinder performance scalability.
  2. Energy Consumption: High energy usage is a critical constraint, particularly in server and mobile systems.
  3. Data Movement Costs: Data transfer, especially off-chip to on-chip, incurs substantial bandwidth, energy, and latency overheads compared to computation.
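To make this cost asymmetry concrete, the following back-of-the-envelope comparison uses approximate, widely cited circuit-level figures (in the spirit of Horowitz's ISSCC 2014 estimates); these numbers are assumptions for illustration, not data from the paper:

```c
#include <stdio.h>

/* Approximate per-operation energies at ~45 nm, from commonly cited
 * circuit-level estimates (Horowitz, ISSCC 2014). These figures are
 * illustrative assumptions, not numbers taken from this paper. */
#define ADD_32BIT_PJ        0.1   /* one 32-bit integer add             */
#define DRAM_ACCESS_PJ    640.0   /* one 32-bit read from off-chip DRAM */

int main(void) {
    /* Fetching a single word from off-chip DRAM costs on the order of
     * several thousand simple ALU operations' worth of energy. */
    printf("DRAM access / add energy ratio: %.0fx\n",
           DRAM_ACCESS_PJ / ADD_32BIT_PJ);
    return 0;
}
```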

These trends are exacerbated by the scaling challenges faced by conventional memory technologies like DRAM, where reliability, performance, and energy efficiency are declining with smaller process nodes. The adoption of intelligent memory system designs, such as 3D-stacked memory and new standards (e.g., low-power, high-bandwidth memory), is a response to these challenges.

Processing-in-Memory (PIM) Approaches

The authors introduce PIM as an architectural solution to mitigate the issues of data movement by bringing computation closer to where data is stored. PIM can be realized in two primary forms:

  1. Processing Using Memory (PUM): This approach leverages the intrinsic capabilities of memory cells to perform computational operations with minimal changes to existing memory technologies. Examples include RowClone and Ambit, where data copy, initialization, and bitwise operations are performed in-memory.
  2. Processing Near Memory (PNM): Utilizes the logic layer in 3D-stacked memory technologies to integrate more complex computation units (e.g., CPUs, accelerators) in close proximity to the memory layers, facilitating high bandwidth and low latency access.

Processing Using Memory: RowClone and Ambit

  • RowClone: Enables efficient in-memory bulk data copy and initialization by exploiting DRAM row-buffer mechanics, significantly reducing latency and energy for large-scale data movement operations.
  • Ambit: Implements bulk bitwise operations within DRAM by utilizing triple-row activation and exploiting DRAM's analog operational behaviors. This allows for efficient execution of bitwise operations critical for applications like databases and encryption.
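
To make these mechanisms concrete, the sketch below contrasts a conventional bulk bitwise AND, which streams both operands through the CPU, with a hypothetical Ambit-style offload. The pim_bulk_and driver call and its fallback logic are illustrative assumptions for this summary, not an API from the paper; the comments describe what a real implementation would ask the memory controller to do.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Conventional path: every word of A and B crosses the memory bus,
 * is ANDed on the CPU, and the result crosses the bus a third time. */
void bulk_and_cpu(uint64_t *dst, const uint64_t *a,
                  const uint64_t *b, size_t nwords) {
    for (size_t i = 0; i < nwords; i++)
        dst[i] = a[i] & b[i];
}

/* Hypothetical Ambit-style offload: a real driver would map the two
 * operand rows and the destination row into one DRAM subarray and
 * trigger triple-row activation, so no operand data crosses the
 * off-chip bus. This stub reports "unavailable" so the sketch runs
 * on a machine without PIM hardware. */
int pim_bulk_and(uint64_t *dst, const uint64_t *a,
                 const uint64_t *b, size_t nwords) {
    (void)dst; (void)a; (void)b; (void)nwords;
    return -1; /* no in-DRAM compute present */
}

/* Offload when possible; fall back to the CPU loop otherwise. */
void bulk_and(uint64_t *dst, const uint64_t *a,
              const uint64_t *b, size_t nwords) {
    if (pim_bulk_and(dst, a, b, nwords) != 0)
        bulk_and_cpu(dst, a, b, nwords);
}

int main(void) {
    uint64_t a[2] = {0xF0, 0xFF}, b[2] = {0xAA, 0x0F}, d[2];
    bulk_and(d, a, b, 2);
    printf("%llx %llx\n", (unsigned long long)d[0],
           (unsigned long long)d[1]); /* prints: a0 f */
    return 0;
}
```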

Processing Near Memory: Architectures and Applications

  • Tesseract: A graph processing framework that places simple cores in the logic layer of 3D-stacked memory to leverage high internal memory bandwidth, thereby improving performance and energy efficiency for graph analytics.
  • PEI (PIM-Enabled Instructions): These instructions can be executed either by the CPU or in-memory processing units, maintaining cache coherence and programmability while offloading suitable computations to memory.
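
The sketch below illustrates the locality-aware dispatch idea behind PEI: the same logical operation runs on the CPU when its data is likely cached and in memory when it is not. The pei_likely_cached and pnm_atomic_add functions are hypothetical stand-ins for the hardware's locality monitor and near-memory execution unit, stubbed out so the example runs; they are not APIs from the paper.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for PEI hardware: a locality monitor that
 * estimates whether an address is resident on-chip, and a near-memory
 * unit in the 3D-stacked logic layer that executes a simple atomic add.
 * Both are stubs so the sketch runs on a conventional machine. */
static bool pei_likely_cached(const void *addr) {
    (void)addr;
    return true; /* real hardware would consult a locality monitor */
}

static void pnm_atomic_add(uint64_t *addr, uint64_t val) {
    *addr += val; /* real hardware would execute this inside memory */
}

/* A PIM-enabled increment, e.g. bumping a neighbor's rank in a graph
 * workload: execute on the CPU when the data is likely hot in cache,
 * offload to the near-memory unit when it is likely cold. Coherence is
 * preserved because each operation executes in exactly one place. */
void pei_add(uint64_t *addr, uint64_t val) {
    if (pei_likely_cached(addr))
        *addr += val;              /* data already on-chip: add locally */
    else
        pnm_atomic_add(addr, val); /* avoid pulling a cold line on-chip */
}

int main(void) {
    uint64_t rank = 41;
    pei_add(&rank, 1);
    printf("rank = %llu\n", (unsigned long long)rank); /* rank = 42 */
    return 0;
}
```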

Adoption Challenges and Future Work

The paper also discusses the systemic barriers to PIM adoption:

  • Programming Models: New paradigms and tools are needed to facilitate programming PIM systems.
  • Runtime Systems: Efficient scheduling, data mapping, and memory coherence mechanisms are vital.
  • Security: Ensuring secure computation within PIM environments is equally critical.

The authors suggest that continued research in these areas, along with robust benchmarks and simulation infrastructures, will drive the mainstream adoption of PIM. They also highlight the recent interest and developments in the industry, underscoring the practicality and imminent realization of PIM technologies.

Implications and Future Directions

PIM has significant theoretical and practical implications. By drastically reducing data movement, it promises substantial improvements in energy efficiency and performance, potentially transforming applications ranging from artificial intelligence to data analytics. Future research should focus on improving PIM integration within existing ecosystems, creating standardized benchmarks, exploring novel security mechanisms, and developing advanced memory technologies capable of supporting complex computation in memory. As these advancements materialize, PIM could very well redefine the landscape of modern computing architectures.
