Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 86 tok/s
Gemini 2.5 Pro 45 tok/s Pro
GPT-5 Medium 23 tok/s Pro
GPT-5 High 25 tok/s Pro
GPT-4o 111 tok/s Pro
Kimi K2 178 tok/s Pro
GPT OSS 120B 452 tok/s Pro
Claude Sonnet 4 37 tok/s Pro
2000 character limit reached

Racing to Idle: Energy Efficiency of Matrix Multiplication on Heterogeneous CPU and GPU Architectures (2507.20063v1)

Published 26 Jul 2025 in cs.DC and cs.CC

Abstract: The paradigm shift towards multi-core and heterogeneous computing, driven by the fundamental power and thermal limits of single-core processors, has established energy efficiency as a first-class design constraint in high-performance computing (HPC). Heterogeneous systems, integrating traditional multi-core CPUs with specialized accelerators like discrete (dGPU) and integrated (iGPU) graphics processing units, offer a compelling path to navigating the trade-offs between performance and power. However, quantifying these trade-offs on widely accessible hardware remains a critical area of study. This paper presents a direct, empirical measurement of the performance and energy-to-solution of a canonical HPC workload -- a 4096x4096 matrix-matrix multiplication -- on three distinct compute architectures within a single consumer-grade laptop: a multi-core AMD Ryzen 7 5800H CPU, a discrete NVIDIA GeForce GTX 1650 GPU, and an integrated AMD Radeon Vega GPU. Using standard, validated, and minimally intrusive tools such as Linux perf and nvidia-smi, we find that the discrete GPU is not only the performance leader, achieving a 93.5x speedup over the CPU, but is also the most energy-efficient, consuming only 2% of the energy used by the CPU, resulting in a 50-fold improvement in energy efficiency. These findings provide a practical demonstration of the "race to idle" principle and offer clear, quantitative guidance on architectural choices for energy-aware software development.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Ai Generate Text Spark Streamline Icon: https://streamlinehq.com

Paper Prompts

Sign up for free to create and run prompts on this paper using GPT-5.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.