Papers

Topics

Authors

Recent

View all

Detailed Answer

Quick Answer

Concise responses based on abstracts only

Detailed Answer

Well-researched responses based on abstracts and relevant paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses

Gemini 2.5 Flash

Gemini 2.5 Flash 95 tok/s

Gemini 2.5 Pro 48 tok/s Pro

GPT-5 Medium 25 tok/s Pro

GPT-5 High 18 tok/s Pro

GPT-4o 95 tok/s Pro

GPT OSS 120B 391 tok/s Pro

Kimi K2 159 tok/s Pro

2000 character limit reached

Benchmark-based Study of CPU/GPU Power-Related Features through JAX and TensorFlow (2505.03398v1)

Published 6 May 2025 in cs.PF

Abstract: Power management has become a crucial focus in the modern computing landscape, considering that {\em energy} is increasingly recognized as a critical resource. This increased the importance of all topics related to {\em energy-aware computing}. This paper presents an experimental study of three prevalent power management techniques that are {\em power limitation, frequency limitation}, and {\em ACPI/P-State governor modes} (OS states related to power consumption). Through a benchmark approach with a set of six computing kernels, we investigate {\em power/performance} trade-off with various hardware units and software frameworks (mainly TensorFlow and JAX). Our experimental results show that {\em frequency limitation} is the most effective technique to improve {\em Energy-Delay Product (EDP)}, which is a convolution of energy and running time. We also observe that running at the highest frequency compared to a reduced one could lead to a reduction of factor $\frac{1}{10}$ in EDP. Another noticeable fact is that frequency management shows a consistent behavior with different CPUs, whereas opposite effects sometimes occur between TensorFlow (TF) and JAX with the same power management settings.

Collections

Summary

The paper demonstrates that dynamic voltage and frequency scaling (DVFS) can achieve up to a 10% improvement in energy-delay product (EDP) compared to baseline settings.
It employs a benchmark-driven approach using diverse computing kernels across Intel, AMD, and Nvidia platforms to assess power capping, DVFS, and ACPI/P-State modes.
The study reveals that JAX and TensorFlow exhibit distinct energy efficiencies, emphasizing the need for platform-specific tuning and workload characterization in energy-aware computing.

This paper (2505.03398) presents an empirical paper on the impact of power management techniques on modern CPU and GPU architectures using the JAX and TensorFlow frameworks. The paper focuses on three prevalent power management techniques: power limitation (capping), frequency limitation (DVFS), and ACPI/P-State governor modes. Through a benchmark-driven approach, the paper analyzes the power/performance trade-offs across different hardware units and software frameworks, providing insights into energy-aware computing.

Experimental Setup and Benchmarks

The experimental setup includes Intel Xeon Platinum 8358, AMD EPYC 7513, and NVIDIA A100 SXM4 platforms. The evaluation employs six computing kernels: GEMM, Stencil, SpMV, Triad, Dist, and Monte Carlo, representing a mix of compute-bound, memory-bound, and mixed-workloads. The authors used the EA2P tool to measure energy consumption. JAX and TensorFlow were configured with specific settings to optimize memory allocation and performance. For example, the JAX benchmarks used specific XLA environment variables (XLA_PYTHON_CLIENT_PREALLOCATE=false, XLA_PYTHON_CLIENT_MEM_FRACTION=.10, XLA_PYTHON_CLIENT_ALLOCATOR=platform) to mitigate OOM errors. Similarly, TensorFlow employed TF_GPU_ALLOCATOR = 'cuda_malloc_async' to accelerate memory operations.

Impact of Power Management Techniques on EDP

The paper evaluates the impact of DVFS, ACPI governors, and power capping on execution time, power consumption, energy consumption, and EDP. The results indicate that frequency limitation is often the most effective technique for improving EDP. Running at the highest frequency compared to a reduced one could lead to a reduction of up to $\frac{1}{10}$ in EDP. Furthermore, the paper observes that frequency management shows consistent behavior across different CPUs, while TensorFlow and JAX exhibit opposite effects under the same power management settings.

Figure 1: EDP vs Time on Intel for JAX.

Figure 2: EDP vs Time on Intel for TF.

Figure 3: EDP vs Time on AMD for JAX.

Figure 4: EDP vs Time on AMD for TF.

Figure 5: EDP vs Time on Nvidia for JAX.

Figure 6: EDP vs Time on Nvidia for TF.

The plots of EDP vs time (Figures 1, 2, 3, 4, 5, and 6) quantify the efficiency trade-offs achievable with different energy management techniques. Tables 2, 3, 4, 5, 6, and 7 summarize the optimal EDP observed for each instance of ({Platform, Framework, Benchmark, ManagementMethod), presenting EDP reduction and the corresponding running time gap relative to the baseline. The plots confirm that some configurations out of the baseline ones yield a significant reduction of the EDP, each management technique having a specific impact.

Platform-Specific Observations

The paper reveals platform-specific trends in power management:

Intel: Power capping frequently yields EDP reductions with slightly lower running times. DVFS also lowers EDP, but the optimal EDP often comes with performance degradation.
AMD: DVFS consistently yields the lowest EDP values, often at the expense of increased running time. ACPI governors, particularly 'powersave', reduce EDP but increase running time.
Nvidia: Power capping is highly effective for EDP reduction on A100. DVFS reduces EDP but seems limited by JAX's inability to run at the lowest frequencies.

Power Consumption Analysis

The power analysis at the component level reveals that DVFS and power capping primarily modulate the overall CPU/GPU power consumption. JAX and TF show different energy efficiencies with similar power draw values and different running times.

Figure 7: Power trends on Intel.

Figure 8: Power trends on AMD.

Figure 9: Power trends on Nvidia.

Figures 7, 8, and 9 display the average power draw of the primary component (CPU Package 'Pkg' for Intel/AMD, 'GPU' for Nvidia) and a secondary component ('DRAM' for CPUs, 'Host' CPU for Nvidia) with our considered management settings. The figures show that applying DVFS consistently reduces the power consumption of the main compute component. Similarly, power capping directly limits the power of the primary compute component. The choice of governor does not significantly influence the DRAM power draw.

Framework Efficiency Trends

The paper also explores framework efficiency trends. Considering Triad on A100, power capping yields better EDP reduction for JAX (42%) than for TF (24%), with a significant improvement over DVFS for both. The paper suggests that the frameworks differ considerably in their runtime efficiency for specific tasks, potentially due to compilation strategies, kernel implementations, runtime overheads, and memory management. Notably, JAX was significantly faster on Stencil than TF despite memory pressure. This strongly indicates that XLA's fusion for stencil computations was highly effective at reducing the total number of memory accesses and arithmetic operations.

Conclusions

The paper concludes that the effectiveness of energy management techniques is highly context-dependent, requiring a platform/workload-aware approach. It emphasizes the importance of characterizing workload types and considering framework robustness when optimizing energy efficiency. For practitioners, the paper recommends platform-specific tuning, workload characterization, and benchmarking. For framework developers, it suggests ensuring robustness across operational settings and integrating energy-aware runtime/compilation strategies. For hardware designers, it reinforces the importance of accurate power monitoring features and effective control mechanisms.

PDF Markdown

Paper Prompts

Explore 10 Community Prompts

Follow-up Questions

Authors (4)

Tweets

https://twitter.com/HPCPapers/status/1941558384786071761

Benchmark-based Study of CPU/GPU Power-Related Features through JAX and TensorFlow (2505.03398v1)

Collections

Summary