Power Measurement Toolkit (PMT)

Updated 12 January 2026
  • The Power Measurement Toolkit (PMT) is a system for precisely measuring and analyzing energy consumption in diverse computing environments, with direct integration into applications.
  • PMT supports multiple hardware backends and APIs and is designed for low-overhead use in energy-aware applications across HPC, embedded systems, and data centers.
  • PMT underpins key advancements in green computing, providing tools for researchers to accurately measure, model, and optimize system performance in terms of energy usage.

The Power Measurement Toolkit (PMT) is a class of scientific libraries, device APIs, and modeling protocols for collecting, modeling, and analyzing energy consumption in heterogeneous computing platforms. PMT systems provide precise measurement, time-resolved logging, and direct integration pathways for energy-aware applications, notably in high-performance computing (HPC), embedded systems, and data center environments. PMT abstracts heterogeneous sensor backends, supports both hardware-native and PMC-based power estimation, offers low-overhead operation suitable for live application instrumentation, and has substantially enabled green computing research and workload optimization (Corda et al., 2022, Simsek et al., 2023, Mazzola et al., 30 Jun 2025, Mazzola et al., 2024).

1. Toolkit Architectures and Software Layers

PMT implementations adopt a layered architecture; core software is predominantly written in C++ or other low-level languages and runs on Linux. The key layers are:

  • Application Layer: User-facing measurement hooks, callable via C++ API or Python bindings (decorator or session-based).
  • User API: Abstract base class (pmt::pmt), supporting creation for multiple backends such as NVML (NVIDIA GPUs), RAPL (Intel/AMD CPUs), ROCm SMI (AMD GPUs), and external hardware sensors (e.g., PowerSensor2).
  • Sampling Engine: Background thread per monitored device, orchestrating periodic sensor reads.
  • Vendor/PMC Backends: Direct calls to hardware APIs (MSR for RAPL, NVML, ROCm SMI) or synthetic models via performance counter sampling (PMCs).
  • Hardware Support: Underlying SoC, CPU, GPU, accelerators, power rails/counters, with extendability to FPGAs, sysfs sources (Corda et al., 2022, Mazzola et al., 30 Jun 2025, Mazzola et al., 2024).
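The layering above can be illustrated with a minimal Python sketch. It is not PMT's code: the `Backend`, `ConstantBackend`, and `Sampler` names are hypothetical stand-ins for the abstract user API, a vendor backend, and the per-device sampling thread, and the dummy backend returns a fixed value instead of reading a real sensor.

```python
import threading
import time
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical stand-in for the abstract user API (not the real pmt::pmt)."""

    @abstractmethod
    def read_watts(self) -> float:
        """Return one instantaneous power reading in watts."""


class ConstantBackend(Backend):
    """Dummy backend that always reports the same power, for illustration only."""

    def __init__(self, watts: float) -> None:
        self._watts = watts

    def read_watts(self) -> float:
        return self._watts


class Sampler:
    """Background thread that polls one backend at a fixed period."""

    def __init__(self, backend: Backend, period_s: float) -> None:
        self._backend = backend
        self._period_s = period_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.samples = []  # list of (timestamp_s, watts) tuples

    def _run(self) -> None:
        while not self._stop.is_set():
            self.samples.append((time.monotonic(), self._backend.read_watts()))
            # Event.wait doubles as an interruptible sleep between reads.
            self._stop.wait(self._period_s)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


sampler = Sampler(ConstantBackend(42.0), period_s=0.01)
sampler.start()
time.sleep(0.05)  # stands in for the measured region
sampler.stop()
print(len(sampler.samples), "samples collected")
```

Keeping the sampling loop separate from the backend means a new sensor only has to implement the single read method, which mirrors the subclassing route described for adding backends.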

In recent extensions, PMT systems can be fully in-kernel (Runmeter LKM), operate at context switch or tick-level granularity, enable moving-window aggregation of PMCs, and evaluate real-time linear models in kernel space (fixed-point arithmetic) (Mazzola et al., 30 Jun 2025, Mazzola et al., 2024).
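To make the idea of moving-window aggregation with a fixed-point linear model concrete, here is a small Python sketch that uses integers only. It is an illustration under stated assumptions, not Runmeter's implementation: the Q16 format, the units (weights in nJ per event, tick length in µs, power in mW), and every name and number below are hypothetical.

```python
from collections import deque
from typing import Sequence

FRAC_BITS = 16          # assumed Q16 fixed-point: real value = integer / 2**16
ONE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Convert a real-valued weight to Q16 (done offline, outside the hot path)."""
    return int(round(x * ONE))


class WindowedPowerModel:
    """Linear PMC power model evaluated with integer arithmetic over a moving window."""

    def __init__(self, intercept_mw: int, weights_q16: Sequence[int],
                 window_ticks: int, tick_us: int) -> None:
        self.intercept_mw = intercept_mw      # static/idle term in mW
        self.weights_q16 = list(weights_q16)  # nJ per event, Q16
        self.tick_us = tick_us                # duration of one sampling tick
        self.window = deque(maxlen=window_ticks)

    def push(self, counts: Sequence[int]) -> None:
        """Record the PMC deltas observed during one tick; old ticks fall out."""
        self.window.append(list(counts))

    def power_mw(self) -> int:
        if not self.window:
            return self.intercept_mw
        dyn_q16 = 0
        for counts in self.window:
            for w, x in zip(self.weights_q16, counts):
                dyn_q16 += w * x              # accumulated energy in nJ, Q16
        dyn_nj = dyn_q16 >> FRAC_BITS         # drop the fractional bits
        window_us = len(self.window) * self.tick_us
        return self.intercept_mw + dyn_nj // window_us  # nJ / us == mW


# Hypothetical weights: 0.5 nJ and 2.0 nJ per event for two counters.
model = WindowedPowerModel(500, [to_fixed(0.5), to_fixed(2.0)],
                           window_ticks=4, tick_us=1000)
for _ in range(4):
    model.push([1_000_000, 100_000])
print(model.power_mw(), "mW")  # 500 mW static + 700 mW dynamic = 1200
```

Expressing weights as energy per event keeps the numbers large enough for a 16-bit fraction; storing watts per event rate directly would underflow for realistic counter rates.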

2. Backend Interfaces, Supported Hardware, and Integration

PMT discovers available hardware via its Hardware Abstraction Layer (HAL) and instantiates device-specific drivers. Major supported backends—with corresponding sampling rates and characteristics—are summarized below.

| Device Type | Interface / API | Lowest Sampling Period | Notes |
|---|---|---|---|
| CPU | RAPL | 500 ms | Package+DRAM domains |
| CPU | LIKWID | ~100 ms | Fallback if RAPL unavailable |
| CPU | sysfs, /class | User-settable | ARM/Odroid, other SoCs |
| GPU | NVML | 10 ms | Device power only |
| GPU | ROCm SMI | 10 ms | AMD Radeon/Instinct |
| Ext. meter | PowerSensor2 | 1 ms | ±1% accuracy |
| Ext. meter | PowerSensor3 | 50 µs | PCIe/USB/SOC/FPGA/SSD |

PMT supports multi-device enumeration, thread-safe sampling across MPI ranks, per-function measurement session management, customizable sampling frequencies, and direct logging to CSV/JSON/binary formats (Corda et al., 2022, Simsek et al., 2023, Vlugt et al., 24 Apr 2025).
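As a small illustration of customizable sampling frequencies, the sketch below clamps a requested sampling period to the lowest period listed per backend. The dictionary keys and the `effective_period_ms` helper are hypothetical and not part of PMT; the numeric limits are the ones quoted in this section, and the sysfs backend is omitted because its period is user-settable.

```python
# Lowest sampling periods in milliseconds, as listed in this section.
MIN_PERIOD_MS = {
    "rapl": 500.0,
    "likwid": 100.0,   # listed as approximately 100 ms
    "nvml": 10.0,
    "rocm_smi": 10.0,
    "powersensor2": 1.0,
    "powersensor3": 0.05,  # 50 microseconds
}


def effective_period_ms(backend: str, requested_ms: float) -> float:
    """Return the period to use: the request, raised to the backend's floor."""
    if requested_ms <= 0:
        raise ValueError("sampling period must be positive")
    return max(requested_ms, MIN_PERIOD_MS[backend])


for name in ("nvml", "rapl", "powersensor3"):
    print(name, effective_period_ms(name, 1.0), "ms")
```

Requesting a period below the floor would only return repeated readings from the sensor, so clamping avoids paying sampling overhead for no added resolution.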

3. Measurement Methodology and Mathematical Models

PMT collects instantaneous power readings $P(t)$ at discrete timepoints $t_i$ and integrates to compute energy:

  • Continuous:

$$E = \int_{t_0}^{t_1} P(t)\, dt$$

  • Discrete (Riemann sum):

$$E \approx \sum_{i=1}^{N} P_i \, \Delta t_i, \quad \Delta t_i = t_i - t_{i-1}$$

  • Cumulative Counter (where available):

$$E = C_\text{cum}(t_1) - C_\text{cum}(t_0)$$

  • Average Power:

$$\bar{P} = \frac{E}{t_1 - t_0}$$
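The formulas above translate directly into a few lines of Python. This is a minimal sketch with made-up sample values; the function names are illustrative and not part of PMT.

```python
from typing import Sequence


def energy_riemann(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Discrete form: sum of P_i * (t_i - t_{i-1}) for i = 1..N."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power readings must have equal length")
    energy_j = 0.0
    for i in range(1, len(timestamps_s)):
        energy_j += power_w[i] * (timestamps_s[i] - timestamps_s[i - 1])
    return energy_j


def energy_from_counter(c_start_j: float, c_end_j: float) -> float:
    """Cumulative-counter form: difference of two counter readings."""
    return c_end_j - c_start_j


def average_power(energy_j: float, t0_s: float, t1_s: float) -> float:
    """Average power over the measured interval."""
    return energy_j / (t1_s - t0_s)


# Made-up samples: five readings taken 0.5 s apart.
t = [0.0, 0.5, 1.0, 1.5, 2.0]
p = [100.0, 100.0, 120.0, 120.0, 100.0]

e = energy_riemann(t, p)
print(e, "J")                              # 50 + 60 + 60 + 50 = 220.0
print(average_power(e, t[0], t[-1]), "W")  # 220 / 2 = 110.0
```

The sample at $t_0$ only anchors the first interval and contributes no energy of its own, which matches the index range $i = 1..N$ in the discrete sum.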

For PMC-based PMT (modern toolkit variants), offline profiling selects a subset of PMCs $X_{d, f}$ with highest linear correlation to measured power at each DVFS state, then trains a non-negative linear model:

  • Per-subsystem power:

$$P_d(X_{d, f}, f) = L_d(f) + \sum_{i=1}^{|X_{d, f}|} w_{d, f, i} \cdot \frac{x_i}{T}$$

  • Full-system:

$$P_{tot}(f) = \sum_{d \in D^*} P_d(X_{d, f_d}, f_d)$$

Calibration—via external meters or startup offset estimation—is used to correct sensor drift or systematic offset. Robustness to stochastic jitter is provided by windowed aggregation (Mazzola et al., 30 Jun 2025, Mazzola et al., 2024).
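Evaluating the trained model is a weighted sum of counter rates plus a frequency-dependent constant term. The sketch below shows that evaluation step only (not the PMC selection or training); all weights, counts, and subsystem names are hypothetical, with weights assumed to be in joules per event so that each term comes out in watts.

```python
from typing import Dict, Sequence


def subsystem_power(static_w: float, weights: Sequence[float],
                    counts: Sequence[float], interval_s: float) -> float:
    """P_d = L_d(f) + sum_i w_i * x_i / T for one subsystem at one DVFS state."""
    if any(w < 0 for w in weights):
        raise ValueError("the model is non-negative: weights must be >= 0")
    if len(weights) != len(counts):
        raise ValueError("one weight per selected counter is required")
    return static_w + sum(w * x / interval_s for w, x in zip(weights, counts))


def total_power(per_subsystem: Dict[str, float]) -> float:
    """P_tot = sum of the per-subsystem estimates."""
    return sum(per_subsystem.values())


# Hypothetical parameters for two subsystems over a 1 s interval.
estimates = {
    "cpu": subsystem_power(10.0, [2e-9, 5e-9], [1e9, 2e8], interval_s=1.0),
    "gpu": subsystem_power(30.0, [1e-8], [5e9], interval_s=1.0),
}
print(estimates)
print(round(total_power(estimates), 3), "W")
```

In a real deployment the weights and the constant term would be looked up by the current frequency of each subsystem, since the model is trained separately per DVFS state.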

4. APIs for Measurement, Logging, and Analysis

PMT exposes instrumentation points suitable for both static region-averaged measurement and continuous time-series logging.

  • C++ API
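    #include <iostream>
    #include <pmt.h>

    // Create an NVML-backed sensor for an NVIDIA GPU.
    auto sensor = pmt::nvml::NVML::create();
    auto start = sensor->read();  // state before the measured region
    // ... measured region ...
    auto end = sensor->read();    // state after the measured region
    std::cout << "Energy [J]: " << sensor->joules(start, end) << "\n";
    std::cout << "Power [W]: "  << sensor->watts(start, end)  << "\n";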
  • Python API
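    import time
    import pmt

    # Stacked decorators: each one measures work() with one backend.
    @pmt.measure("nvml")
    @pmt.measure("rapl")
    def work():
        time.sleep(5)  # stand-in for the measured workload

    results = work()
    print(results)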
  • Session-based measurement (multiple devices)
    pmt_init();
    int sess = pmt_create_session("mykernel");
    // cpu_dev and gpu_dev: previously obtained device handles (not shown)
    pmt_register_devices(sess, {cpu_dev, gpu_dev});
    pmt_start(sess);
    /* ... measured region ... */
    pmt_stop(sess);
    pmt_export(sess, "energy_log.csv", PMT_FORMAT_CSV);
    pmt_finalize();
  • Continuous logging (CSV/JSON):

    rank,session,func,device,id,timestamp_ns,power_W,energy_J
    0,SPH-EXA,MomentumEnergy,GPU,0,1673478912345678,210.5,15.875
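A log in this layout can be post-processed with the Python standard library. The sketch below groups rows by function and device and reports the mean logged power. The first data row is the one shown above; the second row is made up for illustration, and whether `energy_J` is per-sample or cumulative is not stated here, so the example does not sum it.

```python
import csv
import io
from collections import defaultdict

# First data row is from the log excerpt; the second is a made-up extra sample.
LOG = """rank,session,func,device,id,timestamp_ns,power_W,energy_J
0,SPH-EXA,MomentumEnergy,GPU,0,1673478912345678,210.5,15.875
0,SPH-EXA,MomentumEnergy,GPU,0,1673478912355678,209.5,17.970
"""


def mean_power_by_region(text: str) -> dict:
    """Return {(func, device): mean power_W} over all rows of a CSV log."""
    readings = defaultdict(list)
    # skipinitialspace tolerates headers written with a space after each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        readings[(row["func"], row["device"])].append(float(row["power_W"]))
    return {key: sum(vals) / len(vals) for key, vals in readings.items()}


mean_power = mean_power_by_region(LOG)
for (func, device), watts in mean_power.items():
    print(f"{func} on {device}: {watts:.1f} W")
```

Replacing `io.StringIO(text)` with an open file handle makes the same function work on an exported log file.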

5. Performance, Accuracy, and Overhead Analysis

PMT overhead is dictated by operating mode, backend latency, and measurement granularity. Key performance metrics:

| Mode | Typical Overhead | Accuracy | Minimum ΔE granularity |
|---|---|---|---|
| C++ measure-mode | ~1 ms/region | 5–10% vs ext. meter | NVML @10 ms: ∼2 J, RAPL @500 ms: ∼25 J |
| Python decorator | ~10 ms/call | Similar | Stack of backends increases linearly |
| In-kernel Runmeter | 0.2–0.7% CPU | 7.5% Power MAPE | Sub-ms responsive, ~1.3% energy error |
| PowerSensor2/3 | <1% | ±1% (PS2), ±2–4 W (PS3) | 1 ms (PS2), 50 µs (PS3) |

Reported errors: GPU kernel measurement on TITAN RTX: PMT/PowerSensor2 within 3% systematic offset; overall energy error across CPU+GPU below 1.3% in PMC model deployments (Corda et al., 2022, Simsek et al., 2023, Mazzola et al., 30 Jun 2025, Vlugt et al., 24 Apr 2025, Mazzola et al., 2024).
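Two of the quantities in this section follow from short formulas: the smallest resolvable energy step is roughly the power draw times the sampling period, and MAPE is the mean of the absolute relative errors. The sketch below is illustrative; the helper names and the sample series are made up, and the 200 W and 50 W inputs are simply the power levels implied by dividing the quoted ∼2 J and ∼25 J by their sampling periods.

```python
from typing import Sequence


def energy_granularity_j(power_w: float, period_s: float) -> float:
    """Smallest energy step resolvable at a given power draw and sampling period."""
    return power_w * period_s


def mape_percent(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Mean absolute percentage error of an estimate against a reference series."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs(e - r) / abs(r) for r, e in zip(reference, estimate))
    return 100.0 * total / len(reference)


# About 2 J at a 10 ms period corresponds to a draw of roughly 200 W.
print(energy_granularity_j(200.0, 0.010), "J per NVML sample")
# About 25 J at a 500 ms period corresponds to a draw of roughly 50 W.
print(energy_granularity_j(50.0, 0.500), "J per RAPL sample")

# Made-up power traces: external meter vs. model estimate, in watts.
meter = [100.0, 200.0]
model = [105.0, 190.0]
print(mape_percent(meter, model), "% MAPE")
```

Regions shorter than a few sampling periods therefore carry a large relative uncertainty, regardless of how accurate the sensor itself is.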

6. Guidelines for Deployment and Best Practices

  • Measurement mode selection: Use region-bracketing for workload-average metrics; dump-mode for time-series and event correlation.
  • Sampling configuration: Set periods to backend limits (NVML: 10 ms; RAPL: 500 ms; PowerSensor3: as low as 50 µs), balancing resolution against system load.
  • API usage: Minimize sensor instance re-creation; stack Python decorators judiciously.
  • Calibration: Periodically validate readings against an external reference meter to detect sensor drift and systematic offset.
  • Energy-aware analysis: Compute EDP ($E \times T$), per-watt performance (GFLOP/s/W), function-level breakdowns for optimization.
  • Cross-platform extension: Add backend by subclassing pmt::pmt, implementing read(), and registering with factory creation.
  • In-kernel integration: Deploy fixed-point PMC models for closed-loop DVFS, per-task scheduling, and power capping as in Runmeter (Corda et al., 2022, Mazzola et al., 2024, Mazzola et al., 30 Jun 2025).
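The energy-aware metrics named in these guidelines are simple to derive from a measured energy and runtime. The following sketch uses made-up numbers for two hypothetical configurations of the same kernel; the function names are illustrative.

```python
def energy_delay_product(energy_j: float, runtime_s: float) -> float:
    """EDP = E * T; lower is better."""
    return energy_j * runtime_s


def gflops_per_watt(flop_count: float, runtime_s: float, energy_j: float) -> float:
    """Performance per watt: (GFLOP/s) divided by average power."""
    avg_power_w = energy_j / runtime_s
    gflops = flop_count / runtime_s / 1e9
    return gflops / avg_power_w


# Made-up measurements: (energy in J, runtime in s) for the same 1e12-FLOP kernel.
configs = {"A": (200.0, 2.0), "B": (180.0, 2.5)}

for name, (energy_j, runtime_s) in configs.items():
    edp = energy_delay_product(energy_j, runtime_s)
    eff = gflops_per_watt(1e12, runtime_s, energy_j)
    print(f"{name}: EDP = {edp:.0f} J*s, efficiency = {eff:.2f} GFLOP/s/W")
```

In this example configuration B uses less energy but has the higher EDP because it runs longer, which is why the two metrics are worth reporting side by side.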

7. Impact, Limitations, and Application Domains

PMT has shifted energy profiling from coarse system-level accounting to fine-grained, multi-device, per-kernel analysis, directly supporting exascale simulation codes, embedded system prototyping, and real-time scheduling under energy constraints. Example studies include the SPH-EXA framework instrumented for GPU-centric astrophysics, data-driven PMC modeling for real-time DVFS, and Kernel Tuner applications exploring beamformer energy/performance tradeoffs (Simsek et al., 2023, Mazzola et al., 30 Jun 2025, Vlugt et al., 24 Apr 2025).

Limitations stem from backend API resolution, hardware counter compatibility, sensor calibration drift, and the assumption that subsystem power contributions are independent, which reduces accuracy in tightly interacting CPU/GPU phases (Mazzola et al., 30 Jun 2025, Mazzola et al., 2024). A plausible implication is the need for periodic model retraining and external calibration, especially as hardware platforms evolve.

PMT is foundational in sustainable computing research, enabling metrics-driven workload design, energy-aware optimization loops, and aggregation platforms for green HPC and embedded systems. For further details, reference implementation codebases are available as open-source repositories (Corda et al., 2022, Simsek et al., 2023, Mazzola et al., 2024).
