DAEM: Dynamic Approximation Execution Manager
- DAEM is a runtime memory management system that dynamically tunes hardware knobs like cache voltage and DRAM refresh intervals using model-free reinforcement learning to optimize energy efficiency while meeting QoS requirements.
- It integrates hardware modifications, a Linux kernel module, and user-level APIs to enable approximate memory allocation and real-time adaptation to varying workloads.
- It employs formal optimization and TD(λ)-based online learning to coordinate interdependent memory knobs, achieving energy savings up to 37% with minimal quality-of-service violations.
A Dynamic Approximation Execution Manager (DAEM) is a runtime system positioned between user-level applications and the hardware memory subsystem, explicitly designed to optimize system energy efficiency by dynamically coordinating multiple approximation knobs across a heterogeneous memory hierarchy. DAEM employs a self-optimizing control policy that continuously tunes parameters such as on-chip cache voltage and DRAM refresh intervals, subject to configurable application quality-of-service (QoS) constraints. Building on the AXES model-free runtime manager, DAEM operates without design-time profiling, instead adapting policy parameters on-the-fly for unknown workloads and varying hardware configurations, with formal optimization driven by power–quality trade-offs (Maity et al., 2020).
1. System Structure and Execution Pathways
DAEM orchestrates approximate memory management through an integrated software–hardware stack:
- Hardware Support: The platform extends a RISC-V (Ariane) core with configurable Control-and-Status Registers (CSRs) for three principal approximation knobs: L1 cache supply voltage, L2 cache supply voltage, and DRAM refresh interval.
- Linux Kernel Module: Exposes a `malloc_approx()` API enabling user applications to allocate physically contiguous “approximate” segments; the module writes segment bounds and desired knob settings into dedicated CSRs (AX_L1_LEVEL, AX_L2_LEVEL, AX_DRAM_LEVEL, AX_ENABLE, AX_DISABLE).
- User-Level Application: Marks noncritical buffers via `malloc_approx()` and provides a quality monitor callback for real-time computation of a QoS metric (e.g., RMSE, average relative error).
- DAEM Runtime Manager: Periodically queries current knob settings and monitors QoS/power metrics, then applies a reinforcement-learning-based control law to decide on relative knob adjustments for each memory layer, updating CSRs accordingly.
2. Formal Optimization Framework
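DAEM formalizes the dynamic approximation problem using a constrained optimization formulation:

$$\min_{\mathbf{k}} \; P(\mathbf{k}) \quad \text{subject to} \quad \mathrm{QoS}(\mathbf{k}) \ge \mathrm{QoS}_{\min}$$

where $\mathbf{k}$ are discrete knob vectors, $P(\mathbf{k})$ is the resulting memory-subsystem power, and $\mathrm{QoS}_{\min}$ is the application-specified minimum QoS threshold.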
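The manager defines a scalar reward that combines energy reduction with a QoS penalty: knob settings that meet the QoS threshold earn a reward tied to their power reduction, while settings that violate it are penalized.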
Cumulative reward maximization implicitly pushes for minimal power consumption while preserving or rapidly recovering application-level QoS.
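One way to picture such a reward is as a piecewise function. The sketch below is an illustrative assumption, not the published formula: the normalization by a baseline power, the flat `penalty` value, and all parameter names are hypothetical.

```python
def reward(power, baseline_power, qos, qos_min, penalty=1.0):
    """Hypothetical reward: normalized power saving if QoS holds, flat penalty otherwise."""
    if qos < qos_min:
        # QoS violation: the whole joint configuration is penalized.
        return -penalty
    # QoS met: reward grows with the fraction of baseline power saved.
    return (baseline_power - power) / baseline_power


# Example: 20% power saving with QoS above the threshold.
saving = reward(power=0.8, baseline_power=1.0, qos=0.95, qos_min=0.90)
# Same power, but QoS below the threshold.
violation = reward(power=0.8, baseline_power=1.0, qos=0.85, qos_min=0.90)
```

With this shape, a configuration that saves power but breaks the QoS threshold is always worse than one that saves nothing, which is what steers the controller back toward safe settings.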
3. Online Learning and Control
DAEM defines the knob-adjustment process as a Markov Decision Process (MDP) $(S, A, R, T)$:
- State ($S$): 4-tuple $(k_{L1}, k_{L2}, k_{DRAM}, \Delta\mathrm{QoS})$ capturing the discrete knob levels at each memory hierarchy layer and the quantized QoS delta ($\Delta\mathrm{QoS}$), with error bucketed into 16 bins.
- Action ($A$): Vector of relative changes $(\Delta k_{L1}, \Delta k_{L2}, \Delta k_{DRAM})$, one per knob, enabling incremental tuning.
- Reward ($R$): As above, coupling joint power reduction and QoS maintenance.
- Transition Model ($T$): Unknown; DAEM employs model-free Temporal Difference (TD(λ)) learning with eligibility traces for rapid credit assignment.
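The state and action encoding can be sketched as follows. This is a minimal illustration under stated assumptions: the QoS delta range of [-1, 1], the uniform binning, and the function names are hypothetical. The four levels per knob follow the knob settings listed in the implementation section.

```python
N_LEVELS = 4      # four discrete settings per knob
N_QOS_BINS = 16   # QoS delta is bucketed into 16 bins


def quantize_qos_delta(delta, lo=-1.0, hi=1.0):
    """Map a QoS delta to a bin index 0..15 (the [lo, hi] range is an assumption)."""
    clipped = min(max(delta, lo), hi)
    idx = int((clipped - lo) / (hi - lo) * N_QOS_BINS)
    return min(idx, N_QOS_BINS - 1)  # keep delta == hi inside the last bin


def encode_state(l1, l2, dram, qos_delta):
    """Build the 4-tuple state from three knob levels and the quantized QoS delta."""
    return (l1, l2, dram, quantize_qos_delta(qos_delta))


def apply_action(levels, action):
    """Apply per-knob relative changes, clamping each level to 0..N_LEVELS-1."""
    return tuple(min(max(k + d, 0), N_LEVELS - 1) for k, d in zip(levels, action))


state = encode_state(1, 2, 3, qos_delta=0.0)
next_levels = apply_action((0, 3, 2), (-1, +1, +1))
```

Under these assumptions the table has 4 × 4 × 4 × 16 = 1024 states, small enough for a tabular learner.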
TD(λ) Algorithm
DAEM iteratively:
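- Initializes the action-value table $Q(s, a)$ and eligibility traces $e(s, a) = 0$.
- Observes the initial state, selects an action (ε-greedy).
- Periodically:
  - Applies action ($a_t$), updating hardware knobs.
  - Samples new power/QoS measurements, computes reward.
  - Observes successor state, selects next action.
  - Computes TD error ($\delta_t$) and propagates it through $Q$, updating eligibility traces per the following rules, written here in the standard on-policy TD(λ) form with learning rate $\alpha$ and discount factor $\gamma$:

$$\delta_t = r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)$$

$$e(s, a) \leftarrow \gamma \lambda\, e(s, a) + \mathbb{1}\left[(s, a) = (s_t, a_t)\right], \qquad Q(s, a) \leftarrow Q(s, a) + \alpha\, \delta_t\, e(s, a)$$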
This model-free approach enables DAEM to adapt to previously unseen workload–hardware combinations with no design-time retraining.
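The loop above resembles tabular SARSA(λ) with accumulating eligibility traces. The sketch below shows that textbook algorithm; it is not DAEM's actual code, and the class name and hyperparameter values are illustrative assumptions.

```python
import random
from collections import defaultdict


class SarsaLambda:
    """Tabular SARSA(lambda) with accumulating traces (hyperparameters are illustrative)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.q = defaultdict(float)  # Q(s, a), implicitly zero-initialized
        self.e = defaultdict(float)  # eligibility traces e(s, a)
        self.rng = random.Random(seed)

    def select(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next, a_next):
        """One TD(lambda) step: compute the TD error and spread it along the traces."""
        delta = r + self.gamma * self.q[(s_next, a_next)] - self.q[(s, a)]
        self.e[(s, a)] += 1.0  # mark the pair that was just visited
        for key in list(self.e):
            self.q[key] += self.alpha * delta * self.e[key]
            self.e[key] *= self.gamma * self.lam  # decay all traces
        return delta


agent = SarsaLambda(actions=[-1, 0, 1])
agent.update("s0", 1, 1.0, "s1", 0)  # first step: only ("s0", 1) is credited
agent.update("s1", 0, 1.0, "s2", 0)  # second step also credits ("s0", 1) via its trace
```

After the second update, the earlier pair `("s0", 1)` receives a share of the new TD error through its decayed trace. In a DAEM-like setting the actions would be tuples of per-knob relative changes rather than the scalar placeholders used here.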
4. Coordination of Interdependent Memory Knobs
DAEM encodes cross-layer dependencies within its state and reward formulation. For example, reducing the L1 supply voltage inflates L1 BER, which subsequently propagates to L2. The system’s state vector contains all three knob levels, and reward attribution is joint: only knob settings that collectively meet the QoS minimum receive a nonzero power reward. When a QoS violation occurs, the reward function applies a global penalty to the current joint configuration, causing the controller to back off one or several of the voltage or refresh knobs (e.g., raising the L1 and/or L2 supply voltage) until application QoS recovers.
Eligibility traces further accelerate convergence by crediting sequences of actions that jointly restore acceptable application-level quality, promoting efficient exploration in highly interdependent configuration spaces.
5. Hardware and Software Implementation
DAEM’s implementation targets a RISC-V (Ariane) platform on OpenPiton, realized on a Xilinx Artix-7 FPGA. Key details:
- CSRs: AX_L1_LEVEL, AX_L2_LEVEL, AX_DRAM_LEVEL, AX_ENABLE, AX_DISABLE, plus memory segment bounds for `malloc_approx`.
- Linux LKM: Handles approximate segment allocation, programs per-region knob settings, and propagates an “approx” bit through MMU tag logic to the cache controllers. Fault-injection (FI) modules confine errors to marked buffer regions.
- Approximation Knobs:
  - L1/L2 cache: supply voltages in {0.7, 0.8, 0.9, 1.0} V, with SRAM BER modeled as a function of supply voltage.
  - DRAM: refresh intervals {20 s, 5 s, 1 s, 0.1 s}; BER and energy effects drawn from prior art.
- Power Modeling: McPAT/Sniper at 1 V/0.1 s base, per-component rescaling according to knob setting and error model.
- Target Workloads:
  - Canny edge detection (QoS: output RMSE)
  - K-means clustering (RGB image RMSE)
  - Black–Scholes (mean relative error)
- Quality-of-Service: Only noncritical application buffers are marked approximate and monitored.
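The QoS metrics named for these workloads are standard error measures. A quality monitor callback might compute them as in the sketch below, which assumes a reference (exact) output is available for comparison. The function names and the `eps` guard against division by zero are hypothetical.

```python
import math


def rmse(reference, approx):
    """Root-mean-square error between an exact and an approximate output."""
    if len(reference) != len(approx) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return math.sqrt(squared / len(reference))


def mean_relative_error(reference, approx, eps=1e-12):
    """Average of |r - a| / |r|, with eps guarding against zero reference values."""
    if len(reference) != len(approx) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(r - a) / max(abs(r), eps) for r, a in zip(reference, approx))
    return total / len(reference)


pixel_error = rmse([0.0, 0.0], [3.0, 3.0])
price_error = mean_relative_error([10.0, 20.0], [11.0, 18.0])
```

Both metrics are errors (lower is better), so a controller that expects a minimum-QoS threshold would need to convert them, for instance by comparing the negated error against the threshold.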
6. Empirical Evaluation
DAEM demonstrates robust empirical performance across diverse workloads and operating conditions (Maity et al., 2020):
| Workload | Energy Saving (%) | QoS Violation Reduction (%) | Power Overhead (%) | Typical Adaptation Time |
|---|---|---|---|---|
| Canny edge detect. | up to 37 | 75 | <5 | tens of frames |
| K-means clustering | ~24 | n/a | n/a | n/a |
| Black–Scholes | ~29 | n/a | n/a | n/a |
- On Canny edge detection, DAEM yields up to 37% memory subsystem energy savings (unconstrained). With runtime QoS constraints enforced, adaptation occurs within tens of frames, reducing violations by 75% at <5% additional energy cost.
- In the k-means and Black–Scholes workloads, energy savings are ~24% and ~29%, respectively, under dynamic QoS.
- Overhead from DAEM invocation (every five frames) adds approximately 4% to compute time, with consistent ~16% power savings.
7. Characteristics and Significance
DAEM, instantiated via AXES, exemplifies a model-free, runtime-managed memory approximation system with the following properties:
- Eliminates the need for design-time characterization or profiling: adapts automatically to workloads and new memory technologies.
- Jointly coordinates interdependent memory approximation knobs—explicitly accounting for cross-layer error propagation.
- Realizes substantial energy savings with minimal loss in application output quality and low runtime overhead.
- Provides a generalizable runtime learning architecture for approximate computing in heterogeneous, multi-level memory systems.
These results position DAEM as a reference architecture for dynamic, feedback-driven memory approximation frameworks, supporting fine-grained adaptability and QoS-aware optimization within complex memory hierarchies (Maity et al., 2020).