
DAEM: Dynamic Approximation Execution Manager

Updated 22 May 2026
  • DAEM is a runtime memory management system that dynamically tunes hardware knobs like cache voltage and DRAM refresh intervals using model-free reinforcement learning to optimize energy efficiency while meeting QoS requirements.
  • It integrates hardware modifications, a Linux kernel module, and user-level APIs to enable approximate memory allocation and real-time adaptation to varying workloads.
  • It employs formal optimization and TD(λ)-based online learning to coordinate interdependent memory knobs, achieving energy savings up to 37% with minimal quality-of-service violations.

A Dynamic Approximation Execution Manager (DAEM) is a runtime system positioned between user-level applications and the hardware memory subsystem, explicitly designed to optimize system energy efficiency by dynamically coordinating multiple approximation knobs across a heterogeneous memory hierarchy. DAEM employs a self-optimizing control policy that continuously tunes parameters such as on-chip cache voltage and DRAM refresh intervals, subject to configurable application quality-of-service (QoS) constraints. Building on the AXES model-free runtime manager, DAEM operates without design-time profiling, instead adapting policy parameters on-the-fly for unknown workloads and varying hardware configurations, with formal optimization driven by power–quality trade-offs (Maity et al., 2020).

1. System Structure and Execution Pathways

DAEM orchestrates approximate memory management through an integrated software–hardware stack:

  • Hardware Support: The platform extends a RISC-V (Ariane) core with configurable Control-and-Status Registers (CSRs) for three principal approximation knobs:
    • L1 data cache supply voltage ($V_{DD}^{L1}$)
    • L2 shared cache supply voltage ($V_{DD}^{L2}$)
    • Main memory (off-chip DRAM) refresh period ($t_{REF}$), or ECC mode
    • Fault-injection (FI) modules in L1/L2 controllers and the write buffer emulate bit-error rates (BER) for each knob setting.
  • Linux Kernel Module: Exposes a malloc_approx() API enabling user applications to allocate physically contiguous “approximate” segments; the module writes segment bounds and desired knob settings into dedicated CSRs (AX_L1_LEVEL, AX_L2_LEVEL, AX_DRAM_LEVEL, AX_ENABLE, AX_DISABLE).
  • User-Level Application: Marks noncritical buffers via malloc_approx() and provides a quality monitor callback for real-time computation of a QoS metric (e.g., RMSE, average relative error).
  • DAEM Runtime Manager: Periodically queries current knob settings and monitors QoS/power metrics, then applies a reinforcement-learning-based control law to decide on relative knob adjustments for each memory layer, updating CSRs accordingly.
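
The manager's query, monitor, decide, update cycle can be sketched as follows. This is a minimal illustration rather than DAEM's actual code: `control_step`, the `csrs` and `metrics` dictionaries, and `threshold_policy` are hypothetical names, the policy is a fixed placeholder rule instead of the learned controller, and the convention that level 0 is the most accurate setting is an assumption.

```python
def control_step(read_knobs, read_metrics, policy, write_knobs, num_levels=4):
    """One manager invocation: query knobs, read metrics, decide, update."""
    levels = read_knobs()                    # current (L1, L2, DRAM) knob levels
    qos, power = read_metrics()              # e.g. from the quality monitor callback
    deltas = policy(levels, qos, power)      # relative change per knob
    new_levels = tuple(
        min(max(level + d, 0), num_levels - 1)   # clamp to the valid level range
        for level, d in zip(levels, deltas)
    )
    write_knobs(new_levels)                  # stands in for the CSR writes
    return new_levels


# Hypothetical stand-ins for the CSR interface and the measured metrics.
csrs = {"levels": (0, 0, 0)}                 # assumed: 0 = most accurate setting
metrics = {"qos": 0.95, "power": 1.0}


def threshold_policy(levels, qos, power, q_min=0.9):
    # Placeholder rule, NOT the learned policy: approximate more while QoS
    # holds, back off on every knob when it is violated.
    step = 1 if qos >= q_min else -1
    return (step, step, step)


def run_once():
    return control_step(
        lambda: csrs["levels"],
        lambda: (metrics["qos"], metrics["power"]),
        threshold_policy,
        lambda v: csrs.update(levels=v),
    )


first = run_once()        # QoS is met, so every knob moves one level up
metrics["qos"] = 0.5
second = run_once()       # QoS is violated, so every knob moves back down
```

In the real system the placeholder policy would be replaced by the reinforcement-learning controller, and the read/write callables by accesses to the kernel module.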

2. Formal Optimization Framework

DAEM formalizes the dynamic approximation problem using a constrained optimization formulation:

$$\min_{\mathbf{a}}\, E[Power(\mathbf{a})] \quad \text{s.t.} \quad QoS(\mathbf{a}) \ge Q_{\min}$$

where $\mathbf{a} = (a_{L1}, a_{L2}, a_{DRAM})$ is the vector of discrete knob levels and $Q_{\min}$ is the application-specified minimum QoS threshold.

The manager defines a scalar reward incorporating both energy reduction and QoS penalty:

$$r_P(\mathbf{a}) = 1 - \frac{Power(\mathbf{a})}{Power_{\max}}$$

$$r_Q(\mathbf{a}) = \begin{cases} 0, & QoS(\mathbf{a}) \ge Q_{\min} \\ -\dfrac{Q_{\min} - QoS(\mathbf{a})}{\text{max}_Q}, & QoS(\mathbf{a}) < Q_{\min} \end{cases}$$

$$R = \begin{cases} r_P, & QoS \ge Q_{\min} \\ r_Q, & \text{otherwise} \end{cases}$$

Cumulative reward maximization implicitly pushes for minimal power consumption while preserving or rapidly recovering application-level QoS.
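
As a small worked example, the piecewise reward can be written directly as a function. The argument names are illustrative, and the numbers passed in are made up for the demonstration, not measurements from the paper.

```python
def reward(power, qos, power_max, q_min, max_q):
    """Scalar reward R: r_P when the QoS constraint holds, r_Q otherwise."""
    if qos >= q_min:
        # QoS satisfied: reward grows as power drops below the maximum.
        return 1.0 - power / power_max
    # QoS violated: negative penalty proportional to the QoS shortfall.
    return -(q_min - qos) / max_q


# Illustrative values only.
r_ok = reward(power=0.6, qos=0.95, power_max=1.0, q_min=0.9, max_q=1.0)   # about 0.4
r_bad = reward(power=0.4, qos=0.70, power_max=1.0, q_min=0.9, max_q=1.0)  # about -0.2
```

Note that the lower-power configuration receives the worse reward here because it violates the QoS bound, which is what steers the controller back toward feasible settings.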

3. Online Learning and Control

DAEM defines the knob-adjustment process as a Markov Decision Process (MDP) $(S, A, P, R)$:

  • State ($S$): 4-tuple capturing the discrete knob levels at each memory hierarchy layer ($a_{L1}$, $a_{L2}$, $a_{DRAM}$) and the quantized QoS delta, with error bucketed into 16 bins.
  • Action ($A$): Vector of relative changes, one per knob, enabling incremental tuning.
  • Reward ($R$): As above, coupling joint power reduction and QoS maintenance.
  • Transition Model ($P$): Unknown; DAEM employs model-free Temporal Difference (TD($\lambda$)) learning with eligibility traces for rapid credit assignment.
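
One possible encoding of the state and action spaces is sketched below. Only the 16-bin quantization and the three-knob structure come from the description above; the QoS-delta range of [-1, 1], the per-knob step set {-1, 0, +1}, and all function names are assumptions made for illustration.

```python
from itertools import product

QOS_BINS = 16  # number of QoS-delta buckets stated in the text


def quantize_qos_delta(delta, lo=-1.0, hi=1.0, bins=QOS_BINS):
    """Map a QoS delta to a bin index in 0..bins-1 (range [lo, hi] is assumed)."""
    clipped = min(max(delta, lo), hi)
    idx = int((clipped - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # the upper edge falls into the last bin


def make_state(l1_level, l2_level, dram_level, qos_delta):
    """Build the 4-tuple state: three knob levels plus the quantized QoS delta."""
    return (l1_level, l2_level, dram_level, quantize_qos_delta(qos_delta))


# Assumed action set: each knob moves down one level, stays, or moves up one.
ACTIONS = list(product((-1, 0, 1), repeat=3))

state = make_state(1, 2, 3, qos_delta=0.0)
```

Tuples are used so that states and actions can serve directly as dictionary keys in a tabular learner.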

TD($\lambda$) Algorithm

DAEM iteratively:

  1. Initializes the value estimates and eligibility traces.
  2. Observes the initial state, selects an action ($\epsilon$-greedy).
  3. Periodically:
    • Applies the selected action, updating hardware knobs.
    • Samples new power/QoS measurements, computes reward.
    • Observes successor state, selects next action.
    • Computes the TD error ($\delta$) and propagates it through the eligibility traces, which are then updated.
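
The loop above matches the shape of textbook tabular SARSA($\lambda$) with accumulating eligibility traces, sketched below. This is the standard algorithm under assumed hyperparameters, not necessarily the exact update rule used in the paper; the class name, the defaults for `alpha`, `gamma`, `lam`, and `epsilon`, and the action-value (Q) formulation are all assumptions.

```python
import random
from collections import defaultdict


class SarsaLambda:
    """Textbook tabular SARSA(lambda) with accumulating eligibility traces."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.q = defaultdict(float)   # action values Q(s, a), zero-initialized
        self.e = defaultdict(float)   # eligibility traces e(s, a)
        self.rng = random.Random(seed)

    def select(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next, a_next):
        """Apply one TD step and return the TD error."""
        delta = r + self.gamma * self.q[(s_next, a_next)] - self.q[(s, a)]
        self.e[(s, a)] += 1.0                        # mark the visited pair
        for key in list(self.e):
            self.q[key] += self.alpha * delta * self.e[key]
            self.e[key] *= self.gamma * self.lam     # decay every trace
        return delta


agent = SarsaLambda(actions=[0, 1], epsilon=0.0)
td_error = agent.update("s0", 0, 1.0, "s1", 1)   # first step from a zero table
greedy = agent.select("s0")                      # now prefers the rewarded action
```

Because every trace decays by `gamma * lam` per step, a reward observed now also adjusts the values of recently visited state-action pairs, which is the credit-assignment effect the text attributes to eligibility traces.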

This model-free approach enables DAEM to adapt to previously unseen workload–hardware combinations with no design-time retraining.

4. Coordination of Interdependent Memory Knobs

DAEM encodes cross-layer dependencies within its state and reward formulation. For example, reducing $V_{DD}^{L1}$ inflates L1 BER, which subsequently propagates to L2. The system’s state vector contains all three knob levels, and reward attribution is joint: only knob settings that collectively meet the QoS minimum receive a nonzero power reward. When a QoS violation occurs, the reward function applies a global penalty to the current joint configuration, causing the controller to increase one or several of the voltage or refresh parameters (e.g., raising $V_{DD}^{L1}$ and/or $V_{DD}^{L2}$) until application QoS recovers.

Eligibility traces further accelerate convergence by crediting sequences of actions that jointly restore acceptable application-level quality, promoting efficient exploration in highly interdependent configuration spaces.

5. Hardware and Software Implementation

DAEM’s implementation targets a RISC-V (Ariane) platform on OpenPiton, realized on a Xilinx Artix-7 FPGA. Key details:

  • CSRs: AX_L1_LEVEL, AX_L2_LEVEL, AX_DRAM_LEVEL, AX_ENABLE, AX_DISABLE, memory segment bounds for malloc_approx.

  • Linux LKM: Handles approximate segment allocation, programs per-region knob settings, propagates an “approx” bit through MMU tag logic to cache controllers. FI modules constrain errors to marked buffer regions.

  • Approximation Knobs:

    • L1/L2 cache: supply voltages in {0.7, 0.8, 0.9, 1.0} V, with SRAM BER modeled as a function of $V_{DD}$.
    • DRAM: refresh intervals {20 s, 5 s, 1 s, 0.1 s}; BER and energy effects drawn from prior art.
  • Power Modeling: McPAT/Sniper at a baseline of 1 V supply voltage and 0.1 s refresh interval, with per-component rescaling according to knob setting and error model.
  • Target Workloads:
    • Canny edge detection (QoS: output RMSE)
    • K-means clustering (RGB image RMSE)
    • Black–Scholes (mean relative error)
  • Quality-of-Service: Only noncritical application buffers are marked approximate and monitored.
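
The knob settings listed above can be captured in small lookup tables that map discrete levels to physical values. The sketch below assumes that level 0 corresponds to the 1 V / 0.1 s baseline and that higher levels are more aggressive; this ordering, the names, and the 27-action figure are illustrative assumptions rather than details from the paper.

```python
# Physical settings per discrete level (level 0 = assumed accurate baseline).
CACHE_VDD_V = (1.0, 0.9, 0.8, 0.7)       # L1 and L2 supply voltages, in volts
DRAM_T_REF_S = (0.1, 1.0, 5.0, 20.0)     # DRAM refresh intervals, in seconds


def decode(levels):
    """Translate (L1, L2, DRAM) level indices into physical knob settings."""
    l1, l2, dram = levels
    return {
        "vdd_l1_v": CACHE_VDD_V[l1],
        "vdd_l2_v": CACHE_VDD_V[l2],
        "t_ref_s": DRAM_T_REF_S[dram],
    }


# Size of the search space the controller has to cover.
joint_configs = len(CACHE_VDD_V) ** 2 * len(DRAM_T_REF_S)   # 4 * 4 * 4 = 64
states = joint_configs * 16                                 # with 16 QoS-delta bins
q_entries = states * 27                                     # assuming 27 relative actions

baseline = decode((0, 0, 0))
aggressive = decode((3, 1, 2))
```

Even with only four levels per knob, the joint space is large enough that exhaustive design-time profiling per workload becomes costly, which is the motivation for learning the policy online.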

6. Empirical Evaluation

DAEM demonstrates robust empirical performance across diverse workloads and operating conditions (Maity et al., 2020):

| Workload | Energy Saving (%) | QoS Violation Reduction (%) | Power Overhead (%) | Typical Adaptation Time |
| --- | --- | --- | --- | --- |
| Canny edge detection | up to 37 | 75 | <5 | tens of frames |
| K-means clustering | ~24 | n/a | n/a | n/a |
| Black–Scholes | ~29 | n/a | n/a | n/a |

  • On Canny edge detection, DAEM yields up to 37% memory subsystem energy savings (unconstrained). With enforced runtime QoS constraints, adaptation occurs in tens of frames, reducing violations by 75% at <5% additional energy cost.
  • In the k-means and Black–Scholes workloads, energy savings range from ~24% to ~29% under dynamic QoS.
  • Overhead from DAEM invocation (every five frames) adds approximately 4% to compute time, with consistent ~16% power savings.

7. Characteristics and Significance

DAEM, instantiated via AXES, exemplifies a model-free, runtime-managed memory approximation system with the following properties:

  • Eliminates the need for design-time characterization or profiling: adapts automatically to workloads and new memory technologies.
  • Jointly coordinates interdependent memory approximation knobs—explicitly accounting for cross-layer error propagation.
  • Realizes substantial energy savings with minimal loss in application output quality and low runtime overhead.
  • Provides a generalizable runtime learning architecture for approximate computing in heterogeneous, multi-level memory systems.

These results position DAEM as a reference architecture for dynamic, feedback-driven memory approximation frameworks, supporting fine-grained adaptability and QoS-aware optimization within complex memory hierarchies (Maity et al., 2020).
