
Adaptive Focus Memory (AFM)

Updated 23 November 2025
  • Adaptive Focus Memory (AFM) is a dynamic memory management mechanism that adjusts fidelity levels based on task-specific relevance and quantitative importance.
  • AFM assigns memory content to FULL, COMPRESSED, or PLACEHOLDER tiers using semantic similarity, recency weighting, and importance classification to optimize resource use.
  • Its practical applications include reducing token usage in large language models and enhancing EEG-driven cognitive load management in VR, improving efficiency and safety.

Adaptive Focus Memory (AFM) refers to a class of memory management mechanisms that dynamically modulate information fidelity and retrieval based on quantitative relevance, task-derived importance, or user-specific cognitive attributes. Originally formulated for LLMs operating over multi-turn conversational history, and later extended to the physiological domain (EEG-driven cognitive load in VR), AFM systems optimize the allocation of limited memory or attention resources to maximize task-relevant performance and minimize computation or cognitive interference (Cruz, 16 Nov 2025, Li et al., 3 Jun 2025).

1. Operational Principles and Fidelity Tiers

AFM introduces a graded memory representation, where each unit of information (message in NLP, spatial feature in VR) is assigned one of multiple fidelity levels reflecting its predicted utility for the current task:

  • FULL: Complete, verbatim retention. In LLMs this means the original text is passed unmodified; in VR, relevant spatial parameters retain their detailed settings.
  • COMPRESSED: The information is summarized or otherwise reduced to a more economical representation. In LLMs, this is a summary (either LLM-generated or heuristic); in VR, spatial or mnemonic content is simplified.
  • PLACEHOLDER: A minimal stub (fixed-length placeholder or omitted object) that preserves only the chronological or structural footprint under a strict resource constraint.

Assignment to tiers is determined dynamically by a continuous scoring function s_i, reflecting the predicted importance and relevance of each memory unit with respect to the current task or query. Thresholds \tau_{high} and \tau_{mid} delineate the correspondence to the FULL, COMPRESSED, and PLACEHOLDER tiers (Cruz, 16 Nov 2025).
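A minimal sketch of this tier mapping, assuming the example threshold values 0.45 and 0.25 given in the text; the class and function names are illustrative, not taken from the reference implementation:

```python
from enum import Enum

class Fidelity(Enum):
    FULL = "full"
    COMPRESSED = "compressed"
    PLACEHOLDER = "placeholder"

def assign_tier(s_i: float, tau_high: float = 0.45,
                tau_mid: float = 0.25) -> Fidelity:
    """Map a continuous importance score s_i to a fidelity tier.

    Threshold defaults follow the example values in the text;
    in practice both are tunable parameters.
    """
    if s_i >= tau_high:
        return Fidelity.FULL
    if s_i >= tau_mid:
        return Fidelity.COMPRESSED
    return Fidelity.PLACEHOLDER
```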

2. Mathematical Formulation and Scoring Functions

For language modeling, AFM computes the following for each candidate memory item m_i with respect to the current query q_t:

  • Semantic Similarity: \mathrm{sim}(m_i, q_t) = \frac{E(m_i) \cdot E(q_t)}{\|E(m_i)\|\,\|E(q_t)\|}, where E(\cdot) denotes an embedding in \mathbb{R}^d.
  • Recency Weighting: w_{recency}(m_i) = 0.5^{k/h}, with k = t - i (turns since m_i) and h the half-life parameter.
  • Importance Classification: if an LLM labels m_i as CRITICAL, s_i is set to 1.0 and m_i is forced to FULL; if RELEVANT or TRIVIAL, s_i combines similarity and recency as follows:

s_i = \begin{cases} 1.0, & \text{if } m_i \text{ is CRITICAL} \\ \mathrm{sim}_i \cdot (0.5 + 0.5\,r_i), & \text{if RELEVANT} \\ \mathrm{sim}_i \cdot 0.25\,r_i, & \text{if TRIVIAL} \end{cases}

(Cruz, 16 Nov 2025).

Thresholds \tau_{high} and \tau_{mid} (e.g., 0.45 and 0.25) delineate the fidelity tier boundaries.
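The scoring rule above can be sketched in Python; the half-life default is an assumption, since the text leaves h as a free parameter:

```python
def afm_score(similarity: float, turns_ago: int, label: str,
              half_life: float = 4.0) -> float:
    """Compute s_i from cosine similarity, recency, and importance label.

    similarity: sim(m_i, q_t) from the embedding model
    turns_ago:  k = t - i
    label:      'CRITICAL', 'RELEVANT', or 'TRIVIAL' from the classifier
    half_life:  h in the recency weight 0.5**(k/h) (assumed default)
    """
    r = 0.5 ** (turns_ago / half_life)  # recency weight w_recency(m_i)
    if label == "CRITICAL":
        return 1.0                      # pinned: always kept at FULL fidelity
    if label == "RELEVANT":
        return similarity * (0.5 + 0.5 * r)
    return similarity * 0.25 * r        # TRIVIAL

# A fresh, relevant message outscores an old, trivial one:
afm_score(0.8, 0, "RELEVANT")   # 0.8
afm_score(0.8, 8, "TRIVIAL")    # 0.8 * 0.25 * 0.25 = 0.05
```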

For cognitive load-driven VR, a cubic polynomial regression models the instantaneous cognitive load index L(t) from normalized Beta-band power B(t):

L(t) = a_0 + a_1 B(t) + a_2 B(t)^2 + a_3 B(t)^3,

with coefficients estimated by minimizing the squared error on calibration data using L-BFGS. The derived L(t) then parametrically determines the adjustment of VR spatial variables, thus manifesting graded adaptation (Li et al., 3 Jun 2025).
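A sketch of the calibration fit on synthetic data. The paper describes an L-BFGS fit; because the model is linear in its coefficients, ordinary least squares minimizes the same squared error in closed form, which is what this sketch uses:

```python
import numpy as np

def fit_load_model(B: np.ndarray, L: np.ndarray) -> np.ndarray:
    """Estimate (a0, a1, a2, a3) in L = a0 + a1*B + a2*B^2 + a3*B^3
    by minimizing the squared error on calibration samples."""
    X = np.vander(B, N=4, increasing=True)  # columns: 1, B, B^2, B^3
    coeffs, *_ = np.linalg.lstsq(X, L, rcond=None)
    return coeffs

# Synthetic calibration data (illustrative values, not from the study)
rng = np.random.default_rng(0)
B = rng.uniform(0.0, 1.0, size=200)
L = 0.2 + 1.5 * B - 0.8 * B**2 + 0.3 * B**3
a = fit_load_model(B, L)  # recovers approximately [0.2, 1.5, -0.8, 0.3]
```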

3. Memory Packing Algorithms and Context Management

In LLM applications, AFM applies a greedy memory packing algorithm to maximize information fidelity under a strict token budget B:

  1. Score calculation for all history turns, yielding s_i for each m_i.
  2. Tier assignment based on s_i and the predefined thresholds.
  3. Chronological packing: for each m_i:
    • Attempt to fit the FULL representation, else COMPRESSED, else PLACEHOLDER.
    • Include a representation only while the running sum of token lengths |\mathrm{rep}_i|_{tokens} remains within B.
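The packing loop above can be sketched as follows; the fixed placeholder cost and the whitespace tokenizer standing in for tiktoken are assumptions for illustration:

```python
PLACEHOLDER_TOKENS = 3  # assumed fixed stub cost

def n_tokens(text: str) -> int:
    # Whitespace split stands in for a real tokenizer such as tiktoken.
    return len(text.split())

def pack_memory(items, budget):
    """Greedy chronological packing under a strict token budget.

    items: chronological list of dicts with 'id', an assigned 'tier'
           ('FULL' | 'COMPRESSED' | 'PLACEHOLDER'), and 'full' /
           'compressed' text representations.
    Each item is included at its assigned fidelity if it fits,
    otherwise degraded to the next cheaper representation.
    """
    packed, used = [], 0
    for item in items:
        candidates = []
        if item["tier"] == "FULL":
            candidates.append(("full", n_tokens(item["full"])))
        if item["tier"] in ("FULL", "COMPRESSED"):
            candidates.append(("compressed", n_tokens(item["compressed"])))
        candidates.append(("placeholder", PLACEHOLDER_TOKENS))
        for rep, cost in candidates:
            if used + cost <= budget:
                packed.append((item["id"], rep))
                used += cost
                break
    return packed, used
```

With a budget of 7 tokens and two 4-token FULL-tier messages, the second message degrades to its compressed form rather than being dropped.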

Compression is handled by either a local heuristic (extractive, based on lexical/user-query overlap) or an LLM-based abstractive model. If the OpenAI API is enabled, the algorithm leverages embedding models (text-embedding-3-small), classification (gpt-4o-mini), and token counting (tiktoken). Offline, it falls back to hashing-based embeddings and heuristics.

The packing algorithm ensures that critical facts (e.g., safety-related allergies in user dialogues) are systematically prioritized for FULL inclusion, preserving essential context at minimal computational cost (Cruz, 16 Nov 2025).

4. Applications and Empirical Evaluation

LLMs

AFM was evaluated on a safety benchmark involving LLMs and conversations concerning a user with a severe peanut allergy. In both short (3-turn) and medium (9-turn) scenarios, AFM:

  • Retained critical facts (e.g., allergy) at 100% fidelity, matching naïve replay on safety outcomes.
  • Reduced average prompt tokens by ≈66% versus a stateless baseline and ≈80% versus naïve replay.
  • Maintained low latency and cost, achieving compute saving ratio (CSR) ≈ 0.66 (Cruz, 16 Nov 2025).
| Method | Allergy Recall (Short / Med.) | Avg. Tokens | Safe? |
|---|---|---|---|
| Default (stateless) | N / N | 1493 | No |
| Naïve replay | Y / Y | 2479 | Yes |
| Recency compression | Y / N | 1888 | Potential |
| AFM | Y / Y | 504 | Yes |

Cognitive Load-Driven VR

In the context of memory palace VR, AFM underpins the CogLocus system, which individually calibrates cognitive load via real-time EEG. Adaptive environmental modulation based on the user's cognitive index L(t) led to:

  • ≥60% increase in Beta-band power in 8/10 participants under adaptive conditions (Cohen’s d=1.0).
  • 32% average improvement in immediate recall accuracy (paired t-test, p = 0.03).
  • Task-specific spatial adaptations showing selective gains for certain memory strategies and scene configurations (Li et al., 3 Jun 2025).
| Memory Method | Unit Time | Beta Fluctuations | Preferred Space |
|---|---|---|---|
| Location-based | 30–60 s | Low | Complex rooms |
| Associative-only | 15–20 s | High | Simple rooms |
| Loci + Associative | 15–60 s | High | Simple w/ landmarks |

5. System Architectures and Implementation

In LLM/AI applications, AFM is encapsulated in a modular class (FocusManager) supporting both API-based and offline deployments. Embedding models, compressors, and classifiers are abstracted for plug-and-play extensibility.

The VR instantiation, CogLocus, implements a four-layer closed-loop architecture: (1) EEG acquisition via a Muse 2/Oculus HMD, (2) signal preprocessing with z-score normalization and artifact rejection, (3) real-time mapping of cognitive load to spatial/parametric environmental variables via C#/Grasshopper, and (4) VR scene rendering at ≈1 Hz update rates (Li et al., 3 Jun 2025).

6. Limitations and Future Directions

AFM systems confront limitations concerning generalizability and model expressiveness:

  • Small-N, short-duration pilot studies constrain broad inference; greater participant diversity and longitudinal assessment are needed in VR settings (Li et al., 3 Jun 2025).
  • Current scoring and adaptation functions—cubic in VR, weighted linear in NLP—may miss subtler cognitive or semantic patterns. Possible extensions include Gaussian process regression or neural attention-based scoring for more individualized adaptation.
  • Noise and artifact sensitivity, particularly in physiological data streams, motivate interest in multimodal fusion (e.g., combining EEG with eye-tracking or GSR) and more robust artifact correction.
  • Embedding interactive AI-guided strategies within AFM-driven systems may enable dynamic coaching and further gains, especially for learning and memory applications (Li et al., 3 Jun 2025).

A plausible implication is that widespread adoption of AFM in both LLMs and physiology-driven systems could result in more computationally efficient, robust, and safe AI systems, particularly in contexts where memory bottlenecks, real-time feedback, and individual adaptation are paramount (Cruz, 16 Nov 2025, Li et al., 3 Jun 2025).
