
Adaptive Focus Memory (AFM)

Updated 23 November 2025
  • Adaptive Focus Memory (AFM) is a dynamic memory management mechanism that adjusts fidelity levels based on task-specific relevance and quantitative importance.
  • AFM assigns memory content to FULL, COMPRESSED, or PLACEHOLDER tiers using semantic similarity, recency weighting, and importance classification to optimize resource use.
  • Its practical applications include reducing token usage in large language models and enhancing EEG-driven cognitive load management in VR, improving efficiency and safety.

Adaptive Focus Memory (AFM) refers to a class of memory management mechanisms that dynamically modulate information fidelity and retrieval based on quantitative relevance, task-derived importance, or user-specific cognitive attributes. Originally formulated for LLMs operating over multi-turn conversational history, and later extended to the physiological domain (EEG-driven cognitive load in VR), AFM systems optimize the allocation of limited memory or attention resources to maximize task-relevant performance and minimize computation or cognitive interference (Cruz, 16 Nov 2025, Li et al., 3 Jun 2025).

1. Operational Principles and Fidelity Tiers

AFM introduces a graded memory representation, where each unit of information (message in NLP, spatial feature in VR) is assigned one of multiple fidelity levels reflecting its predicted utility for the current task:

  • FULL: Complete, verbatim retention. In LLMs this means the original text is passed unmodified; in VR, relevant spatial parameters retain their detailed settings.
  • COMPRESSED: The information is summarized or otherwise reduced to a more economical representation. In LLMs, this is a summary (either LLM-generated or heuristic); in VR, spatial or mnemonic content is simplified.
  • PLACEHOLDER: A minimal stub (fixed-length placeholder or omitted object) that preserves only the chronological or structural footprint under a strict resource constraint.

Assignment to tiers is determined dynamically by a continuous scoring function s_i, reflecting the predicted importance and relevance of each memory unit with respect to the current task or query. Thresholds \tau_{high} and \tau_{mid} delineate the correspondence to the FULL, COMPRESSED, and PLACEHOLDER tiers (Cruz, 16 Nov 2025).
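A minimal sketch of this tier mapping, assuming the example threshold values 0.45 and 0.25 given in the text; the class and function names are illustrative, not taken from the reference implementation:

```python
from enum import Enum

class Fidelity(Enum):
    FULL = "full"
    COMPRESSED = "compressed"
    PLACEHOLDER = "placeholder"

def assign_tier(s_i: float, tau_high: float = 0.45,
                tau_mid: float = 0.25) -> Fidelity:
    """Map a continuous importance score s_i to a fidelity tier.

    Threshold defaults follow the example values in the text;
    in practice both are tunable parameters.
    """
    if s_i >= tau_high:
        return Fidelity.FULL
    if s_i >= tau_mid:
        return Fidelity.COMPRESSED
    return Fidelity.PLACEHOLDER
```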

2. Mathematical Formulation and Scoring Functions

For language modeling, AFM computes the following for each candidate memory item m_i with respect to the current query q_t:

  • Semantic Similarity: \mathrm{sim}(m_i, q_t) = \frac{E(m_i) \cdot E(q_t)}{\|E(m_i)\|\,\|E(q_t)\|}, where E(\cdot) denotes an embedding in \mathbb{R}^d.
  • Recency Weighting: w_{recency}(m_i) = 0.5^{k/h}, with k = t - i (turns since m_i) and h the half-life parameter.
  • Importance Classification: if an LLM labels m_i as CRITICAL, s_i is set to 1.0 and m_i is forced to FULL; if RELEVANT or TRIVIAL, s_i combines similarity and recency as follows:

s_i = \begin{cases} 1.0, & \text{if } m_i \text{ is CRITICAL} \\ \mathrm{sim}_i \cdot (0.5 + 0.5\,r_i), & \text{if RELEVANT} \\ \mathrm{sim}_i \cdot 0.25\,r_i, & \text{if TRIVIAL} \end{cases}

(Cruz, 16 Nov 2025).

Thresholds \tau_{high} and \tau_{mid} (e.g., 0.45 and 0.25) delineate the fidelity tier boundaries.
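The scoring rule above can be sketched in Python; the half-life default is an assumption, since the text leaves h as a free parameter:

```python
def afm_score(similarity: float, turns_ago: int, label: str,
              half_life: float = 4.0) -> float:
    """Compute s_i from cosine similarity, recency, and importance label.

    similarity: sim(m_i, q_t) from the embedding model
    turns_ago:  k = t - i
    label:      'CRITICAL', 'RELEVANT', or 'TRIVIAL' from the classifier
    half_life:  h in the recency weight 0.5**(k/h) (assumed default)
    """
    r = 0.5 ** (turns_ago / half_life)  # recency weight w_recency(m_i)
    if label == "CRITICAL":
        return 1.0                      # pinned: always kept at FULL fidelity
    if label == "RELEVANT":
        return similarity * (0.5 + 0.5 * r)
    return similarity * 0.25 * r        # TRIVIAL

# A fresh, relevant message outscores an old, trivial one:
afm_score(0.8, 0, "RELEVANT")   # 0.8
afm_score(0.8, 8, "TRIVIAL")    # 0.8 * 0.25 * 0.25 = 0.05
```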

For cognitive load-driven VR, a cubic polynomial regression models the instantaneous cognitive load index L(t) from normalized Beta-band power B(t):

L(t) = a_0 + a_1 B(t) + a_2 B(t)^2 + a_3 B(t)^3,

with coefficients estimated by minimizing the squared error on calibration data using L-BFGS. The derived L(t) then parametrically determines the adjustment of VR spatial variables, thus manifesting graded adaptation (Li et al., 3 Jun 2025).
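A sketch of the calibration fit on synthetic data. The paper describes an L-BFGS fit; because the model is linear in its coefficients, ordinary least squares minimizes the same squared error in closed form, which is what this sketch uses:

```python
import numpy as np

def fit_load_model(B: np.ndarray, L: np.ndarray) -> np.ndarray:
    """Estimate (a0, a1, a2, a3) in L = a0 + a1*B + a2*B^2 + a3*B^3
    by minimizing the squared error on calibration samples."""
    X = np.vander(B, N=4, increasing=True)  # columns: 1, B, B^2, B^3
    coeffs, *_ = np.linalg.lstsq(X, L, rcond=None)
    return coeffs

# Synthetic calibration data (illustrative values, not from the study)
rng = np.random.default_rng(0)
B = rng.uniform(0.0, 1.0, size=200)
L = 0.2 + 1.5 * B - 0.8 * B**2 + 0.3 * B**3
a = fit_load_model(B, L)  # recovers approximately [0.2, 1.5, -0.8, 0.3]
```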

3. Memory Packing Algorithms and Context Management

In LLM applications, AFM applies a greedy memory packing algorithm to maximize information fidelity under a strict token budget B:

  1. Score calculation for all history turns, yielding s_i for each m_i.
  2. Tier assignment based on s_i and the predefined thresholds.
  3. Chronological packing: for each m_i:
    • Attempt to fit the FULL representation, else COMPRESSED, else PLACEHOLDER.
    • Include a representation only while the running sum of token lengths |\mathrm{rep}_i|_{tokens} remains within B.
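The packing loop above can be sketched as follows; the fixed placeholder cost and the whitespace tokenizer standing in for tiktoken are assumptions for illustration:

```python
PLACEHOLDER_TOKENS = 3  # assumed fixed stub cost

def n_tokens(text: str) -> int:
    # Whitespace split stands in for a real tokenizer such as tiktoken.
    return len(text.split())

def pack_memory(items, budget):
    """Greedy chronological packing under a strict token budget.

    items: chronological list of dicts with 'id', an assigned 'tier'
           ('FULL' | 'COMPRESSED' | 'PLACEHOLDER'), and 'full' /
           'compressed' text representations.
    Each item is included at its assigned fidelity if it fits,
    otherwise degraded to the next cheaper representation.
    """
    packed, used = [], 0
    for item in items:
        candidates = []
        if item["tier"] == "FULL":
            candidates.append(("full", n_tokens(item["full"])))
        if item["tier"] in ("FULL", "COMPRESSED"):
            candidates.append(("compressed", n_tokens(item["compressed"])))
        candidates.append(("placeholder", PLACEHOLDER_TOKENS))
        for rep, cost in candidates:
            if used + cost <= budget:
                packed.append((item["id"], rep))
                used += cost
                break
    return packed, used
```

With a budget of 7 tokens and two 4-token FULL-tier messages, the second message degrades to its compressed form rather than being dropped.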

Compression is handled by either a local heuristic (extractive, based on lexical/user-query overlap) or an LLM-based abstractive model. If the OpenAI API is enabled, the algorithm leverages embedding models (text-embedding-3-small), classification (gpt-4o-mini), and token counting (tiktoken). Offline, it falls back to hashing-based embeddings and heuristics.

The packing algorithm ensures that critical facts (e.g., safety-related allergies in user dialogues) are systematically prioritized for FULL inclusion, preserving essential context at minimal computational cost (Cruz, 16 Nov 2025).

4. Applications and Empirical Evaluation

LLMs

AFM was evaluated on a safety benchmark involving LLMs and conversations concerning a user with a severe peanut allergy. In both short (3-turn) and medium (9-turn) scenarios, AFM:

  • Retained critical facts (e.g., allergy) at 100% fidelity, matching naïve replay on safety outcomes.
  • Reduced average prompt tokens by ≈66% versus a stateless baseline and ≈80% versus naïve replay.
  • Maintained low latency and cost, achieving compute saving ratio (CSR) ≈ 0.66 (Cruz, 16 Nov 2025).
| Method | Allergy Recall (Short / Med.) | Avg. Tokens | Safe? |
|---|---|---|---|
| Default (stateless) | N / N | 1493 | No |
| Naïve replay | Y / Y | 2479 | Yes |
| Recency compression | Y / N | 1888 | Potential |
| AFM | Y / Y | 504 | Yes |

Cognitive Load-Driven VR

In the context of memory palace VR, AFM underpins the CogLocus system, which individually calibrates cognitive load via real-time EEG. Adaptive environmental modulation based on the user's cognitive index L(t) led to:

  • ≥60% increase in Beta-band power in 8/10 participants under adaptive conditions (Cohen’s d=1.0).
  • 32% average improvement in immediate recall accuracy (paired t-test, p = 0.03).
  • Task-specific spatial adaptations showing selective gains for certain memory strategies and scene configurations (Li et al., 3 Jun 2025).
| Memory Method | Unit Time | Beta Fluctuations | Preferred Space |
|---|---|---|---|
| Location-based | 30–60 s | Low | Complex rooms |
| Associative-only | 15–20 s | High | Simple rooms |
| Loci + Associative | 15–60 s | High | Simple w/ landmarks |

5. System Architectures and Implementation

In LLM/AI applications, AFM is encapsulated in a modular class (FocusManager) supporting both API-based and offline deployments. Embedding models, compressors, and classifiers are abstracted for plug-and-play extensibility.

The VR instantiation, CogLocus, implements a four-layer closed-loop architecture: (1) EEG acquisition via a Muse 2/Oculus HMD, (2) signal preprocessing with z-score normalization and artifact rejection, (3) real-time mapping of cognitive load to spatial/parametric environmental variables via C#/Grasshopper, and (4) VR scene rendering at ≈1 Hz update rates (Li et al., 3 Jun 2025).

6. Limitations and Future Directions

AFM systems confront limitations concerning generalizability and model expressiveness:

  • Small-N, short-duration pilot studies constrain broad inference; greater participant diversity and longitudinal assessment are needed in VR settings (Li et al., 3 Jun 2025).
  • Current scoring and adaptation functions—cubic in VR, weighted linear in NLP—may miss subtler cognitive or semantic patterns. Possible extensions include Gaussian process regression or neural attention-based scoring for more individualized adaptation.
  • Noise and artifact sensitivity, particularly in physiological data streams, motivate interest in multimodal fusion (e.g., combining EEG with eye-tracking or GSR) and more robust artifact correction.
  • Embedding interactive AI-guided strategies within AFM-driven systems may enable dynamic coaching and further gains, especially for learning and memory applications (Li et al., 3 Jun 2025).

A plausible implication is that widespread adoption of AFM in both LLMs and physiology-driven systems could result in more computationally efficient, robust, and safe AI systems, particularly in contexts where memory bottlenecks, real-time feedback, and individual adaptation are paramount (Cruz, 16 Nov 2025, Li et al., 3 Jun 2025).
