REMem: Multi-Domain Memory and Representation

Updated 4 July 2026
  • REMem is a polysemous research label that denotes varied memory systems, including dynamic, recurrent, and procedural mechanisms.
  • It spans diverse domains such as retrieval-augmented generation, vision-language-action, long-running agents, and radio environment mapping.
  • Papers in the memory-centric strand report gains from dynamic retention and context-aware retrieval in RAG, long-running agent, multimodal, and robotic settings.

REMem is not a single standardized technical object in the arXiv literature. The label, together with closely related spellings such as ReMem, ReMe, REM, and “Remember Me”, has been used for several distinct systems spanning retrieval-augmented generation, long-running LLM agents, large vision-language models (LVLMs), Vision-Language-Action control, procedural memory for tool-using agents, video segmentation, medical image registration, and radio-environment mapping (Bursa, 4 Jan 2026, Gao et al., 13 Nov 2025, Li et al., 13 Mar 2026, Cao et al., 11 Dec 2025, Bagchi et al., 2024, Sun et al., 2021, Wei et al., 2013). In most of the recent agent and multimodal work, the name denotes mechanisms for selective retention, contextual reinstatement, or recurrent consolidation; in other fields, the acronym is simply reused with unrelated meaning.

Usage | Domain | Core idea
ARM / “REMem” (Bursa, 4 Jan 2026) | Retrieval-Augmented Generation | Dynamic memory substrate with selective remembrance and decay
“Remember Me” / T-DRS (Gao et al., 13 Nov 2025) | RoPE-based LVLMs | Inference-only compensation for long-range attention decay
ReMem-VLA (Li et al., 13 Mar 2026) | Vision-Language-Action | Dual-level recurrent memory queries
ReMe (Cao et al., 11 Dec 2025) | Tool-using LLM agents | Dynamic procedural memory with distillation, reuse, and refinement
RecMem / RaMem / Evo-Memory ReMem (Dai et al., 15 May 2026, Yang et al., 22 Jun 2026, Wei et al., 25 Nov 2025) | Long-running agents | Recurrence-triggered consolidation, contextual reinstatement, or action-think-memory refinement
ReMem benchmark (Kwon et al., 5 May 2026) | LVLM memorization and unlearning | Reliable multi-hop and multi-image memorization benchmark
REM (Bagchi et al., 2024, Sun et al., 2021, Wei et al., 2013) | Vision and wireless systems | Diffusion-based video segmentation, resolution enhancement, or radio environment maps

1. Terminological scope and disambiguation

In current usage, REMem is best understood as a polysemous research label rather than a canonical acronym with a fixed expansion. In some papers it denotes explicit memory systems—dynamic RAG memory, recurrent agent memory, contextual reinstatement, or procedural experience pools—whereas in others it appears as REM for unrelated constructs such as a resolution enhancement module or a radio environment map (Bursa, 4 Jan 2026, Sun et al., 2021, Wei et al., 2013). A common misconception is therefore to treat every “REMem” paper as belonging to a single lineage. The literature does not support that interpretation.

The strongest cluster of usages is memory-centric. These systems typically externalize state, update it over time, and make retention or retrieval conditional on usage, recurrence, context, or utility. By contrast, the video-segmentation, medical-imaging, and wireless-networking papers use REM as a compact acronym for domain-specific modules or maps, not as a unified memory formalism (Bagchi et al., 2024, Sun et al., 2021, Turkmen et al., 2020).

2. Dynamic retrieval and long-running agent memory

In retrieval-augmented generation, Adaptive RAG Memory (ARM) is explicitly described as a “REMem” system. ARM replaces a static vector index with a dynamic memory substrate storing, for each item $i$, an embedding $E_i$, an access count $c_i$, a last-access time $\tau_i$, and a remembrance flag $\mathrm{remembered}_i$. Retrieved items are consolidated once $c_i \ge \theta$, while stale, unremembered items decay according to $E_j \gets \alpha \cdot E_j$ after a grace period $t-\tau_j>\gamma$. The paper’s balanced configuration is $\theta=3,\ \gamma=5,\ \alpha=0.95$. On a lightweight retrieval benchmark, ARM reports $\mathrm{NDCG@5}=0.9401$, […], […], with a 22M-parameter embedding layer, and the end-to-end study reports that Llama 3.1 with static RAG achieves 67.2% key-term coverage while GPT-4o with dynamic selective RAG reaches the fastest responses at 8.2 s with 58.7% coverage (Bursa, 4 Jan 2026).
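The consolidation and decay rules described above can be illustrated with a minimal Python sketch. It is not the ARM implementation: the class and method names (`MemoryItem`, `DynamicMemory`, `step`) are hypothetical, the clock is assumed to tick once per query, and decay is assumed to be applied on every tick once an item is past its grace period. Only the thresholds $\theta=3$, $\gamma=5$, $\alpha=0.95$ and the two update rules come from the text.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    embedding: list             # E_i
    access_count: int = 0       # c_i
    last_access: int = 0        # tau_i
    remembered: bool = False    # remembered_i


class DynamicMemory:
    """Toy dynamic memory with consolidation and decay (illustrative only)."""

    def __init__(self, theta=3, gamma=5, alpha=0.95):
        self.theta, self.gamma, self.alpha = theta, gamma, alpha
        self.items = {}
        self.t = 0  # logical clock; assumed to advance once per query

    def add(self, key, embedding):
        self.items[key] = MemoryItem(list(embedding), last_access=self.t)

    def step(self, retrieved_keys):
        """Advance the clock, record accesses, consolidate, then decay."""
        self.t += 1
        for key in retrieved_keys:
            item = self.items[key]
            item.access_count += 1
            item.last_access = self.t
            if item.access_count >= self.theta:        # c_i >= theta
                item.remembered = True
        for item in self.items.values():
            stale = self.t - item.last_access > self.gamma   # t - tau_j > gamma
            if stale and not item.remembered:
                # E_j <- alpha * E_j, applied to every vector component
                item.embedding = [self.alpha * x for x in item.embedding]
```

In this sketch an item retrieved on three separate queries becomes permanent, while an item that is never retrieved starts shrinking toward the origin on the sixth query after it was added, which lowers its dot-product score against any query.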

For long-running LLM agents, RecMem recasts consolidation as a recurrence-triggered operation. Each interaction unit is embedded and stored in a subconscious layer. Recurrence is detected by retrieving semantically similar past items and checking a recurrence condition on them ([…]). Only then are episodic and semantic memories extracted, followed by semantic refinement to recover omitted details. Experiments report memory-construction token reductions of up to 87% relative to Mem0, A-Mem, and MemoryOS while exceeding their accuracy (Dai et al., 15 May 2026).
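A recurrence trigger of this kind might look as follows. This is a sketch under stated assumptions, not RecMem's actual rule: the similarity threshold, the minimum number of similar past items, and the names `RecurrenceBuffer` and `observe` are all hypothetical, and the consolidation step (which in a real system would call an LLM extractor) is reduced to grouping the recurring texts.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class RecurrenceBuffer:
    def __init__(self, sim_threshold=0.8, min_recurrences=2):
        # Both values are assumptions chosen for illustration.
        self.sim_threshold = sim_threshold
        self.min_recurrences = min_recurrences
        self.buffer = []         # raw (text, embedding) pairs, cheap to store
        self.consolidated = []   # groups that passed the recurrence check

    def observe(self, text, embedding):
        """Store the unit; return True if it triggered consolidation."""
        similar = [t for t, e in self.buffer
                   if cosine(e, embedding) >= self.sim_threshold]
        self.buffer.append((text, embedding))
        if len(similar) >= self.min_recurrences:
            # Placeholder for the expensive extraction step.
            self.consolidated.append(similar + [text])
            return True
        return False
```

The design point the sketch tries to capture is that the expensive step runs only when a topic recurs, so one-off interactions cost an embedding and a buffer write but no extraction call.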

RaMem addresses a different failure mode, termed context collapse: retrieved memory fragments may be topically relevant yet invalid as evidence for the current query. Its four stages are evidence anchoring, recall condition induction, validity-aware retrieval, and context-preserved synthesis. Each memory is stored together with a context record carrying event time, mention time, session span, participants, location, entities, and topic. At query time, a recall frame is induced and used to prioritize context-compatible memories. On long-term memory benchmarks, the paper reports average F1 gains of more than 10% across several backbones (Yang et al., 22 Jun 2026).
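The idea of prioritizing context-compatible memories can be sketched as a two-key ranking. The scoring below is an assumption made for illustration (exact-match compatibility over the fields named in the recall frame, with topical relevance as a tie-breaker); RaMem's actual representation and scoring are not reproduced here, and `Memory`, `compatibility`, and `validity_aware_retrieve` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    content: str
    context: dict = field(default_factory=dict)  # e.g. participants, location, topic


def compatibility(frame, context):
    """Fraction of recall-frame fields that the memory's context satisfies."""
    if not frame:
        return 1.0
    hits = sum(1 for key, wanted in frame.items() if context.get(key) == wanted)
    return hits / len(frame)


def validity_aware_retrieve(memories, relevance, frame, k=2):
    """Rank by context compatibility first, topical relevance second."""
    scored = [(compatibility(frame, m.context), relevance(m), m) for m in memories]
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [m for _, _, m in scored[:k]]
```

With two equally relevant memories about a relocation, one involving Alice and one involving Bob, a recall frame of `{"participants": "Bob"}` ranks Bob's memory first even though both match the topic.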

Within the Evo-Memory benchmark, ReMem is an explicit action–think–memory refine loop. At each internal step the agent chooses […], allowing memory reasoning to become part of the action space rather than a passive RAG component. Under the benchmark’s streaming setup, ReMem reaches an average of 0.65 on the single-turn benchmarks with Gemini 2.5 Flash and 0.50/0.64 average success/progress across the multi-turn environments; with Claude 3.7 Sonnet it reaches 0.58 on the single-turn average and 0.78/0.91 average success/progress on the multi-turn environments, consistently outperforming history-only baselines and improving step efficiency (Wei et al., 25 Nov 2025).

3. Procedural memory and experience-driven agent evolution

In tool-using LLM agents, ReMe (“Remember Me, Refine Me”) is a procedural memory framework built around three mechanisms: multi-faceted distillation, context-adaptive reuse, and utility-based refinement. Each memory entry has five fields: a usage scenario, experience content, keywords, a confidence score, and a record of the tools used. Distillation extracts success patterns, failure analysis, and comparative insights from trajectories; reuse retrieves memories by embeddings of the usage scenario; refinement performs selective addition and deletion based on observed utility (Cao et al., 11 Dec 2025).

The deletion rule is explicitly utility-based, defined in terms of an entry's retrieval count and its number of successful uses ([…]). In the dynamic configuration on BFCL-V3 and AppWorld, Qwen3-8B improves from Avg@4/Pass@4 […] without memory to […] with dynamic ReMe. On BFCL-V3 alone, the same model improves from […] to […], and on AppWorld from […] to […]. The paper further reports a memory-scaling effect: Qwen3-8B with dynamic ReMe slightly exceeds memoryless Qwen3-14B on Pass@4, and Qwen3-14B with dynamic ReMe exceeds memoryless Qwen3-32B on both Avg@4 and Pass@4 (Cao et al., 11 Dec 2025).
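A utility-based deletion rule over retrieval and success counts could be sketched as below. The specific form is an assumption: an entry is dropped once it has been retrieved at least `min_retrievals` times and its success ratio is at or below `max_utility`. Both thresholds, and the names `Experience`, `should_delete`, and `prune`, are hypothetical and are not taken from the ReMe paper.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    scenario: str
    content: str
    retrievals: int = 0   # how often the entry was retrieved
    successes: int = 0    # how often a retrieval preceded a successful task


def should_delete(e, min_retrievals=5, max_utility=0.5):
    """Assumed rule: drop entries tried often enough yet rarely useful."""
    if e.retrievals < min_retrievals:
        return False  # too little evidence to judge the entry
    return e.successes / e.retrievals <= max_utility


def prune(pool, **thresholds):
    """Return the pool without the entries flagged for deletion."""
    return [e for e in pool if not should_delete(e, **thresholds)]
```

The minimum-retrieval guard keeps newly added entries from being deleted before they have had a chance to prove useful.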

This line of work treats memory as an evolving procedural substrate rather than a trajectory log. The refinement stage is central: full addition from all trajectories yields only […] on BFCL-V3 for Qwen3-8B, selective addition reaches […], and selective addition plus reflection plus deletion reaches […], indicating that memory quality control, not only memory quantity, is the operative variable (Cao et al., 11 Dec 2025).

4. Multimodal long-range memory, robotics, and unlearning

In RoPE-based large vision-language models (LVLMs), “Remember Me” denotes T-DRS, a training-free, inference-only modification to attention logits. The method inserts three components between RoPE attention computation and the softmax ([…]): SD-DRS derives a semantic scale from cosine similarity, DC-DRS applies a Gaussian-like distance-aware control term, and reRD-DRS adds a heavy-tailed long-range reinforcement term. On VQA benchmarks, LLaVA-1.5-7B improves from 67.9 to 69.2 on ScienceQA, 62.0 to 63.1 on GQA, and 58.2 to 59.0 on TextVQA; analogous gains are reported for InternVL2-8B and Qwen2.5-VL-7B, all without retraining (Gao et al., 13 Nov 2025).

For embodied control, ReMem-VLA adds memory to Vision-Language-Action models through two sets of recurrent queries: frame-level queries […] for short-term memory and chunk-level queries […] for long-term memory. The frame-level state is updated every step by

[…]

while the chunk-level state is updated only every […] frames. A bidirectional connector allows action queries and hindsight queries to read from these recurrent memory slots, and an auxiliary Past Observation Prediction loss […] strengthens visual memory. On MemoryBench plus a long-horizon task, ReMem-VLA reaches success rates of 93, 99, 100, and 86, averaging 94.5, versus 0.75 for OpenVLA-OFT, 8.25 for […], and 1.5 for MemoryVLA. In four real-world robot tasks it reports 82.5% average success, compared with 11% for […] and 8% for MemoryVLA (Li et al., 13 Mar 2026).

A different multimodal use of the name appears in ReMem, the Reliable Multi-hop and Multi-image Memorization Benchmark for LVLM unlearning. Its premise is that existing LVLM unlearning benchmarks often fail at stage 1: the model never robustly memorizes the target fictitious identities, so unlearning results are unreliable. ReMem therefore scales each identity to 100 images and 100 QA pairs, with an empirically chosen 70% single-hop / 30% multi-hop split. After fine-tuning on ReMem, LLaVA-1.5-7B reaches ROUGE 97.19, GPT-score 95.18, EM 91.50, and held-out […] 81.33; LLaVA-1.5-13B reaches ROUGE 98.92, GPT-score 98.05, EM 96.37, and […] 87.98. The benchmark also introduces Exposure,

[…]

a normalized rank-based measure of how highly the model internally scores the true sensitive attribute among plausible alternatives (Kwon et al., 5 May 2026).
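To make the notion of a normalized rank-based measure concrete, the following sketch maps the rank of the true attribute among candidates to the interval [0, 1]. The linear normalization used here is an assumption chosen for simplicity and should not be read as the benchmark's Exposure formula; the function name and the candidate scores in the usage note are hypothetical.

```python
def normalized_rank_exposure(scores, true_key):
    """Return 1.0 if the true attribute is ranked first, 0.0 if ranked last.

    `scores` maps each candidate attribute to the model's internal score
    (for example a log-likelihood); higher means more preferred.
    The linear rank normalization is an illustrative assumption.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    rank = ranked.index(true_key)   # 0 for the top-scored candidate
    if len(ranked) == 1:
        return 1.0                  # a lone candidate is trivially top-ranked
    return 1.0 - rank / (len(ranked) - 1)
```

For candidates scored `{"a": -1.0, "b": -3.0, "c": -2.0}`, the measure is 1.0 when the true attribute is `"a"`, 0.5 for `"c"`, and 0.0 for `"b"`. After successful unlearning one would expect the value for the sensitive attribute to drop toward the level of an arbitrary alternative.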

5. Vision, representation, and medical uses of REM

Not every REM-labeled system is a memory architecture. In video understanding, REM in “ReferEverything” is a framework for referral video segmentation that repurposes a pre-trained text-to-video diffusion model. It retains the original U-Net denoiser, VAE encoder/decoder, and CLIP text encoder, but changes the objective from denoising to mask-latent prediction. Inference is written as

[…]

REM matches or slightly exceeds state of the art on Ref-DAVIS, reaches 40.4 […] on BURST, 15.2 […] on VSPW stuff categories, and 49.56 […] on the Ref-VPS process benchmark, outperforming VD-IT’s 37.58 by about 12 points (Bagchi et al., 2024).

In knowledge distillation for vision transformers, ReMem denotes a teacher-side modification that couples mutual-information-aware fine-tuning with MLP reweighting. The modified transformer block is

[…]

which downweights top MLP blocks that the paper identifies as major sources of mutual-information loss. Combined with SAM-based fine-tuning, this turns very strong pretrained ViTs into better teachers: averaged over 16 datasets, a ViT-B teacher improves student performance from 74.0 to 78.3 while teacher accuracy changes from 86.7 to 85.7, and similar reversals of the “stronger teacher, worse student” trend are reported for ViT-Ti, ViT-S, and ViT-L (Dong et al., 29 Jun 2025).

In medical imaging, REM denotes a Resolution Enhancement Module: a lightweight 3D CNN super-resolution front-end plugged into deformable registration networks. The selected design is REM-Variant-I with global image-domain residual learning and configuration […]. In the ReFDRN cascade, the main registration loss is

[…]

with an auxiliary Huber-based loss on REM outputs. On LPBA40 at 4× upscaling, ReFDRN improves Dice/NCC from 0.6676/0.9920 for trilinear-upsampled FDRN to 0.6736/0.9962, while ReVoxelMorph improves from 0.6593/0.9916 to 0.6676/0.9932 (Sun et al., 2021).

6. Radio environment maps and monitoring

In wireless systems, REM has a much older and unrelated meaning: Radio Environment Map. A foundational formulation partitions a region into […] meshes and assigns each location a bit-packed radio parameter

[…]

where each bit indicates whether the corresponding network is detected. The resulting radio parameter error […] decreases with mesh density, and the paper derives the scaling law

[…]

together with a linked notion of geographic entropy and deployment analyses for one-mesh-one-sensor and random sensor placement (Wei et al., 2013).
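The bit-packed radio parameter can be illustrated by packing per-network detection flags into a single integer. The bit order (bit $i$ for network $i$, least significant first) and the function names are assumptions made for this sketch; the paper's exact encoding is not reproduced here.

```python
def pack_detections(detected):
    """Pack per-network detection flags into one integer (bit i = network i)."""
    value = 0
    for i, flag in enumerate(detected):
        if flag:
            value |= 1 << i
    return value


def unpack_detections(value, n_networks):
    """Recover the list of detection flags from a packed parameter."""
    return [bool((value >> i) & 1) for i in range(n_networks)]
```

With three networks, a location where networks 0 and 2 are detected is encoded as `pack_detections([True, False, True])`, which is 5 (binary 101). A map is then one such integer per mesh, and two maps can be compared mesh by mesh.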

Later work generalizes this idea into G-REM, or generalized radio environment monitoring, which broadens classical REM from spectrum occupancy and interference maps to a multi-dimensional framework including CSI, localization, mobility, network state, device state, and external context. G-REM explicitly integrates sensing modes, sensing methods, mapping methods, external information sources, and applications such as beam management, CoMP, mobility-aware handover, physical-layer security, and RIS deployment (Turkmen et al., 2020).

A concrete autonomous instantiation is the UAV-supported generation of fine-grained 3D indoor REMs. In that system, Crazyflie 2.1 UAVs carrying Wi-Fi scanning receivers visit 72 waypoints inside a […] volume, collect 2696 samples from 73 MAC addresses, and train an ML regressor to predict RSS at unsampled points. The best reported predictor is a kNN regressor with MAC one-hot features scaled by a factor of 3 and […], achieving RMSE […] (Mendes et al., 2021).
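The role of the scaled one-hot MAC features can be seen in a small pure-Python kNN sketch. Everything here apart from the scale factor of 3 is an assumption: the feature layout (coordinates followed by the one-hot block), Euclidean distance, unweighted averaging, the value of `k`, and the function names are illustrative, and the usage values below are made-up toy data, not measurements from the paper.

```python
import math


def featurize(x, y, z, mac, macs, scale=3.0):
    """Coordinates plus a one-hot MAC block whose active entry equals `scale`."""
    one_hot = [scale if mac == m else 0.0 for m in macs]
    return [x, y, z] + one_hot


def knn_predict(train, query, k=3):
    """Mean RSS of the k training samples nearest to `query`.

    `train` is a list of (feature_vector, rss_dbm) pairs.
    """
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return sum(rss for _, rss in nearest) / len(nearest)
```

Because two samples from different access points differ by $3\sqrt{2} \approx 4.24$ in the one-hot block alone, neighbors are drawn almost entirely from the queried access point whenever spatial distances are smaller than that. For example, with toy samples of -40 dBm at (0, 0, 0) and -50 dBm at (1, 0, 0) for one MAC and -80 dBm at (0, 0, 0) for another, a query at (0.4, 0, 0) for the first MAC with `k=2` averages only the first two samples.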

Across these literatures, REMem therefore denotes a family of names rather than a single method. In the memory-centric strand, it typically refers to mechanisms for selective retention, contextual verification, or recurrent consolidation under long-horizon inference (Bursa, 4 Jan 2026, Dai et al., 15 May 2026, Yang et al., 22 Jun 2026). In multimodal and robotic work, it extends to long-range attention repair, recurrent latent state, and memorization benchmarking (Gao et al., 13 Nov 2025, Li et al., 13 Mar 2026, Kwon et al., 5 May 2026). In several other fields, however, REM is simply an acronym reused for unrelated modules and maps (Bagchi et al., 2024, Sun et al., 2021, Wei et al., 2013). The term is therefore best treated as a cross-domain label whose meaning is determined by the specific paper and application domain.
