PReSO: Personalized Recall & Spatial Optimization

Updated 4 July 2026

PReSO is a framework that integrates user-conditioned recall of candidate POIs with spatial clustering to generate compact, high-recall sets for itinerary planning.
It employs a multi-branch retrieval design and DBSCAN clustering to prune and structure candidate points while reporting higher candidate recall than the TripTailor baseline.
The approach unifies techniques from recommendation systems, multimodal retrieval, and embodied-agent memory to balance precision, spatial organization, and resource efficiency.

Personalized Recall and Spatial Optimization (PReSO) denotes, in its explicit formulation, the preprocessing workflow introduced in TourPlanner for constructing a spatially-aware candidate point-of-interest (POI) set before downstream itinerary generation. In that formulation, PReSO is designed to prune candidate POIs while maintaining a high recall rate, extract explicit demands and infer implicit user preferences, and produce a spatially compact, information-rich candidate set for subsequent reasoning and refinement (Wang et al., 8 Jan 2026). A broader research reading is also suggested by adjacent work on recommendation, multimodal personal memory, embodied agents, and on-device retrieval: in those settings, PReSO can be understood as the joint problem of user-conditioned recall and some form of spatial structuring, where the “spatial” axis may mean geographic clustering, location-aware retrieval, embodied world coordinates, or structured organization of embedding space rather than a single universal optimization formalism (Wu et al., 2021, Jiang et al., 22 Sep 2025, Rasheed et al., 2 Jun 2026).

1. Conceptual scope and meanings of “personalized recall” and “spatial optimization”

PReSO is not a single algorithmic template. In TourPlanner, it is a concrete workflow for candidate construction in travel planning. In nearby literatures, however, the same two ideas decompose differently. “Personalized recall” may mean recovering candidate POIs from a city inventory given explicit and implicit user demands, retrieving a user’s own multimodal memories for question answering, deriving a recall embedding from a ranking embedding in a recommender, or reconstructing latent user preferences from cross-session tool-use history. “Spatial optimization” may mean geographic clustering and route ordering, location-aware ranking over geotagged memories, explicit geometric predicates such as line-of-sight and occlusion, or a more limited structuring of retrieval space through basis vectors or cluster-guided search (Wang et al., 8 Jan 2026, Jiang et al., 22 Sep 2025, Kwon et al., 9 Jun 2026).

A recurrent misconception is that all “spatial” mechanisms are geometric in the same sense. The literature is more heterogeneous. In Memory-QA, the spatial contribution is location-aware retrieval through a lexical score over location strings, not map-based reasoning or geodesic optimization. In embodied-agent memory, by contrast, geometry becomes decisive when a query depends on observer pose, target location, and intervening occupancy. In recommendation, a paper such as UniRec is only indirectly related to spatial optimization: it creates a structured recall space through basis embeddings and top-$P$ basis selection, but it does not optimize for index locality, quantization robustness, or explicit geometric constraints (Jiang et al., 22 Sep 2025, Kwon et al., 9 Jun 2026, Wu et al., 2021).

This suggests that PReSO is best treated as a family resemblance concept. Its strongest common denominator is the coupling of recall quality with a structured organization of the search space under system constraints. What changes across domains is the object being recalled—POIs, memories, user preferences, embodied observations—and the meaning of the spatial axis.

2. The explicit PReSO workflow in travel planning

In TourPlanner, PReSO is the first stage of a three-stage pipeline: PReSO constructs candidate information, Competitive consensus Chain-of-Thought (CCoT) explores the feasible solution space, and constraint-gated reinforcement learning refines the itinerary. PReSO itself is a sequential three-step workflow consisting of User Profile Construction, Multi-dimension POIs Recall, and Spatial Clustering and Integration. Its stated purpose is to solve the challenge of “Pruning candidate POIs while maintaining a high recall rate” and to “construct spatially-aware candidate POIs’ set” that is “spatially compact and information-rich” (Wang et al., 8 Jan 2026).

The first step, user profile construction, starts from a natural-language travel request $Q$ and augments it with city-specific statistical data such as transportation prices, hotel price statistics, and restaurant meal price statistics. An LLM then infers latent requirements including hotel cost category and meal cost range. The appendix prompt rules make this explicit through heuristic budget allocation. Let $N = \text{travel days} - 1$. Then the prompt uses

$\text{Per-night hotel budget} = \frac{\text{Budget} \times 0.55}{N}$

and

$\text{Per-day meal budget} = \frac{\text{Budget} \times 0.35}{N}.$

These rules are not presented as a learned objective; they are part of the prompt-guided inference mechanism.
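As a worked illustration of the two budget rules, the following minimal Python sketch applies them to a hypothetical trip. The function name `allocate_budget`, the dictionary keys, and the example figures (a budget of 6000 over 4 travel days) are assumptions made for illustration and do not come from the paper.

```python
def allocate_budget(budget: float, travel_days: int) -> dict:
    """Apply the 55% hotel / 35% meal prompt heuristic to a trip budget."""
    nights = travel_days - 1  # N in the formulas above
    if nights <= 0:
        raise ValueError("the heuristic needs travel_days >= 2 so that N >= 1")
    return {
        "nights": nights,
        "hotel_per_night": budget * 0.55 / nights,
        "meal_per_day": budget * 0.35 / nights,
    }


# Hypothetical request: budget 6000, 4 travel days, so N = 3.
plan = allocate_budget(budget=6000, travel_days=4)
print({key: round(value, 2) for key, value in plan.items()})
```

With these inputs the per-night hotel figure is about 1100 and the per-day meal figure about 700. The two rules together cover 90% of the budget; how the remaining 10% is used is not stated by these formulas.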

The second step, multi-dimension POIs recall, uses a three-branch retrieval design. The first branch performs embedding-based semantic recall, broadening coverage with extracted keywords and synonym expansion. The second branch performs canonical landmark recall over attractions rated 4A or above, ranking them by popularity and user ratings. The third branch is LLM-supplemented recall, which adds attractions that fit user preferences but may be missed by the first two channels. The appendix gives concrete sizing rules: $\text{semantic similarity recall number} = 3 \times \text{duration}, \qquad \text{POI recall total number} = 9 \times \text{duration}.$
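The sizing rules can be made concrete with a short Python sketch. Only the two multipliers (3 and 9 times the trip duration) come from the appendix as described above. The merge order, the de-duplication by POI name, and the helper names `recall_quota` and `merge_branches` are illustrative assumptions, and the way the remaining quota is shared between the landmark and LLM branches is not specified in the text.

```python
def recall_quota(duration_days: int) -> dict:
    """Candidate counts implied by the appendix sizing rules."""
    total = 9 * duration_days
    semantic = 3 * duration_days
    return {"semantic": semantic, "total": total, "other_branches": total - semantic}


def merge_branches(semantic, landmark, llm, duration_days):
    """Merge branch outputs in a fixed priority order (an assumption),
    dropping duplicates and stopping at the total quota."""
    quota = recall_quota(duration_days)
    merged, seen = [], set()
    for poi in semantic[: quota["semantic"]] + landmark + llm:
        if len(merged) >= quota["total"]:
            break
        if poi not in seen:
            seen.add(poi)
            merged.append(poi)
    return merged


# Hypothetical one-day trip with toy POI names.
candidates = merge_branches(["a", "b", "c", "d"], ["b", "e"], ["f"], duration_days=1)
print(recall_quota(3), candidates)
```

In the toy call, the fourth semantic hit `"d"` is cut by the semantic quota of 3, and the duplicate `"b"` from the landmark branch is kept only once.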

The third step, spatial clustering and integration, clusters recalled attractions by geographic coordinates using DBSCAN, computes cluster centroids, and uses those centroids as spatial anchors for nearby accommodations and restaurants. The appendix reports DBSCAN hyperparameters of minimum samples $= 4$, $\epsilon = 1$, and minimum cluster number $=$ duration. The output is not just a ranked attraction list but clustered urban data with cluster labels attached to attractions, restaurants, and accommodations.
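The clustering-and-anchoring step can be sketched in plain Python as follows. This is a minimal, dependency-free DBSCAN written for illustration, not the paper's implementation. It assumes that $\epsilon = 1$ is a distance in kilometres (the unit is not stated above), uses a haversine distance, approximates each centroid by the mean latitude and longitude, and does not enforce the "minimum cluster number = duration" rule. All function names and coordinates are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def dbscan(points, eps_km=1.0, min_samples=4):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)  # None = not visited yet

    def neighbors(i):
        # The point itself counts toward min_samples.
        return [j for j, q in enumerate(points) if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


def centroids(points, labels):
    """Mean (lat, lon) per cluster; adequate at city scale, ignores noise points."""
    sums = {}
    for (lat, lon), label in zip(points, labels):
        if label == -1:
            continue
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {label: (acc[0] / acc[2], acc[1] / acc[2]) for label, acc in sums.items()}


def nearest_anchor(place, anchors):
    """Attach a restaurant or hotel to the closest cluster centroid."""
    return min(anchors, key=lambda label: haversine_km(place, anchors[label]))


# Two toy groups of attractions plus one isolated point (hypothetical coordinates).
attractions = [
    (31.230, 121.470), (31.231, 121.470), (31.230, 121.471), (31.231, 121.471),
    (31.300, 121.550), (31.301, 121.550), (31.300, 121.551), (31.301, 121.551),
    (31.100, 121.300),
]
labels = dbscan(attractions)
anchors = centroids(attractions, labels)
restaurant_cluster = nearest_anchor((31.2305, 121.4705), anchors)
print(labels, restaurant_cluster)
```

The toy data yields two clusters and one noise point, and the sample restaurant is attached to the first cluster. A production pipeline would more likely use a library implementation with a spatial index, since the neighbor search here is quadratic in the number of points.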

Empirically, the paper isolates PReSO by comparing its candidate recall against TripTailor on the TripTailor sandbox. With GPT-4o, PReSO reaches 42.26% recall versus 27.83% recall for TripTailor, an absolute gain of 14.43 percentage points. The paper interprets this as evidence that the hybrid, multi-dimensional retrieval mechanism is better at capturing relevant environmental data before downstream reasoning (Wang et al., 8 Jan 2026).

3. Personalized recall as a systems problem

Outside itinerary planning, the recall side of PReSO appears in several distinct technical forms. In news recommendation, UniRec unifies candidate recall and ranking by learning a ranking user embedding $\mathbf{u}_{ra}$ from click history and deriving a recall embedding through attention over trainable basis user embeddings: the attention weights score each basis embedding against $\mathbf{u}_{ra}$, and the recall embedding is synthesized as the attention-weighted combination of the basis embeddings.

At inference, only the top-$P$ basis embeddings are kept. In this top-$P$ setting, UniRec(top) reaches R@100 = 1.516, R@200 = 2.531, R@500 = 5.142, and R@1000 = 8.485 on MIND, while also slightly improving ranking over NRMS. The paper’s relevance to PReSO lies in the separation between a discriminative ranking space and a recall space optimized for broader interest coverage (Wu et al., 2021).
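The idea of synthesizing a recall embedding from basis embeddings, and keeping only the strongest ones at inference, can be sketched as below. This is a minimal illustration and not UniRec's exact formulation: the dot-product scoring, the softmax normalization, the renormalization over the kept bases, and the name `recall_embedding` are all assumptions.

```python
import math


def recall_embedding(u_rank, bases, top_p=None):
    """Attention-weighted mix of basis embeddings, optionally restricted to
    the top_p bases with the highest attention score."""
    scores = [sum(a * b for a, b in zip(u_rank, basis)) for basis in bases]
    keep = sorted(range(len(bases)), key=lambda i: scores[i], reverse=True)
    if top_p is not None:
        keep = keep[:top_p]
    # Softmax over the kept bases only (subtract the max for numerical stability).
    peak = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - peak) for i in keep}
    total = sum(exps.values())
    weights = {i: value / total for i, value in exps.items()}
    dim = len(u_rank)
    embedding = [sum(weights[i] * bases[i][d] for i in keep) for d in range(dim)]
    return embedding, weights


# Toy 2-d example with three basis embeddings (hypothetical values).
u_ra = [1.0, 0.0]
basis_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
u_recall, attn = recall_embedding(u_ra, basis_embeddings, top_p=2)
print(u_recall, attn)
```

In the toy call, the basis pointing away from the ranking embedding is dropped, and the recall embedding is a mix of the two remaining bases, weighted toward the one most aligned with $\mathbf{u}_{ra}$.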

In personal multimodal memory QA, Pensieve formalizes a repository of personal memories and augments each memory with additional fields containing OCR output, an image description, and an invocation completion. Retrieval fuses temporal, recency, location, and semantic signals using learned weights, one per signal. On MemoryQA-s, Pensieve with GPT-4o reaches 76.8 end-to-end QA accuracy versus 63.1 for RagVL, and retrieval reaches Recall@5 = 95.5 with learned-weight fusion. This is a strong example of personalized recall over a user-owned memory store rather than over a public corpus (Jiang et al., 22 Sep 2025).
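A weighted fusion of the four retrieval signals might look like the following sketch. It is an assumption-laden illustration rather than Pensieve's scoring function: the linear form, the token-overlap location score, and the equal placeholder weights are hypothetical, and the weights shown are not the learned values reported in the paper.

```python
def lexical_location_score(query_location: str, memory_location: str) -> float:
    """Token-overlap (Jaccard) score between two location strings.
    A stand-in for a lexical location score; no geographic distance is used."""
    query_tokens = set(query_location.lower().split())
    memory_tokens = set(memory_location.lower().split())
    if not query_tokens or not memory_tokens:
        return 0.0
    return len(query_tokens & memory_tokens) / len(query_tokens | memory_tokens)


def fused_score(signals: dict, weights: dict) -> float:
    """Linear fusion of per-signal scores (the linear form is an assumption)."""
    return sum(weights[name] * signals[name] for name in weights)


# Placeholder weights, NOT the learned weights from the paper.
weights = {"temporal": 0.25, "recency": 0.25, "location": 0.25, "semantic": 0.25}
location = lexical_location_score("Central Park New York", "central park")
score = fused_score(
    {"temporal": 1.0, "recency": 0.5, "location": location, "semantic": 0.8},
    weights,
)
print(location, score)
```

The location score depends only on shared tokens, so two places that are physically adjacent but named differently would score zero under this sketch.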

Cross-session preference recall appears in PRefine, which treats personalized tool calling as latent preference reasoning rather than direct fact retrieval. Its memory is a single evolving preference hypothesis, updated through a generate–verify–refine loop that checks Evidence Support, Abstraction Quality, Actionability, and Temporal Consistency. On the MPT benchmark of 265 multi-session dialogues, PRefine improves context-free Preference Recall average F1 from 53.19 to 64.51, Preference Induction from 43.00 to 52.01, and Preference Transfer from 16.26 to 19.65, while using on average 23.28 tokens per dialogue, or 1.24% of the full dialogue history (Yoon et al., 20 Apr 2026).

Adaptive retrieval depth appears in RF-Mem, which distinguishes a fast Familiarity path from a deeper Recollection path. The familiarity probe computes the top-$k$ similarity scores, normalizes them into a distribution, and then uses the entropy of that distribution together with the mean score to switch between retrieval modes. The Recollection path clusters candidate memories, applies mixing-based query updates in embedding space, and performs bounded beam expansion. On PersonaMem at 128K scale, RF-Mem reaches 0.5394 accuracy versus 0.5259 for dense retrieval and 0.3231 for Full Context, while remaining far cheaper than full-history prompting (Zhang et al., 10 Mar 2026).

Together these systems suggest that personalized recall in PReSO is not exhausted by top-$k$ similarity search. It may require query decomposition, recall-space transformation, multimodal augmentation, latent preference abstraction, or adaptive control over retrieval depth.

4. Spatial optimization across geographic, location-aware, and embodied settings

The most literal form of PReSO’s spatial side appears in itinerary planning. ITINERA first decomposes a request into structured subrequests, retrieves POIs with positive-minus-negative semantic scoring, clusters retrieved POIs geographically, and then orders them using cluster-aware spatial optimization. Inter-cluster order is produced by a TSP solver, while within-cluster order is solved as a constrained path-TSP. The supplementary material specifies simulated annealing for cluster ordering, with hyperparameters that include the cooling rate. On Shanghai, ITINERA reports Recall Rate = 30.7%, Average Margin = 84.1 m, and Overlaps = 0.40, substantially improving route coherence over strong LLM baselines; removing spatial optimization raises Average Margin to 227.0 m and Overlaps to 1.00 (Tang et al., 2024).
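Cluster ordering by simulated annealing can be sketched as follows. This is a generic annealer over cluster centroids and should not be read as ITINERA's solver: the closed-tour objective, the segment-reversal move, the planar Euclidean distance, and the hyperparameter values (`t_start`, `t_end`, `cooling`) are all placeholder assumptions, not the values given in the supplementary material.

```python
import math
import random


def tour_length(order, points):
    """Length of the closed tour that visits the points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def anneal_order(points, t_start=1.0, t_end=1e-3, cooling=0.995, seed=0):
    """Simulated annealing over visiting orders using segment-reversal moves."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    current = tour_length(order, points)
    best, best_len = order[:], current
    t = t_start
    while t > t_end:
        i, j = sorted(rng.sample(range(len(points)), 2))
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        cand_len = tour_length(candidate, points)
        # Always accept improvements; accept worse tours with Boltzmann probability.
        if cand_len < current or rng.random() < math.exp((current - cand_len) / t):
            order, current = candidate, cand_len
            if current < best_len:
                best, best_len = order[:], current
        t *= cooling  # geometric cooling schedule
    return best, best_len


# Toy cluster centroids in planar coordinates (hypothetical).
centers = [(0, 0), (5, 5), (1, 0), (5, 6), (0, 1), (6, 5)]
initial_len = tour_length(list(range(len(centers))), centers)
best_order, best_len = anneal_order(centers)
print(best_order, round(best_len, 2), round(initial_len, 2))
```

Because the best tour seen so far is tracked separately from the current state, the returned order is never longer than the starting order, although the sketch makes no claim of optimality.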

TourPlanner’s explicit PReSO workflow is geographically simpler but conceptually similar. It uses DBSCAN clustering over recalled attractions and selects nearby restaurants and hotels around cluster centroids. Unlike ITINERA, it does not expose a full route-ordering solver inside PReSO itself; that burden is deferred to downstream CCoT and reinforcement-learning stages. This suggests two variants of spatial optimization within the PReSO family: cluster-based candidate shaping before reasoning, and explicit route optimization after semantic retrieval (Wang et al., 8 Jan 2026).

A weaker notion of spatiality appears in personal memory QA. In Pensieve, each memory includes a location field, but the location score is lexical: it is computed over location strings rather than coordinates. The paper is explicit that it does not use geographic distance, learned geospatial embeddings, or map-based reasoning. Here, “spatial” means location-aware metadata fusion rather than geometry. This matters because it marks a boundary: a system may be highly relevant to personalized recall while only partially addressing the spatial side of PReSO (Jiang et al., 22 Sep 2025).

Embodied-agent memory pushes the spatial component much further. eMEM stores observations, gists, episodes, and entities in a unified graph backed by SQLite, HNSW, and an R-tree, making memory jointly searchable by meaning, space, and time. The system exposes ten agent-facing tools, including locate, which resolves a concept to a centroid and spread radius, and recall, which chains concept-to-location resolution with spatial and cross-layer retrieval. On eMEM-Bench v1, eMEM scores 80.8 weighted mean over 988 probes, with a flat retention curve at ceiling from 1 h to 1 yr of simulated delay on room-unique items. The abstract further reports that a pure RAG baseline loses 30 pt on context-dependent retrieval and 29 pt on DRM lure rejection, isolating the contribution of multi-layer storage and consolidation (Rasheed et al., 2 Jun 2026).

The strongest geometric interpretation is given by the study of language-agent spatial memory under occlusion. That work argues that geometry should lead retrieval when the query regime is spatial and that recall must be separated from visibility. It compares weightings by their effect on mean Hit@5: the geometry-led weighting is the one it favors, while the shipped linear blend fails its frozen test. For visibility, the paper repurposes a DDA line-of-sight test and shows that, on 849 behind-wall targets, text-only and live field-of-view cone both score 0.000, whereas cone-plus-DDA reaches 0.982. This is a useful corrective to loose uses of “spatial optimization”: some spatial queries cannot be solved by location metadata or semantic similarity because they require explicit geometry and occupancy (Kwon et al., 9 Jun 2026).
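The difference between a field-of-view cone and a cone combined with a line-of-sight test can be shown on a toy occupancy grid. The sketch below uses a simple DDA-style rasterized ray between cell centers and is only an illustration of the idea, not the paper's implementation. The grid layout, the 45-degree half-angle, the heading convention (0 degrees along increasing column index, 90 degrees along increasing row index), and all function names are assumptions.

```python
import math


def line_of_sight(grid, start, target):
    """True if no occupied cell lies strictly between start and target.
    Cells are (row, col); the ray is sampled once per step along the major axis."""
    (r0, c0), (r1, c1) = start, target
    steps = max(abs(r1 - r0), abs(c1 - c0))
    for k in range(1, steps):
        r = round(r0 + (r1 - r0) * k / steps)
        c = round(c0 + (c1 - c0) * k / steps)
        if grid[r][c]:
            return False
    return True


def in_cone(observer, heading_deg, target, half_angle_deg=45.0):
    """True if the target's bearing lies within the field-of-view cone."""
    d_row, d_col = target[0] - observer[0], target[1] - observer[1]
    bearing = math.degrees(math.atan2(d_row, d_col))
    diff = (bearing - heading_deg + 180) % 360 - 180  # wrap to [-180, 180)
    return abs(diff) <= half_angle_deg


def visible(grid, observer, heading_deg, target, half_angle_deg=45.0):
    """Cone test plus occlusion test."""
    return (in_cone(observer, heading_deg, target, half_angle_deg)
            and line_of_sight(grid, observer, target))


# 1 marks an occupied (wall) cell; the wall covers column 2 of the top two rows.
grid = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
observer = (1, 1)
behind_wall = (1, 4)
print(in_cone(observer, 0, behind_wall), visible(grid, observer, 0, behind_wall))
```

For the behind-wall target, the cone test alone accepts it while the combined test rejects it, which mirrors the qualitative gap reported between the cone-only and cone-plus-DDA conditions.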

5. Resource-bounded recall, compression, and bounded activation

A distinctive feature of PReSO-style systems is that recall quality is constrained by deployment budgets. The literature therefore treats storage, latency, energy, and prompt bandwidth as part of the problem rather than as afterthoughts.

System	Main mechanism	Reported effect
RECALL / GR-T (Cai et al., 2024)	Coarse-to-fine on-device multimodal embeddings	14.9× throughput, 13.1× average energy reduction, <5% relative accuracy loss
ScrapMem (Chang et al., 5 May 2026)	Scrapbook Pages, EM-Graph, Optical Forgetting	51.0% Joint@10, Recall@10 = 70.3%, memory usage reduced by up to 93%
Memento (Ghosh et al., 28 Apr 2025)	Physiological cue selection for route recall	route recall improved by 20–23%, review time reduced by 46%, 3.86 s vs 15.35 s runtime
CBEA+LCV (Tang et al., 15 May 2026)	Bounded evidence activation and commitment validation	zero failures within validator scope at 0.49–0.60 availability; raw/long-context baselines with same gate reach zero only at 0.003–0.092

RECALL addresses the on-device substrate problem for personal multimodal recall. It computes coarse-grained embeddings via early exit during continuous ingestion, uses a lightweight predictor to estimate exit depth, and refines only shortlisted candidates at query time. In the paper’s synthesis, this supports a personal recall service that remains privacy-preserving and feasible on mobile hardware, with about 5 KB/item, about 29.3 MB/day at 6000 images/day, and about 10.4 GB/year of storage (Cai et al., 2024).
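The storage figures can be checked with a few lines of arithmetic. The sketch assumes binary unit conversion (1 MB = 1024 KB, 1 GB = 1024 MB) and a 365-day year, which are assumptions chosen here because they reproduce the rounded figures quoted above.

```python
KB_PER_ITEM = 5        # approximate per-item footprint quoted above
ITEMS_PER_DAY = 6000   # images ingested per day

mb_per_day = KB_PER_ITEM * ITEMS_PER_DAY / 1024   # KB -> MB
gb_per_year = mb_per_day * 365 / 1024             # MB -> GB over one year

print(f"{mb_per_day:.1f} MB/day, {gb_per_year:.1f} GB/year")
```

Under these assumptions the daily and yearly footprints round to 29.3 MB and 10.4 GB, matching the reported values.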

ScrapMem addresses the same problem from a different angle. It first consolidates heterogeneous personal records into temporally grounded Scrapbook Pages, then applies Optical Forgetting so that older memories are progressively degraded by JPEG-quality and resolution schedules. The default Timed-Gentle setting is specified in days for each of the Recent, Mid-term, and Old memory tiers. The resulting memory graph is smaller than the original, turning storage compression into a retrieval-aware structural condensation (Chang et al., 5 May 2026).

Memento studies a different resource bottleneck: how to select personalized route cues without heavy computer vision processing. It uses EEG, GSR, and PPG, fuses them with CPD and Morlet-wavelet analysis, and selects route snapshots aligned with high-attention episodes. Relative to the AMNet memorability baseline, it reports 3.86 ± 0.09 sec runtime versus 15.35 ± 0.16 sec, while improving route recall and lowering PAAS and NASA TLX. This is relevant to PReSO because it shows that a personalized recall aid can optimize not only storage but also review burden and cognitive load (Ghosh et al., 28 Apr 2025).

At the prompt/runtime layer, “Recall Isn’t Enough” argues that bounded evidence activation matters more than raw memory exposure. CBEA scores an evidence subset under relevance, coverage, tail witnesses, consequence debt, and an overpersonalization penalty, subject to a total activation-cost budget. Its lexicographic validator then ranks structured commitments by hard-constraint violations, coverage failures, infeasible emission, and soft utility. The paper’s central lesson for PReSO is that a personalized system may need to optimize which evidence is activated under finite budget before it can safely commit to an answer (Tang et al., 15 May 2026).
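Lexicographic ranking of commitments maps naturally onto tuple comparison. The sketch below orders candidates by the four criteria in the sequence listed above. The `Commitment` fields, their types (counts, a boolean, and a float), and the toy values are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Commitment:
    name: str
    hard_violations: int     # fewer is better
    coverage_failures: int   # fewer is better
    infeasible: bool         # False sorts before True
    soft_utility: float      # higher is better


def rank(commitments):
    """Sort lexicographically: earlier criteria strictly dominate later ones."""
    return sorted(
        commitments,
        key=lambda c: (c.hard_violations, c.coverage_failures, c.infeasible, -c.soft_utility),
    )


candidates = [
    Commitment("A", 0, 1, False, 0.9),
    Commitment("B", 0, 0, False, 0.2),
    Commitment("C", 1, 0, False, 1.0),
    Commitment("D", 0, 0, False, 0.5),
]
ranking = [c.name for c in rank(candidates)]
print(ranking)
```

Candidate C has the highest soft utility but ranks last because a single hard-constraint violation outweighs any amount of utility under a lexicographic order.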

6. Evaluation, failure modes, and open directions

A major shift in recent work is that PReSO-like systems are increasingly evaluated as state-maintenance systems rather than as pure retrievers. Memora makes this explicit for personalized agents operating over weeks to months of interaction. It evaluates remembering, reasoning, and recommending under additions, updates, and deletions, and introduces Forgetting-Aware Memory Accuracy, which combines two components: MPA, which measures memory presence accuracy, and FAA, which measures forgetting absence accuracy. The benchmark shows that external memory agents improve direct remembering but still frequently reuse obsolete memories; quarterly settings include maxima of 309 prior sessions and 94 mutations before a query. For PReSO, this implies that personalized recall must be update-aware and forgetting-aware, not just high-recall in the retrieval sense (Uddin et al., 21 Apr 2026).

Benchmark design is likewise becoming more diagnostic. MemoryQA contributes 9,357 recall questions with single-memory and multi-memory settings, explicit time-only, location-only, and joint time-and-location constraints. MPT contributes 332 Preference Recall, 293 Preference Induction, and 472 Preference Transfer instances for cross-session tool calling. eMEM-Bench is organized around eight cognitive-psychology paradigms rather than surface tasks. This diversity suggests that no single metric captures PReSO. A system may excel at semantic recall yet fail on temporal mutation, location grounding, lure rejection, or geometry-dependent visibility (Jiang et al., 22 Sep 2025, Yoon et al., 20 Apr 2026, Rasheed et al., 2 Jun 2026).

Several limitations recur across the literature. First, many systems are strong on personalized recall but only partially spatial. Memory-QA uses location-aware retrieval without geospatial reasoning; RF-Mem explores semantic neighborhoods without a persistent spatial memory layout; UniRec creates a structured recall space but does not optimize for index locality or hardware-aware retrieval (Jiang et al., 22 Sep 2025, Zhang et al., 10 Mar 2026, Wu et al., 2021). Second, strong spatial systems are often not yet personalized: eMEM is explicitly a spatio-temporal memory substrate rather than a user-modeling system. Third, high recall can still produce bad commitments: Memora shows stale-memory reuse, while CBEA+LCV shows that failures often occur when noisy evidence is turned into hard commitments rather than at retrieval time (Uddin et al., 21 Apr 2026, Tang et al., 15 May 2026).

This suggests three open directions for PReSO. One is joint personalization and geometry, where user-specific salience, routines, and preferences are integrated with explicit spatial predicates rather than added as a post hoc reranker. A second is budget-aware memory organization, where retrieval quality, prompt cost, storage cost, and latency are optimized together. A third is state-correct recall, where systems must remember what is currently true, suppress invalidated facts, and abstain or repair when feasible commitments cannot be formed. The literature already provides strong ingredients for each of these directions, but no single system yet closes the full loop from personalized recall to explicit spatial optimization across all domains.