LocalPlayground: AR & Multi-Agent Framework
- LocalPlayground names two distinct research frameworks: one for embodied AR social play and one for computational multi-agent reasoning.
- The AR component features five interactive mobile apps designed to promote proxemic engagement and collaborative physical play.
- The agentic search environment employs LLM-driven tools for multi-hop reasoning, validated against real-world local service benchmarks.
LocalPlayground designates two distinct research frameworks for multi-agent interaction: one focused on embodied real-world experience, the other on complex reasoning in local service environments. The term originates from (i) a mobile augmented reality (AR) social play blueprint for in-person engagement (Dagan et al., 2022), and (ii) the unified execution environment for multi-hop, agentic search in local life services that underpins the LocalSearchBench benchmark (He et al., 8 Dec 2025).
1. Conceptual Overview
LocalPlayground, in the AR context, is a set of design principles, experiential motifs, and application patterns for co-located, playful, embodied interaction using mobile AR apps, intended to foster shared physical and social activity. As a computational research platform, LocalPlayground is the execution backbone for evaluating LLM agents on multi-step, cross-source reasoning tasks relevant to local services, integrating a tool-rich agent environment and a robust validation protocol (He et al., 8 Dec 2025).
2. Architecture and Functional Components
Mobile AR Play Zone (Project IRL)
Five “playground” apps instantiate LocalPlayground principles in Project IRL (Dagan et al., 2022):
- Face It: Rapid device-passing with facial expression recognition; provokes proxemic contact and shared amusement.
- Feeture Films: Parent–child joint foot tracking for narrative sock-puppet play; supports intimate, cooperative engagement.
- Treasure Treat: Collaborative pet-owner interaction, tracking dog silhouettes and virtual coin collection.
- Milky Way: Synchronous multi-device play, anchored by a shared TV screen; groups compete in real-space “whack-a-mole.”
- Freezing Frenzy: Head-to-head rear-camera body tracking in a timed freeze-tag contest.
Agentic Search Evaluation Environment
In LocalSearchBench, LocalPlayground orchestrates:
- Search Agent: LLM-driven, allowed up to 5 rounds, each round comprising one LocalRAG call (dense retrieval plus reranking over a 29-field merchant database) and one WebSearch call (a wrapper over Baidu Search).
- Validation Agent: LLM-as-Judge (Claude-Sonnet-4), scoring responses on correctness, completeness, faithfulness, fluency, safety, rounds, and tool calls.
APIs are JSON-based:
- LocalRAGSearch consumes a natural-language merchant query and returns top-100 vector matches and top-20 reranked candidates.
- WebSearch accepts dynamic queries and returns recent factual web snippets.
Agent prompts embed tool invocation tags (e.g., <rag>, <web_search>) and concatenate context across reasoning rounds.
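The tag-based invocation and context concatenation described here can be illustrated with a minimal sketch. Only the tag names `rag` and `web_search` come from the text; the closing-tag syntax, the function names `extract_tool_calls` and `append_round`, and the context line format are assumptions made for illustration, not the benchmark's actual prompt format.

```python
import re
from typing import List, Tuple

# Assumed syntax: <rag>query</rag> and <web_search>query</web_search>.
# The source names the tags but does not specify how they are closed.
TOOL_TAG = re.compile(r"<(rag|web_search)>(.*?)</\1>", re.DOTALL)


def extract_tool_calls(agent_output: str) -> List[Tuple[str, str]]:
    """Return (tool_name, query) pairs in the order they appear in the output."""
    return [(m.group(1), m.group(2).strip()) for m in TOOL_TAG.finditer(agent_output)]


def append_round(context: str, round_no: int, tool: str, query: str, result: str) -> str:
    """Concatenate one tool result onto the running prompt context (hypothetical format)."""
    return f"{context}\n[round {round_no}] {tool}({query!r}) -> {result}"
```

Keeping the full history in one growing string is the simplest way to make every later round see earlier tool results; a real system would likely also need to truncate or summarize long contexts.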
3. Data Structure and Schema
Embodied Play Schema
Each AR play experience is bounded by design guidelines:
- Device Arrangement: Shared vs. individual phone use, shaping physical dynamics and coordination.
- Enablers: Physical anchors—faces, bodies, pets, objects—central to AR content triggering and attention.
- Reality Modification Affordances: Support for transformation, surprise, and iterative improvisation.
- Co-located Play: Mechanisms for competition, cooperation, and scalable group participation.
Local Life Search Schema
The backbone is a 29-field merchant record, including:
- Identification and categorization,
- Location (city, district, GPS),
- Operations (hours, price, services),
- Ratings/promotions,
- Anonymized contacts.
A dense vector index is constructed using an 8B-parameter model; retrieval is cosine similarity-based, followed by an 8B-parameter learned reranker for top-20 selection.
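The two-stage pipeline (cosine-similarity retrieval, then reranking) can be sketched as follows. This is a toy version under stated assumptions: embeddings are plain lists of floats, the index is scanned exhaustively, and `rerank_score` is a hypothetical stand-in for the learned reranker. The defaults of 100 and 20 mirror the top-100 and top-20 figures given for LocalRAGSearch.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query_vec: Vector,
    index: List[Tuple[str, Vector]],      # (merchant_id, embedding) pairs
    rerank_score: Callable[[str], float], # hypothetical stand-in for the reranker
    k_retrieve: int = 100,
    k_final: int = 20,
) -> List[str]:
    """Stage 1: top-k by cosine similarity. Stage 2: reorder the shortlist."""
    by_similarity = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    shortlist = [merchant_id for merchant_id, _ in by_similarity[:k_retrieve]]
    return sorted(shortlist, key=rerank_score, reverse=True)[:k_final]
```

The point of the split is cost: the cheap similarity pass narrows the candidate set so that the more expensive reranker only scores a small shortlist.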
4. Multi-Hop Reasoning and Interaction Workflows
AR Play Scenarios
Interaction is exemplified by co-located, participatory sessions (e.g., family foot-puppet narratives, coordinated dog-owner play, multi-device group activities), blending physical movement, shared attention, and real-time feedback (Dagan et al., 2022).
Agentic Reasoning Example
A typical multi-hop agent workflow in LocalPlayground (He et al., 8 Dec 2025):
- Issue initial merchant database query (LocalRAG) per location, price, or cuisine constraints.
- Filter candidates by facilities (e.g., parking) from database fields.
- Use WebSearch to verify real-time attributes (e.g., opening hours, reservation availability).
- Iteratively refine candidates, compose structured output (name, address, hours, features, booking status).
- Concatenate intermediate results to prompt context in each round for reasoning traceability.
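The steps above can be sketched as a bounded loop. Everything beyond the five-round budget and the two tool names is an assumption: `plan` is a hypothetical callable standing in for the LLM, the dictionary-based step format is invented for illustration, and treating each tool as optional within a round is one possible reading of the per-round limit.

```python
from typing import Callable, Dict

MAX_ROUNDS = 5  # round budget stated for the Search Agent


def run_search_agent(
    question: str,
    plan: Callable[[str], Dict[str, str]],     # hypothetical LLM step: context -> action
    tools: Dict[str, Callable[[str], str]],    # "rag" and "web_search" callables
    max_rounds: int = MAX_ROUNDS,
) -> Dict[str, object]:
    """Run up to max_rounds rounds, appending each tool result to the context."""
    context = f"Question: {question}"
    tool_calls = 0
    for round_no in range(1, max_rounds + 1):
        step = plan(context)
        if "answer" in step:  # the planner decided it has enough evidence
            return {"answer": step["answer"], "rounds": round_no,
                    "tool_calls": tool_calls, "context": context}
        for tool_name in ("rag", "web_search"):  # at most one call per tool per round
            query = step.get(tool_name)
            if query:
                result = tools[tool_name](query)
                tool_calls += 1
                context += f"\n[round {round_no}] {tool_name}: {query} -> {result}"
    # Budget exhausted without a final answer.
    return {"answer": None, "rounds": max_rounds,
            "tool_calls": tool_calls, "context": context}
```

Returning the round count, tool-call count, and accumulated context alongside the answer keeps the trace available for a downstream judge, which matches the quantities the Validation Agent is described as recording.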
5. Evaluation Methodology and Metrics
AR Deployment Study
- 101 participants across Face It, Feeture Films, Treasure Treat, Milky Way, and Freezing Frenzy.
- Qualitative measures: reports of enhanced proxemics, social focal points, synchronized movement, and collaborative “hacking.”
- No formal mathematical models or global scoring metrics; session durations were controlled (up to 20 minutes, with 30–60 s per game round).
Search Agent Evaluation
Automated validation along:
- Correctness: binary, averaged.
- Completeness: judge-assigned score, normalized to [0,1].
- Faithfulness: judge-assigned score, normalized to [0,1].
- Fluency/Safety: judge-assigned scores, normalized.
Results (N=5 rounds): DeepSeek-V3.1 achieves 34.34% correctness, 80.00% completeness, and 60.80% faithfulness. The average across models is 29.95% correctness, 77.33% completeness, and 61.99% faithfulness. Noted error modes include constraint satisfaction gaps, hallucinated web-derived content, and early reasoning termination (He et al., 8 Dec 2025).
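How per-response judgements might roll up into the reported percentages can be sketched as below. The raw scoring scale is not recoverable from the text, so the 0 to 10 range used here is purely an assumption, as are the field names and the `normalize` and `aggregate` helpers.

```python
from statistics import mean
from typing import Dict, List


def normalize(score: float, lo: float, hi: float) -> float:
    """Min-max rescale a raw judge score to [0, 1], clamping out-of-range values."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (min(max(score, lo), hi) - lo) / (hi - lo)


def aggregate(
    judgements: List[Dict[str, float]],
    lo: float = 0.0,
    hi: float = 10.0,  # ASSUMED raw scale; the source does not state it
) -> Dict[str, float]:
    """Average binary correctness and normalized graded scores over all responses."""
    return {
        "correctness": mean(j["correct"] for j in judgements),
        "completeness": mean(normalize(j["completeness"], lo, hi) for j in judgements),
        "faithfulness": mean(normalize(j["faithfulness"], lo, hi) for j in judgements),
    }
```

Because correctness is binary, its mean is simply the fraction of responses judged correct, which is why it can sit far below the graded completeness and faithfulness averages.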
6. Design Principles and Empirical Outcomes
Playful Co-Located AR
Empirical synthesis indicates:
- Device-sharing increases interpersonal proximity and communication.
- Physical enablers (face/bodies/pets) focus engagement and foster movement synchrony.
- Augmented transformations and open-ended affordances facilitate collaborative experimentation.
- Participants express demand for upscaled, multi-user support and integration of physical play artifacts (Dagan et al., 2022).
Agentic Search
LocalPlayground’s modular, repeatable evaluation exposes the persistent difficulty of multi-constraint, real-world agentic reasoning, underscoring the need for domain-specific benchmarks and tool augmentation in LLM-based agent research (He et al., 8 Dec 2025).
7. Significance and Research Trajectory
LocalPlayground exemplifies two influential paradigms: (1) the experimental, design-guideline-driven advancement of embodied AR interaction for promoting social play, and (2) a rigorously controlled testbed for benchmarking the reasoning capabilities of LLM search agents in complex, real-world local service environments. This dual adoption highlights the growing convergence between embodied, in-situ interaction design and computational multi-agent reasoning platforms, providing fertile context for future empirical research and method development (Dagan et al., 2022, He et al., 8 Dec 2025).