LocalPlayground: AR & Multi-Agent Framework

Updated 9 December 2025
  • LocalPlayground is a dual-faceted research framework that integrates embodied AR social play and computational multi-agent reasoning.
  • The AR component features five interactive mobile apps designed to promote proxemic engagement and collaborative physical play.
  • The agentic search environment employs LLM-driven tools for multi-hop reasoning, validated against real-world local service benchmarks.

LocalPlayground designates two distinct research frameworks for multi-agent interaction: one centered on embodied, real-world experience, the other on complex reasoning over local physical and service environments. The term originates from (i) a mobile augmented reality (AR) social play blueprint for in-person engagement (Dagan et al., 2022), and (ii) the unified execution environment for multi-hop, agentic search in local life services that underpins the LocalSearchBench benchmark (He et al., 8 Dec 2025).

1. Conceptual Overview

LocalPlayground, in the AR context, is a set of design principles, experiential motifs, and application patterns for co-located, playful, embodied interaction using mobile AR apps, intended to foster shared physical and social activity. As a computational research platform, LocalPlayground is the execution backbone for evaluating LLM agents on multi-step, cross-source reasoning tasks relevant to local services, integrating a tool-rich agent environment and a robust validation protocol (He et al., 8 Dec 2025).

2. Architecture and Functional Components

Mobile AR Play Zone (Project IRL)

Five “playground” apps instantiate LocalPlayground principles in Project IRL (Dagan et al., 2022):

  • Face It: Rapid device-passing with facial expression recognition; provokes proxemic contact and shared amusement.
  • Feeture Films: Parent–child joint foot tracking for narrative sock-puppet play; supports intimate, cooperative engagement.
  • Treasure Treat: Collaborative pet-owner interaction, tracking dog silhouettes and virtual coin collection.
  • Milky Way: Synchronous multi-device play, anchored by a shared TV screen; groups compete in real-space “whack-a-mole.”
  • Freezing Frenzy: Head-to-head rear-camera body tracking in a timed freeze-tag contest.

Agentic Search Evaluation Environment

In LocalSearchBench, LocalPlayground orchestrates:

  • Search Agent: LLM-driven, allowed up to 5 rounds, each comprising one LocalRAG call (dense retrieval + rerank over a 29-field merchant database) and one WebSearch call (a wrapper over Baidu Search).
  • Validation Agent: LLM-as-Judge (Claude-Sonnet-4), scoring responses on correctness, completeness, faithfulness, fluency, safety, rounds, and tool calls.

APIs are JSON-based:

  • LocalRAGSearch consumes a natural-language merchant query, returns top-100 vector matches and top-20 reranked candidates.
  • WebSearch accepts dynamic queries and returns recent factual web snippets.

Agent prompts embed tool invocation tags (e.g., <rag>, <web_search>), and maintain context concatenation across reasoning rounds.
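
The tag-based invocation and context concatenation described above can be sketched as follows. This is a minimal illustration, not the benchmark's implementation: only the tag names `<rag>` and `<web_search>` come from the text, while the closing-tag form, the function names `extract_tool_calls` and `append_to_context`, and the context layout are assumptions.

```python
import re

# Tag names are taken from the text; the closing-tag form (</rag>) is an assumption.
_TAG_RE = re.compile(r"<(rag|web_search)>(.*?)</\1>", re.DOTALL)


def extract_tool_calls(agent_output: str) -> list[tuple[str, str]]:
    """Return (tool_name, query) pairs in the order they appear in the output."""
    return [(m.group(1), m.group(2).strip()) for m in _TAG_RE.finditer(agent_output)]


def append_to_context(context: str, tool: str, query: str, result: str) -> str:
    """Concatenate one tool call and its result onto the running context.

    The line layout used here is hypothetical; the text only states that
    context is concatenated across reasoning rounds.
    """
    return f"{context}\n[{tool}] query: {query}\n[{tool}] result: {result}"
```

A caller would run `extract_tool_calls` on each model response, dispatch each pair to the matching tool, and pass the text returned by `append_to_context` into the next round's prompt.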

3. Data Structure and Schema

Embodied Play Schema

Each AR play experience is bounded by design guidelines:

  1. Device Arrangement: Shared vs. individual phone use, shaping physical dynamics and coordination.
  2. Enablers: Physical anchors—faces, bodies, pets, objects—central to AR content triggering and attention.
  3. Reality Modification Affordances: Support for transformation, surprise, and iterative improvisation.
  4. Co-located Play: Mechanisms for competition, cooperation, and scalable group participation.

Local Life Search Schema

The backbone is a 29-field merchant record, including:

  • Identification and categorization,
  • Location (city, district, GPS),
  • Operations (hours, price, services),
  • Ratings/promotions,
  • Anonymized contacts.
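
As a rough illustration of how such a record might be represented, the sketch below groups a handful of fields under the five categories listed above. Every field name is hypothetical and the class covers far fewer than 29 fields; the actual schema is defined by the benchmark, not by this example.

```python
from dataclasses import dataclass, field


@dataclass
class MerchantRecord:
    """Illustrative subset of a merchant record; all field names are assumptions."""
    # Identification and categorization
    merchant_id: str
    name: str
    category: str
    # Location
    city: str
    district: str
    latitude: float
    longitude: float
    # Operations
    opening_hours: str
    avg_price: float
    services: list[str] = field(default_factory=list)
    # Ratings / promotions
    rating: float = 0.0
    promotions: list[str] = field(default_factory=list)
    # Anonymized contact
    contact_masked: str = ""


def filter_by_service(records: list[MerchantRecord], service: str) -> list[MerchantRecord]:
    """Keep only merchants whose service list contains the requested service."""
    return [r for r in records if service in r.services]
```

A structured record of this kind lets an agent filter on database fields (for example, `filter_by_service(records, "parking")`) before spending a web query on attributes that change in real time.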

A dense vector index is constructed using an 8B-parameter model; retrieval is cosine similarity-based, followed by an 8B-parameter learned reranker for top-20 selection.
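
The two-stage retrieval can be sketched in plain Python as below. This assumes embeddings are already computed and that a reranker is available as a scoring callable; the 8B-parameter models themselves are not reproduced, and the names `cosine`, `retrieve_then_rerank`, and `rerank_score` are illustrative only.

```python
import math
from typing import Callable, Mapping, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query_vec: Vector,
    index: Mapping[str, Vector],          # merchant id -> embedding (assumed precomputed)
    rerank_score: Callable[[str], float],  # stand-in for the learned reranker
    k_retrieve: int = 100,                 # top-100 vector matches, per the text
    k_final: int = 20,                     # top-20 reranked candidates, per the text
) -> list[str]:
    """Stage 1: rank by cosine similarity. Stage 2: rerank the shortlist only."""
    by_similarity = sorted(index, key=lambda mid: cosine(query_vec, index[mid]), reverse=True)
    shortlist = by_similarity[:k_retrieve]
    return sorted(shortlist, key=rerank_score, reverse=True)[:k_final]
```

Note that the reranker only ever sees the shortlist, so a merchant that misses the first-stage cut cannot be recovered in the second stage regardless of its rerank score.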

4. Multi-Hop Reasoning and Interaction Workflows

AR Play Scenarios

Interaction is exemplified by co-located, participatory sessions (e.g., family foot-puppet narratives, coordinated dog-owner play, multi-device group activities), blending physical movement, shared attention, and real-time feedback (Dagan et al., 2022).

Agentic Reasoning Example

A typical multi-hop agent workflow in LocalPlayground (He et al., 8 Dec 2025):

  1. Issue initial merchant database query (LocalRAG) per location, price, or cuisine constraints.
  2. Filter candidates by facilities (e.g., parking) from database fields.
  3. Use WebSearch to verify real-time attributes (e.g., opening hours, reservation availability).
  4. Iteratively refine candidates, compose structured output (name, address, hours, features, booking status).
  5. Concatenate intermediate results to prompt context in each round for reasoning traceability.
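
The workflow above can be approximated by a small control loop. In this sketch the LLM is replaced by a hypothetical `plan` callable and both tools are plain functions, so the code shows only the round budget, tool dispatch, and context concatenation. The return format, the choice to treat each tool as optional within a round, and the behavior when the budget runs out are assumptions.

```python
from typing import Callable, Optional, Tuple

MAX_ROUNDS = 5  # round budget stated in the text

# A planner step returns (rag_query, web_query, final_answer); any element may be None.
Plan = Callable[[str], Tuple[Optional[str], Optional[str], Optional[str]]]
Tool = Callable[[str], str]


def run_search_agent(question: str, plan: Plan, local_rag: Tool, web_search: Tool,
                     max_rounds: int = MAX_ROUNDS) -> dict:
    """Run up to max_rounds tool rounds, appending every result to the context."""
    context = f"Question: {question}"
    rounds = 0
    tool_calls = 0
    answer = None
    while rounds < max_rounds:
        rag_query, web_query, answer = plan(context)
        if answer is not None:
            break  # the planner decided it has enough evidence
        rounds += 1
        if rag_query:  # at most one LocalRAG call per round
            context += f"\n[round {rounds}] rag({rag_query}) -> {local_rag(rag_query)}"
            tool_calls += 1
        if web_query:  # at most one WebSearch call per round
            context += f"\n[round {rounds}] web({web_query}) -> {web_search(web_query)}"
            tool_calls += 1
    # answer stays None if the round budget is exhausted without a final answer
    return {"answer": answer, "rounds": rounds, "tool_calls": tool_calls, "context": context}
```

Returning the round and tool-call counts alongside the answer mirrors the quantities the validation step is described as recording, and keeping the full context makes each intermediate result traceable.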

5. Evaluation Methodology and Metrics

AR Deployment Study

  • 101 participants across Face It, Feeture Films, Treasure Treat, Milky Way, and Freezing Frenzy.
  • Qualitative measures: reports of enhanced proxemics, social focal points, synchronized movement, and collaborative “hacking.”
  • No formal mathematical models; controlled session durations (up to 20 minutes, 30–60s per game round); no global scoring metrics.

Search Agent Evaluation

Automated validation along:

  • Correctness: $C_i \in \{0,1\}$, binary, averaged.
  • Completeness: scored in $[0,10]$, normalized to $[0,1]$.
  • Faithfulness: scored in $[0,10]$, normalized to $[0,1]$.
  • Fluency/Safety: scored in $[0,10]$, normalized.
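
A minimal sketch of how per-response judgements could be aggregated into these metrics is shown below. The `Judgement` container and `aggregate` function are hypothetical, and dividing the mean by 10 is an assumed reading of "normalized"; the source does not spell out the exact aggregation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Judgement:
    """One judged response; field names are illustrative."""
    correct: int         # binary correctness, 0 or 1
    completeness: float  # judge score on a 0-10 scale
    faithfulness: float  # judge score on a 0-10 scale
    fluency: float       # judge score on a 0-10 scale
    safety: float        # judge score on a 0-10 scale


def aggregate(judgements: list[Judgement]) -> dict[str, float]:
    """Average correctness directly; rescale 0-10 scores to [0, 1] by dividing by 10."""
    if not judgements:
        raise ValueError("at least one judgement is required")
    return {
        "correctness": mean(j.correct for j in judgements),
        "completeness": mean(j.completeness for j in judgements) / 10,
        "faithfulness": mean(j.faithfulness for j in judgements) / 10,
        "fluency": mean(j.fluency for j in judgements) / 10,
        "safety": mean(j.safety for j in judgements) / 10,
    }
```

Multiplying each value by 100 gives percentages in the same form as the reported results.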

Results (N=5 rounds): DeepSeek-V3.1 achieves 34.34% correctness, 80.00% completeness, and 60.80% faithfulness. The average across models is 29.95% correctness, 77.33% completeness, and 61.99% faithfulness. Noted error modes include constraint satisfaction gaps, hallucinated web-derived content, and early reasoning termination (He et al., 8 Dec 2025).

6. Design Principles and Empirical Outcomes

Playful Co-Located AR

Empirical synthesis indicates:

  • Device-sharing increases interpersonal proximity and communication.
  • Physical enablers (face/bodies/pets) focus engagement and foster movement synchrony.
  • Augmented transformations and open-ended affordances facilitate collaborative experimentation.
  • Participants express demand for upscaled, multi-user support and integration of physical play artifacts (Dagan et al., 2022).

LocalPlayground’s modular, repeatable evaluation exposes persistent hardness in multi-constraint, real-world agentic reasoning, underscoring the necessity of domain-specific benchmarks and tool augmentation for LLM-based agent research (He et al., 8 Dec 2025).

7. Significance and Research Trajectory

LocalPlayground exemplifies two influential paradigms: (1) the experimental, design-guideline-driven advancement of embodied AR interaction for promoting social play, and (2) a rigorously controlled testbed for benchmarking the reasoning capabilities of LLM search agents in complex, real-world local service environments. This dual adoption highlights the growing convergence between embodied, in-situ interaction design and computational multi-agent reasoning platforms, providing fertile context for future empirical research and method development (Dagan et al., 2022, He et al., 8 Dec 2025).
