The Anatomy of a Personal Health Agent (2508.20148v1)

Published 27 Aug 2025 in cs.AI, cs.HC, and cs.MA

Abstract: Health is a fundamental pillar of human wellness, and the rapid advancements in LLMs have driven the development of a new generation of health agents. However, the application of health agents to fulfill the diverse needs of individuals in daily non-clinical settings is underexplored. In this work, we aim to build a comprehensive personal health agent that is able to reason about multimodal data from everyday consumer wellness devices and common personal health records, and provide personalized health recommendations. To understand end-users' needs when interacting with such an assistant, we conducted an in-depth analysis of web search and health forum queries, alongside qualitative insights from users and health experts gathered through a user-centered design process. Based on these findings, we identified three major categories of consumer health needs, each of which is supported by a specialist sub-agent: (1) a data science agent that analyzes personal time-series wearable and health record data, (2) a health domain expert agent that integrates users' health and contextual data to generate accurate, personalized insights, and (3) a health coach agent that synthesizes data insights, guiding users using a specified psychological strategy and tracking users' progress. Furthermore, we propose and develop the Personal Health Agent (PHA), a multi-agent framework that enables dynamic, personalized interactions to address individual health needs. To evaluate each sub-agent and the multi-agent system, we conducted automated and human evaluations across 10 benchmark tasks, involving more than 7,000 annotations and 1,100 hours of effort from health experts and end-users. Our work represents the most comprehensive evaluation of a health agent to date and establishes a strong foundation towards the futuristic vision of a personal health agent accessible to everyone.

Summary

The paper introduces a modular multi-agent framework that integrates data science, domain expertise, and health coaching to deliver personalized health recommendations.
The system is evaluated with both automated metrics and expert and end-user assessments, showing clear improvements over the base model in statistical analysis and differential diagnosis.
The modular and orchestrated approach enhances task decomposition and iterative refinement, setting a new standard for personalized, trustworthy health AI.

The Anatomy of a Personal Health Agent: A Modular Multi-Agent Framework for Personalized Health AI

Introduction

This paper presents a comprehensive framework for a Personal Health Agent (PHA) that uses LLMs to provide personalized health recommendations by reasoning over multimodal data from consumer wearables and health records. The work addresses the underexplored challenge of supporting diverse, non-clinical health needs in daily life, going beyond prior LLM-based health assistants that are limited in scope, reasoning, and personalization. The authors propose a modular multi-agent system in which each sub-agent specializes in one core competency: data science, health domain expertise, or health coaching. The system is evaluated through a rigorous, multi-dimensional framework that combines automated metrics with extensive assessments by health experts and end-users.

User-Centered Design and Health Needs Taxonomy

The design of PHA is grounded in a user-centered methodology that synthesizes over 1,300 real-world health queries drawn from web search, health forums, and surveys, together with insights from expert workshops. This analysis identifies four categories of critical user journeys (CUJs):

General Health Knowledge: Factual, open-ended health questions.
Personal Data Insights: Interpretation and contextualization of personal health data.
Wellness Advice: Actionable, personalized recommendations for behavior change.
Personal Medical Symptoms: Symptom assessment and triage.

These categories inform the modular decomposition of the agent, ensuring coverage of the full spectrum of consumer health needs.

Modular Multi-Agent Architecture

Data Science Agent (DS Agent)

The DS Agent is responsible for robust statistical analysis of personal and population-level time-series health data. Its architecture is a two-stage pipeline:

Analysis Plan Generation: Translates ambiguous, open-ended queries into structured, reproducible statistical analysis plans, explicitly operationalizing variables, data transformations, sufficiency checks, and statistical tests.
Code Generation and Execution: Converts the plan into executable Python code, with iterative self-correction for error handling.
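The two-stage pipeline can be pictured as a plan object handed to a generate-execute-retry loop. The sketch below is only an illustration of that control flow, not the authors' implementation: `AnalysisPlan`, `execute_plan`, and `toy_generator` are hypothetical names, the plan fields simply mirror the elements listed above, and a scripted function stands in for the LLM that writes code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AnalysisPlan:
    # Hypothetical fields mirroring the plan elements named in the summary.
    variables: list
    transformations: list
    sufficiency_check: str
    statistical_test: str


# A code generator receives the plan plus the previous error (None on the first try).
CodeGenerator = Callable[[AnalysisPlan, Optional[str]], str]


def execute_plan(plan: AnalysisPlan, generate_code: CodeGenerator, max_attempts: int = 5) -> tuple:
    """Stage 2: generate code, run it, and feed any error back for another attempt."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        source = generate_code(plan, last_error)
        namespace = {}
        try:
            exec(source, namespace)  # a real deployment would sandbox this call
            return namespace, attempt
        except Exception as exc:
            # Self-correction: the error message becomes input to the next draft.
            last_error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"no working code after {max_attempts} attempts: {last_error}")


def toy_generator(plan: AnalysisPlan, error: Optional[str]) -> str:
    """Stand-in for the LLM: the first draft is buggy, the retry is corrected."""
    if error is None:
        return "result = mean_steps + 1"  # raises NameError: mean_steps is undefined
    return "import statistics\nresult = statistics.mean([6000, 8000, 7000])"


plan = AnalysisPlan(
    variables=["daily_steps"],
    transformations=["drop days with no wear time"],
    sufficiency_check="at least 3 valid days",
    statistical_test="descriptive mean",
)
namespace, attempts = execute_plan(plan, toy_generator)
print(attempts, namespace["result"])
```

Separating the plan from the code means a failed execution can be retried against the same structured plan rather than the original ambiguous query.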

Evaluation: On a benchmark of 141 query-plan pairs, the DS Agent achieves a mean plan quality score of 75.6% (vs. 53.7% for the base Gemini model, p<0.001), with substantial improvements in data availability checks and timeframe selection. The code generation pass rate reaches 79.0% after five trials, and data handling errors drop significantly (11.0% vs. 25.4% for the baseline).
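A "pass rate after five trials" can be read as the share of tasks whose code succeeds within the first five attempts. The sketch below computes that quantity under this assumed definition. The function name and the toy outcomes are invented for illustration and do not come from the paper.

```python
def pass_rate_within_k(attempt_outcomes: list, k: int) -> float:
    """Fraction of tasks with at least one successful run among their first k attempts."""
    if not attempt_outcomes:
        raise ValueError("need at least one task")
    passed = sum(any(outcomes[:k]) for outcomes in attempt_outcomes)
    return passed / len(attempt_outcomes)


# Toy data (invented): four tasks, each a list of per-attempt success flags.
toy = [
    [True],
    [False, False, True],
    [False, False, False, False, False],
    [False, True],
]
for k in (1, 2, 5):
    print(k, pass_rate_within_k(toy, k))
```

Under this definition the rate can only rise as k grows, which is why a pass rate is reported together with its trial budget.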

Domain Expert Agent (DE Agent)

The DE Agent provides authoritative, contextualized medical knowledge and reasoning. It employs a multi-step Reason-Investigate-Examine cycle, integrating tools for web search, biomedical literature, and population statistics, and synthesizes evidence-based, personalized responses.
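The Reason-Investigate-Examine cycle can be illustrated as a bounded tool-use loop. This is a hypothetical sketch, not the paper's code. The tool names `web_search`, `literature_search`, and `population_stats` are placeholders for the three tool types mentioned above. A scripted planner stands in for the LLM's reasoning, and the "examine" step is reduced to a trivial non-empty check. The final evidence synthesis is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]
# Reason step: given the question and evidence so far, pick (tool, query) or None to stop.
Planner = Callable[[str, List[str]], Optional[Tuple[str, str]]]


def reason_investigate_examine(
    question: str, planner: Planner, tools: Dict[str, Tool], max_steps: int = 4
) -> List[str]:
    evidence: List[str] = []
    for _ in range(max_steps):
        step = planner(question, evidence)  # Reason: decide what to look up next
        if step is None:
            break
        tool_name, query = step
        finding = tools[tool_name](query)  # Investigate: call the chosen tool
        if finding.strip():  # Examine: placeholder for a real relevance/quality check
            evidence.append(f"[{tool_name}] {finding}")
    return evidence


# Hypothetical stub tools; a real agent would call search and statistics services.
toy_tools: Dict[str, Tool] = {
    "web_search": lambda q: f"web pages about {q}",
    "literature_search": lambda q: f"abstracts about {q}",
    "population_stats": lambda q: f"reference ranges for {q}",
}
script = [
    ("literature_search", "resting heart rate and fitness"),
    ("population_stats", "resting heart rate, adults 40-49"),
]


def scripted_planner(question: str, evidence: List[str]) -> Optional[Tuple[str, str]]:
    return script[len(evidence)] if len(evidence) < len(script) else None


evidence = reason_investigate_examine(
    "Is my resting heart rate of 58 bpm normal?", scripted_planner, toy_tools
)
for line in evidence:
    print(line)
```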

Evaluation: The DE Agent outperforms the base model on four medical MCQ benchmarks (overall accuracy 83.6% vs. 81.8%, p=0.002), and achieves higher top-1/5/10 accuracy in differential diagnosis tasks (46.1%/75.6%/84.5%). In contextualized Q&A, it is rated as significantly more trustworthy (96.9% vs. 38.7%) and preferred for personalization (71.9% win rate). Clinician evaluation of multimodal health summaries shows strong gains in clinical significance, cross-modal association, and comprehensiveness.
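Top-1/5/10 accuracy in differential diagnosis measures how often the reference diagnosis appears among the first k entries of the model's ranked list. The sketch below shows the arithmetic. It assumes exact string matching between diagnoses, whereas real evaluations typically rely on a clinician or model grader to judge equivalence. The cases are invented.

```python
def top_k_accuracy(ranked_ddx: list, truths: list, k: int) -> float:
    """Share of cases whose reference diagnosis appears in the top k of the ranked list."""
    if len(ranked_ddx) != len(truths) or not truths:
        raise ValueError("need one ranked list per reference diagnosis, and at least one case")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_ddx, truths))
    return hits / len(truths)


# Invented toy cases: each ranked list is a model's differential, best guess first.
ranked_ddx = [
    ["migraine", "tension headache", "cluster headache"],
    ["anemia", "hypothyroidism", "depression"],
    ["GERD", "angina", "gastritis"],
]
truths = ["migraine", "depression", "costochondritis"]
for k in (1, 3):
    print(k, round(top_k_accuracy(ranked_ddx, truths, k), 3))
```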

Health Coach Agent (HC Agent)

The HC Agent is designed for multi-turn, mixed-initiative health coaching, incorporating motivational interviewing and goal-setting best practices. Its modular architecture separates personalized coaching, recommendation timing, and conversation conclusion modules.
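One way to picture the separation into personalized coaching, recommendation timing, and conversation conclusion modules is a small router that picks which module handles the next turn. The state fields and routing rules below are hypothetical and chosen only to make the split concrete. The paper's actual module logic is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoachingState:
    # Hypothetical conversation state tracked across turns.
    goal: Optional[str] = None  # set once a goal has been identified
    recommendation_given: bool = False
    user_ready_to_end: bool = False


def next_module(state: CoachingState) -> str:
    """Hypothetical router between the three HC Agent modules."""
    if state.user_ready_to_end:
        return "conversation_conclusion"  # summarize the plan and close
    if state.goal is not None and not state.recommendation_given:
        return "recommendation_timing"  # decide whether now is the moment to recommend
    return "personalized_coaching"  # keep exploring motivation and barriers


print(next_module(CoachingState()))
print(next_module(CoachingState(goal="walk 8,000 steps a day")))
print(next_module(CoachingState(goal="walk 8,000 steps a day",
                                recommendation_given=True, user_ready_to_end=True)))
```

Keeping the decision of *when* to recommend separate from *what* to say lets each behavior be evaluated and tuned independently.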

Evaluation: In user studies, the HC Agent is rated higher for conversation flow, motivational interviewing, and feedback incorporation. Expert raters confirm superior performance in goal identification, active listening, and personalized intervention. Notably, the agent is less optimized for progress tracking, suggesting an area for further refinement.

Orchestrated Multi-Agent Collaboration

The PHA system employs an orchestrator that dynamically assigns main and supporting agents based on query classification, decomposes tasks, and iteratively synthesizes responses with memory updates for conversational coherence. This design is informed by principles of modular cognition, adaptive support, low user burden, and architectural simplicity.
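The orchestration pattern can be sketched as follows: classify the query, look up a main agent and supporting agents, let the supporting agents contribute context, have the main agent answer, then update memory. Everything here is an assumption for illustration. The CUJ-to-agent routing table is hypothetical, the agents are stubs, and the paper's iterative synthesis and reflection rounds are collapsed into a single pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str, List[str]], str]

# Hypothetical routing table: CUJ -> (main agent, supporting agents).
ROUTES: Dict[str, Tuple[str, List[str]]] = {
    "general_health_knowledge": ("domain_expert", []),
    "personal_data_insights": ("data_science", ["domain_expert"]),
    "wellness_advice": ("health_coach", ["data_science", "domain_expert"]),
    "personal_medical_symptoms": ("domain_expert", ["data_science"]),
}


@dataclass
class Orchestrator:
    agents: Dict[str, Agent]
    classify: Callable[[str], str]
    memory: List[str] = field(default_factory=list)

    def handle(self, query: str) -> str:
        main, supporting = ROUTES[self.classify(query)]
        context = list(self.memory)  # earlier turns, for conversational coherence
        for name in supporting:  # supporting agents contribute first
            context.append(self.agents[name](query, context))
        answer = self.agents[main](query, context)  # main agent synthesizes the reply
        self.memory.append(f"Q: {query} | A: {answer}")  # memory update
        return answer


def stub(name: str) -> Agent:
    # Stub agent that reports how much context it received.
    return lambda query, ctx: f"{name} (context={len(ctx)})"


orch = Orchestrator(
    agents={n: stub(n) for n in ("data_science", "domain_expert", "health_coach")},
    classify=lambda q: "wellness_advice" if "improve" in q.lower() else "general_health_knowledge",
)
first = orch.handle("How can I improve my sleep?")
second = orch.handle("What is VO2 max?")
print(first)
print(second)
```

In the second call, the main agent receives one context item: the memory entry from the first turn.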

Comprehensive Evaluation

The PHA framework is evaluated on 10 benchmark tasks using the WEAR-ME dataset (N ≈ 1,500), with over 7,000 human annotations and 1,100 hours of expert and end-user effort. Both end-users and health experts assess multi-turn conversations across 50 representative personas.

End-User Perspective: PHA is ranked as the best system in 48.7% of cases, outperforming both single-agent and parallel multi-agent baselines in overall quality, data analysis, and data interpretation. Users highlight the system's ability to synthesize quantitative and qualitative insights into actionable, personalized advice.
Expert Perspective: Experts show an even stronger preference for PHA (80% top ranking), citing superior technical depth, clinical accuracy, and effective integration of data science, domain knowledge, and coaching. The orchestrated, iterative collaboration is critical for producing coherent, contextually relevant, and safe recommendations.
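Figures such as "ranked best in 48.7% of cases" are top-rank shares. The sketch below computes the fraction of rater rankings that place each system first. The system labels and rankings are invented and do not reproduce the study's data.

```python
from collections import Counter


def top_rank_share(rankings: list) -> dict:
    """Fraction of rankings in which each system is placed first."""
    firsts = Counter(ranking[0] for ranking in rankings)
    systems = sorted({s for ranking in rankings for s in ranking})
    return {s: firsts[s] / len(rankings) for s in systems}


# Invented rankings from four raters, best system first.
rankings = [
    ["PHA", "single_agent", "parallel_multi_agent"],
    ["single_agent", "PHA", "parallel_multi_agent"],
    ["PHA", "parallel_multi_agent", "single_agent"],
    ["parallel_multi_agent", "PHA", "single_agent"],
]
print(top_rank_share(rankings))
```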

PHA achieves these gains with lower computational cost and latency than parallel multi-agent baselines, though it remains more resource-intensive than single-agent systems.

Limitations and Future Directions

Statistical Reasoning: The DS Agent's handling of data distributions and advanced statistical modeling remains limited.
Tool Selection and Factuality: The DE Agent's reliance on web search can introduce noise; improved source selection and domain-restricted retrieval are needed.
Coaching Progress Tracking: The HC Agent underperforms in progress measurement, indicating a need for enhanced longitudinal tracking modules.
Scalability: The multi-agent architecture increases LLM call volume and latency, presenting challenges for real-time deployment.
Ethical and Regulatory Considerations: Algorithmic bias, privacy, security, and user over-reliance are critical risks. The system is explicitly not designed to replace clinical expertise, and any real-world deployment would require rigorous regulatory review.

The authors suggest future research into dynamic, competitive/cooperative agent pools, longitudinal impact studies, and fairness-aware evaluation.

Implications

This work demonstrates that modular, orchestrated multi-agent systems can substantially improve the personalization, accuracy, and actionability of AI-driven health recommendations. The explicit separation of data analysis, domain reasoning, and coaching enables both independent evaluation and targeted improvement of each competency. The comprehensive evaluation framework sets a new standard for benchmarking health AI agents, emphasizing both user and expert perspectives.

The PHA framework provides a validated blueprint for next-generation personal health AI, supporting the vision of accessible, trustworthy, and context-aware health agents. The modular approach is model-agnostic and extensible to future LLMs and health data modalities.

Conclusion

The Anatomy of a Personal Health Agent establishes a robust, modular multi-agent framework for personalized health AI, validated through extensive, multi-level evaluation. The system's architecture and evaluation methodology offer a foundation for future research and development of safe, effective, and user-centered health agents. The work highlights the necessity of specialization, orchestration, and rigorous assessment in advancing the practical utility of LLM-based health assistants.

Authors (38)
