
Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data (2401.06866v2)

Published 12 Jan 2024 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs are capable of many natural language tasks, yet they are far from perfect. In health applications, grounding and interpreting domain-specific and non-linguistic data is crucial. This paper investigates the capacity of LLMs to make inferences about health based on contextual information (e.g. user demographics, health knowledge) and physiological data (e.g. resting heart rate, sleep minutes). We present a comprehensive evaluation of 12 state-of-the-art LLMs with prompting and fine-tuning techniques on four public health datasets (PMData, LifeSnaps, GLOBEM and AW_FB). Our experiments cover 10 consumer health prediction tasks in mental health, activity, metabolic, and sleep assessment. Our fine-tuned model, HealthAlpaca, exhibits comparable performance to much larger models (GPT-3.5, GPT-4 and Gemini-Pro), achieving the best performance in 8 out of 10 tasks. Ablation studies highlight the effectiveness of context enhancement strategies. Notably, we observe that our context enhancement can yield up to 23.8% improvement in performance. While constructing contextually rich prompts (combining user context, health knowledge and temporal information) exhibits synergistic improvement, the inclusion of health knowledge context in prompts significantly enhances overall performance.

Introduction

LLMs have shown remarkable competencies across various text generation and information retrieval tasks. In healthcare, however, their ability to process multi-modal data, especially time-series physiological and behavioral data from wearable sensors, has yet to be thoroughly examined. The paper addresses this gap by proposing Health-LLM, a framework that comprehensively tests the effectiveness of twelve state-of-the-art LLMs on health predictions augmented with wearables data. The paper ensures robustness by incorporating diverse prompting and fine-tuning techniques and evaluates performance across ten consumer health prediction tasks.

Methodology

Health-LLM's evaluation of the models' performance on health prediction tasks is two-pronged: through zero-shot and few-shot prompting and through instructional fine-tuning. Zero-shot prompting assesses models' in-built knowledge without additional training, while few-shot prompting offers the models a few illustrative examples to learn from. Instructional fine-tuning goes a step further by adapting the whole model to the task-specific data. The paper also examines the benefit of context enhancement in prompts, where supplementary information such as user demographics or health knowledge is strategically included for performance refinement.
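The prompting side of this pipeline can be illustrated with a short sketch. The snippet below is a minimal, hypothetical reconstruction (not the authors' code): `WearableRecord`, `build_prompt`, and all field names are assumptions chosen for illustration. It shows how a zero-shot or few-shot query might be assembled from serialized sensor readings, with the optional context-enhancement strings (user demographics, health knowledge) prepended.

```python
from dataclasses import dataclass

@dataclass
class WearableRecord:
    """One day of (hypothetical) wearable-sensor aggregates."""
    resting_hr: int     # resting heart rate, beats per minute
    sleep_minutes: int  # total sleep duration
    steps: int          # daily step count

def build_prompt(records, user_context="", health_knowledge="",
                 examples=None,
                 question="Rate this user's stress level from 0 to 10."):
    """Assemble a health-prediction prompt.

    Zero-shot: examples is None. Few-shot: examples is a list of
    (record-sequence, answer) pairs shown before the query. The
    user_context and health_knowledge strings are the optional
    context-enhancement components described in the paper.
    """
    parts = []
    if user_context:
        parts.append(f"User context: {user_context}")
    if health_knowledge:
        parts.append(f"Health knowledge: {health_knowledge}")

    def fmt(seq):
        # Serialize the time series day by day (temporal context).
        return "\n".join(
            f"Day {i + 1}: resting HR {r.resting_hr} bpm, "
            f"sleep {r.sleep_minutes} min, {r.steps} steps"
            for i, r in enumerate(seq)
        )

    for seq, answer in (examples or []):
        parts.append(f"{fmt(seq)}\n{question}\nAnswer: {answer}")
    parts.append(f"{fmt(records)}\n{question}\nAnswer:")
    return "\n\n".join(parts)

week = [WearableRecord(62, 410, 8500), WearableRecord(71, 350, 4200)]
prompt = build_prompt(
    week,
    user_context="34-year-old office worker",
    health_knowledge="Elevated resting heart rate and short sleep "
                     "are associated with higher stress.",
)
print(prompt)
```

The same `build_prompt` call with `examples=[...]` yields the few-shot variant, and dropping the two context arguments yields the plain zero-shot baseline, which makes ablating each context component a one-argument change.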

Findings

The paper found that zero-shot prompted LLMs tend to perform on par with designated task-specific baseline models. Few-shot prompting, particularly with larger models such as GPT-3.5 and GPT-4, demonstrated a noteworthy ability to interpret the physiological time-series data. The fine-tuned HealthAlpaca model, despite being significantly smaller than its GPT counterparts, recorded the best performance in 8 of the 10 tasks, underscoring the efficiency LLMs can achieve when fine-tuned on health-specific data. Context enhancement was another highlight: including additional context in prompts led to substantial gains, particularly when health knowledge was involved.

Implications and Ethical Considerations

The implications of this paper are significant for the healthcare domain. The research suggests that LLMs possess a largely untapped potential for predicting health outcomes from wearable sensor data, which could transform patient monitoring and care. However, the authors flag critical ethical considerations such as privacy protection, bias mitigation, and the prevention of "model hallucination," where the model might generate convincing yet incorrect predictions. They call for thorough ethical review to enhance the safety and reliability of LLMs in health applications before real-world deployment.

In conclusion, this paper paves the way for future research dedicated to refining models' reasoning, enhancing personalization, and addressing data security in healthcare settings. The practical deployment of Health-LLMs could mark a significant step towards achieving AI-driven personalized healthcare but must be navigated responsibly.

Authors (5)
  1. Yubin Kim
  2. Xuhai Xu
  3. Daniel McDuff
  4. Cynthia Breazeal
  5. Hae Won Park
Citations (44)