
Supporting Physical Activity Behavior Change with LLM-Based Conversational Agents (2405.06061v1)

Published 9 May 2024 in cs.HC

Abstract: Physical activity has significant benefits to health, yet large portions of the population remain physically inactive. Mobile health applications show promising potential for low-cost, scalable physical activity promotion, but existing approaches are often insufficiently personalized to a user's context and life circumstances. In this work, we explore the potential for LLM-based conversational agents to motivate physical activity behavior change. Through formative interviews with 12 health professionals and 10 non-experts, we identify design considerations and opportunities for LLM health coaching. We present GPTCoach, a chatbot that implements an evidence-based health coaching program, uses counseling strategies from motivational interviewing, and can query and visualize health data from a wearable through tool use. We evaluate GPTCoach as a technology probe in a user study with 16 participants. Through quantitative and qualitative analyses, we find promising evidence that GPTCoach can adhere to a health coaching program while adopting a facilitative, supportive, and non-judgmental tone. We find more variable support for GPTCoach's ability to proactively make use of data in ways that foster motivation and empowerment. We conclude with a discussion of our findings, implications for future research, as well as risks and limitations.

Authors (7)
  1. Matthew Jörke (3 papers)
  2. Shardul Sapkota (3 papers)
  3. Lyndsea Warkenthien (1 paper)
  4. Niklas Vainio (1 paper)
  5. Paul Schmiedmayer (6 papers)
  6. Emma Brunskill (86 papers)
  7. James Landay (5 papers)