Age-related performance and calibration disparities across countries
Ascertain how disparities in large language model (LLM) agent performance and in Human–LLM calibration vary with user age across countries and cultural contexts beyond the United States, in multi-turn, tool-use agent evaluations.
References
In addition, our age-based analyses are limited to users in the United States due to recruitment constraints, leaving open the question of how performance and calibration disparities vary with age across other countries and cultural contexts.
— Lost in Simulation: LLM-Simulated Users are Unreliable Proxies for Human Users in Agentic Evaluations
(2601.17087 - Seshadri et al., 23 Jan 2026) in Limitations