Validating Behavioral Proxies for Disease Risk Monitoring via Large-Scale E-commerce Data
Abstract: Digital traces of everyday behavior, such as e-commerce (EC) purchase logs, provide scalable signals for population-level monitoring, yet their epidemiological validity remains unclear due to weak links to clinical outcomes. We propose a behavioral proxy for disease onset based on transitions from regular to therapeutic diets observed in EC purchase histories, and evaluate its validity through large-scale cross-domain analysis. Using EC purchase data (N = 55,645 users) and independent insurance-derived clinical records, we compare ingredient-level risk patterns and seasonal disease dynamics in feline lower urinary tract disease (FLUTD) as a case study. The proxy-based estimates show strong agreement with clinical data, with correlations of r = 0.74 for ingredient-level risk patterns and r = 0.82 for seasonal variation. Both data sources consistently capture elevated disease risk during winter months. Moreover, analysis using EC data alone reproduces established domain knowledge, including the association between higher wet food consumption and lower disease risk. Our results demonstrate that behavioral signals derived from large-scale EC data can serve as validated, cost-effective complements to traditional surveillance systems, and suggest broader applicability to monitoring lifestyle-related and chronic conditions.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.