Hierarchical temporal representation for routines and transitions

Develop hierarchical temporal representation methods for foundation models in sensor-based Human Activity Recognition (HAR). These representations should encode routines and activity transitions in a form that stays stable across users and devices, so that they support long-horizon activity understanding.

Background

The paper argues that current foundation models for sensor-based human activity recognition do not adequately capture the long-horizon, hierarchical temporal structure of human behavior. Activities unfold across multiple time scales, and the transitions between them vary across users and devices, which makes robustness and transfer difficult. The authors identify a need for principled representations that model routines and transitions while remaining stable under user and device variability.

References

Key open questions include how to encode routines and transitions in a form that remains stable across users and devices, and how to align sensor embeddings with language without overfitting to prompts or captions.

Foundation Models Defining A New Era In Sensor-based Human Activity Recognition: A Survey And Outlook (2604.02711 - Bian et al., 3 Apr 2026) in Subsection 6.2, Representation Limits: Temporal Hierarchy and Semantic Abstraction