Ground-truth validation of CoV-based autonomy classification
Establish controlled experiments and ground-truth labeled datasets in which the operational mode of AI agents on Moltbook-like platforms (autonomous heartbeat scheduling versus human prompting) is known, to directly validate and calibrate the coefficient-of-variation-based temporal classification for detecting human influence.
References
Our classification framework is validated primarily through the natural experiment (44- hour shutdown) and the sliding window temporal dynamics, but lacks ground truth labels of known human-prompted versus autonomous posts. This signal independence means we cannot cross-validate temporal classification against content features; each signal provides complementary rather than convergent evidence.
— The Moltbook Illusion: Separating Human Influence from Emergent Behavior in AI Agent Societies
(2602.07432 - Li, 7 Feb 2026) in Limitations