Artificial Development Set in SEDRo
- The artificial development set is a simulated, multi-stage dataset that replicates early human sensorimotor and cognitive experiences using multi-modal inputs generated in Unity 3D.
- It underpins self-supervised learning by employing curriculum-based training across developmental stages without externally imposed rewards.
- Embedded evaluation metrics and intrinsic energy-based rewards facilitate continuous open-ended exploration and model generalization.
An artificial development set is a structured, multi-stage corpus generated within a simulated environment to emulate the sensorimotor and cognitive experiences of early human development, specifically for training and assessing self-supervised agents. In the context of SEDRo (Simulated Environment for Developmental Robotics), the artificial development set, denoted $\mathcal{D}$, systematically catalogs the multi-modal state-action trajectories accrued by an embodied agent as it undergoes a virtual "infancy" modeled on developmental psychology, without relying on extrinsic task-based rewards or curated datasets (Islam et al., 2020).
1. Architecture of the Developmental Simulation
SEDRo leverages Unity 3D as the physics engine, supporting rigid-body and soft-body interactions to replicate contact, gravity, collisions, and articulated kinematics. The environment’s components—including the agent’s crib, toys, interactive screens, and a caregiver avatar—adhere to the same physical rules and can be modified or activated in accordance with the agent’s “age.” The simulated agent possesses 53 degrees of freedom (DoF) for bodily articulation and 9 DoF for ocular motion, with the action vector segmenting into ocular angles and joint-torque commands. Sensory streams include foveal and peripheral RGB vision, densely distributed tactile contact flags, proprioceptive joint angles and velocities, inertial measurements, and an energy variable that models internal physiological need.
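The overall agent-environment state at timestep $t$ collects the sensory streams listed above:
$$s_t = \left(v_t^{\text{fov}},\, v_t^{\text{per}},\, c_t,\, q_t,\, \dot{q}_t,\, m_t,\, e_t\right),$$
where $v_t^{\text{fov}}$ and $v_t^{\text{per}}$ are the foveal and peripheral RGB images, $c_t$ the tactile contact flags, $q_t$ and $\dot{q}_t$ the proprioceptive joint angles and velocities, $m_t$ the inertial measurements, and $e_t$ the internal energy. The agent action is defined as:
$$a_t = \left(\theta_t^{\text{eye}},\, \tau_t\right), \qquad \theta_t^{\text{eye}} \in \mathbb{R}^{9},\quad \tau_t \in \mathbb{R}^{53},$$
where $\theta_t^{\text{eye}}$ holds the ocular angles and $\tau_t$ the joint-torque commands.
There is no externally dictated reward; instead, the intrinsic reward at any $t$ is measured by the change in internal energy:
$$r_t = e_t - e_{t-1}.$$
Positive $r_t$ (reward) occurs when the agent is fed; negative $r_t$ accrues otherwise.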
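The reward rule can be illustrated with a minimal sketch, assuming the reward is the one-step difference in energy and that the flat action vector lists the 9 ocular entries before the 53 joint entries. The function names, the ordering, and the energy trace are hypothetical and are not taken from SEDRo itself.

```python
from typing import List, Sequence, Tuple

N_EYE = 9    # ocular DoF, as stated in the text
N_BODY = 53  # body DoF, as stated in the text


def split_action(action: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a flat action vector into (ocular angles, joint torques).

    Assumption: ocular entries come first; the source does not give the order.
    """
    if len(action) != N_EYE + N_BODY:
        raise ValueError(f"expected {N_EYE + N_BODY} entries, got {len(action)}")
    return list(action[:N_EYE]), list(action[N_EYE:])


def intrinsic_rewards(energies: Sequence[float]) -> List[float]:
    """Return r_t = e_t - e_{t-1} for every step after the first."""
    return [curr - prev for prev, curr in zip(energies, energies[1:])]


# Hypothetical energy trace: steady depletion, then a feeding event.
trace = [1.0, 0.75, 0.5, 1.0, 0.75]
rewards = intrinsic_rewards(trace)
print(rewards)  # [-0.25, -0.25, 0.5, -0.25]
```

In this trace only the feeding step yields a positive reward; every other step is negative because energy drains.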
2. Developmental Staging and Protocols
SEDRo decomposes the agent’s infancy into five sequential stages, molding the complexity and range of available body parts, sensors, and environmental affordances:
| Stage | Age Period | Capabilities Unlocked |
|---|---|---|
| 0 | Fetus (0–36 wks) | 10 DoF, proprioception, womb contacts; no vision |
| 1 | Newborn (0–3 mo) | 53 DoF, poor acuity foveal vision, static crib, caregiver head |
| 2 | 3–6 months | Improved vision, grasping toys, sensorimotor contingencies |
| 3 | 6–9 months | Crawling module, obstacle floor, object permanence tasks |
| 4 | 9–12 months | Joint attention, search-and-find games |
Progress from stage $k$ to stage $k+1$ grants the agent new sensory-motor tools and environmental structures, paralleling the gradual increase in sensorimotor richness observed in human infants.
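The postnatal rows of the staging table can be encoded as a simple lookup, sketched below. The half-open age intervals (an agent aged exactly 3 months belongs to stage 2) and the function name are assumptions made for illustration; the source does not say how boundary ages are handled.

```python
# Stage labels copied from the staging table; stage 0 (fetus) is prenatal
# and is therefore not reachable from a postnatal age.
STAGE_LABELS = {
    0: "Fetus (0-36 wks)",
    1: "Newborn (0-3 mo)",
    2: "3-6 months",
    3: "6-9 months",
    4: "9-12 months",
}


def stage_for_postnatal_age(months: float) -> int:
    """Map a postnatal age in months to a stage index in 1..4.

    Assumption: each stage covers a half-open 3-month interval [lo, hi).
    """
    if not 0 <= months < 12:
        raise ValueError("postnatal age must satisfy 0 <= months < 12")
    return 1 + int(months // 3)


for age in (0, 2.9, 3, 11.5):
    stage = stage_for_postnatal_age(age)
    print(age, stage, STAGE_LABELS[stage])
```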
Developmental protocols, based in large part on empirical psychology, include:
- Visual fixation: Tracking a moving caregiver’s face, with success defined as eye gaze within ±10° of target for >90% of trial duration.
- Sensorimotor reach: Touching a toy within 5 s; a trial counts as a success when contact occurs inside that window.
- Rod-and-box task: Measuring habituation and post-habituation gaze-time preferences (ΔG), mirroring signatures of object unity development.
- Peek-a-boo task: Testing gaze return to a toy’s hidden location, modeling object permanence.
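The visual-fixation criterion is concrete enough to sketch as a check. The version below assumes gaze error is available as a uniformly sampled list of angular offsets in degrees, so that the fraction of samples equals the fraction of trial time; the function name and the sample data are hypothetical.

```python
from typing import Sequence


def fixation_success(
    gaze_errors_deg: Sequence[float],
    tol_deg: float = 10.0,      # +/-10 degree window from the protocol
    min_fraction: float = 0.9,  # must be exceeded: ">90% of trial duration"
) -> bool:
    """Return True if gaze stayed within tolerance for more than min_fraction.

    Assumption: samples are evenly spaced in time.
    """
    if not gaze_errors_deg:
        raise ValueError("trial contains no gaze samples")
    on_target = sum(1 for err in gaze_errors_deg if abs(err) <= tol_deg)
    return on_target / len(gaze_errors_deg) > min_fraction


# Hypothetical trials: 19 of 20 samples on target, then 9 of 10.
print(fixation_success([2.0] * 19 + [15.0]))  # True  (95% on target)
print(fixation_success([2.0] * 9 + [15.0]))   # False (exactly 90%, not more)
```

The strict inequality follows the ">90%" wording, so a trial at exactly 90% fails.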
3. Data Modalities, Sampling, and Corpus Formation
The artificial development set comprises all sessions recorded throughout the staged simulation, captured with diverse temporal and spatial resolution per modality:
- Vision: RGB at 30 Hz, with separate foveal and peripheral streams
- Proprioception, inertial: 200 Hz
- Touch: event-driven, recorded at 100 Hz
- Internal state: 10 Hz
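The dataset at stage $k$ is the set of state-action pairs recorded during that stage,
$$\mathcal{D}_k = \{(s_t, a_t)\}_{t=1}^{T_k},$$
where $T_k$ is the number of timesteps recorded in stage $k$, and the total artificial development set is the union over all five stages:
$$\mathcal{D} = \bigcup_{k=0}^{4} \mathcal{D}_k.$$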
Corpora are stored as serialized NumPy arrays or TFRecords, maintaining multi-modal synchronization and stage annotation.
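Because the modalities are sampled at different rates, any synchronized view has to map one stream's index onto the others. The sketch below shows one plausible approach, a zero-order hold keyed to the vision frame index. It assumes all streams start at the same instant and are free of jitter; the source does not describe how SEDRo performs synchronization, and the names here are illustrative.

```python
from typing import Dict

# Sampling rates in Hz, taken from the modality list above.
RATES_HZ: Dict[str, int] = {
    "vision": 30,
    "proprio_inertial": 200,
    "touch": 100,
    "internal": 10,
}


def aligned_indices(frame_idx: int, base: str = "vision") -> Dict[str, int]:
    """For a frame of the base stream, return the index of the most recent
    sample in every stream (zero-order hold).

    Integer arithmetic avoids floating-point rounding at exact boundaries.
    """
    if frame_idx < 0:
        raise ValueError("frame index must be non-negative")
    base_rate = RATES_HZ[base]
    return {name: (frame_idx * rate) // base_rate for name, rate in RATES_HZ.items()}


# Vision frame 3 occurs at t = 0.1 s.
print(aligned_indices(3))
# {'vision': 3, 'proprio_inertial': 20, 'touch': 10, 'internal': 1}
```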
4. Training Curricula for Self-Supervised Agents
The artificial development set is tailored for curriculum-based self-supervised model building. The recommended workflow is:
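- Pretraining: An encoder is pretrained on the stage-0 (fetal) data $\mathcal{D}_0$ to reconstruct proprio-tactile sequences, capitalizing on dense sensorimotor data.
- Predictive Modeling: The encoder is then fine-tuned on the stage-1 and stage-2 data, $\mathcal{D}_1$ and $\mathcal{D}_2$, for next-state prediction using the learned representations, forming a mapping $f: (s_t, a_t) \mapsto s_{t+1}$.
- Multi-modal Contrastive Objectives: Transfer learning to the stage-3 and stage-4 data, $\mathcal{D}_3$ and $\mathcal{D}_4$, supports alignment and prediction across vision, touch, and proprioception.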
This sequence enforces a structured progression of model complexity analogous to human developmental maturations.
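The workflow amounts to a fixed schedule of objectives over stage datasets, which can be sketched as a small driver loop. The grouping of stages per phase (stage 0, then stages 1 and 2, then stages 3 and 4) is an assumption inferred from the stage capabilities, and `train_phase` is a hypothetical callback standing in for whatever training routine is used.

```python
from typing import Any, Callable, Dict, List, Tuple

# (objective name, stage indices) in training order; the grouping is assumed.
CURRICULUM: List[Tuple[str, Tuple[int, ...]]] = [
    ("proprio_tactile_reconstruction", (0,)),
    ("next_state_prediction", (1, 2)),
    ("multimodal_contrastive", (3, 4)),
]


def run_curriculum(
    datasets: Dict[int, Any],
    train_phase: Callable[[str, Any], None],
) -> List[Tuple[str, int]]:
    """Call train_phase once per (objective, stage) pair, in curriculum order,
    and return the visiting order for inspection."""
    visited = []
    for objective, stages in CURRICULUM:
        for k in stages:
            train_phase(objective, datasets[k])
            visited.append((objective, k))
    return visited


# Placeholder datasets and a no-op trainer, purely for demonstration.
dummy = {k: f"stage-{k}-data" for k in range(5)}
order = run_curriculum(dummy, lambda objective, data: None)
print(order)
```

Keeping the schedule as data rather than hard-coding it in the loop makes it easy to compare alternative stage groupings.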
5. Embedded Evaluation Metrics
Formal evaluation in SEDRo leverages task and developmental metrics:
- Task success rates: Ratio of successful reach/grasp events to total attempts within the prescribed time window; proportion of gaze fixation time within ±ε of the fixation target.
- Learning curves: Task-success rates plotted as a function of agent training iterations, e.g., reach accuracy improving from 30% to 80% over the course of training.
- Statistical benchmarks: Agent behavior, such as the sign and magnitude of ΔG in the rod-and-box task, is directly compared to human infant benchmarks at defined ages (mean ±σ), with $p$-values quantifying whether the agent falls within the human 95% confidence interval.
All evaluation protocols are embedded and conducted in-line, reinforcing unsupervised, developmental-psychology-grounded performance analysis over task-specific supervised metrics.
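Two of these metrics can be sketched in a few lines: a task success rate, and a comparison of an agent statistic against a human benchmark given as mean ±σ. The comparison below treats the human benchmark as normally distributed and reports a two-sided $p$-value from the z-score. That is one simple reading of the benchmark description rather than the procedure used in SEDRo, and all numbers are hypothetical.

```python
import math
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of attempts that succeeded."""
    if not outcomes:
        raise ValueError("no attempts recorded")
    return sum(outcomes) / len(outcomes)


def z_score(agent_value: float, human_mean: float, human_sd: float) -> float:
    """Standardized distance of the agent statistic from the human mean."""
    if human_sd <= 0:
        raise ValueError("human_sd must be positive")
    return (agent_value - human_mean) / human_sd


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal (assumed) distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def within_95(z: float) -> bool:
    """True if |z| lies inside the central 95% normal range."""
    return abs(z) <= 1.96


# Hypothetical numbers: 3 of 4 reaches succeed; agent gaze-time preference of
# 2.0 s against a made-up human benchmark of 3.0 s +/- 1.0 s.
print(success_rate([True, True, False, True]))  # 0.75
z = z_score(2.0, 3.0, 1.0)
print(z, within_95(z), two_sided_p(z))
```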
6. Functional Role in Self-Supervised Learning
Distinctively, SEDRo and its artificial development set lack externally imposed task rewards or narrowly defined learning targets. Key features include:
- Open-ended exploration: The agent is permitted continual free exploration, with no terminal states or hand-defined task boundaries.
- Multi-modal data stream: Continuous sensory input across vision, touch, proprioception, and internal states supports cross-modal alignment and representation learning.
- Intrinsic body goals: The only explicit reward is energy maintenance, motivating self-directed behavioral strategies to prevent simulated starvation.
- Curriculum-driven complexity: Staged progression of datasets mirrors the ontogeny of sensorimotor and cognitive development, challenging models to generalize and adapt incrementally.
- Unsupervised embedding of evaluation: All embedded tests, including gaze statistics and object-permanence responses, are evaluative yet do not prescribe explicit classification or policy objectives.
Together, these properties instantiate the artificial development set as a foundation for domain-general, self-supervised model formation, in contrast to the limitations of highly curated, task-specific reinforcement learning or supervised paradigms (Islam et al., 2020).