LeRobotDataset Format
- LeRobotDataset Format is a standardized data structure for multimodal robotic trajectories, integrating sensors, actions, and video streams.
- It organizes data in a self-contained directory using JSON metadata, Parquet tables, and MP4 videos for enhanced data accessibility.
- The format ensures extensibility and backward compatibility with robust versioning and influences from legacy datasets like JRDB and modern datasets like DROID.
LeRobotDataset Format defines a comprehensive, extensible, and performance-oriented storage and interchange specification for multimodal robotic trajectory data and sensor streams, as standardized by the LeRobot library ecosystem (Cadene et al., 26 Feb 2026). The format is designed to support real-world robot learning by enabling scalable, asynchronous data collection, efficient batch training, multi-sensor fusion, and seamless streaming workflows. LeRobotDataset inherits design influences from legacy formats such as JRDB (Martín-Martín et al., 2019) and modern manipulation datasets like DROID (Khazatsky et al., 2024), while introducing a more generalized schema, robust versioning, and strong compatibility with contemporary analytics toolchains.
1. Directory Organization and Dataset Layout
A LeRobotDataset instance is a self-contained directory that may reside locally or on remote storage (e.g., S3 or an NFS URI). Names use lower-case snake_case, and episode identifiers are zero-padded (e.g., episode_00001).
dataset_root/
├─ manifest.json
├─ metadata.json
├─ version.txt
├─ episodes/
│ ├─ episode_00001/
│ │ ├─ sensors.parquet
│ │ ├─ actions.parquet
│ │ ├─ observations.parquet
│ │ ├─ camera_left.mp4
│ │ ├─ camera_right.mp4
│ │ └─ camera_meta.json
│ ├─ episode_00002/
│ │ └─ ...
│ └─ ...
└─ index/
   ├─ episode_index.parquet
   └─ frame_index.parquet
- manifest.json: Lists all episodes, checksums, modality keys, and schema version.
- metadata.json: Dataset-level metadata: robot model, environment, global sensor calibration.
- version.txt: Single-line semantic version (e.g., "1.2.0") redundantly documenting format version.
- episodes/episode_XXXXX/: One subdirectory per episode; all time-series and video streams per episode.
- index/: Provides episode and frame-level random access indices for efficient data retrieval.
This structure enables scalable storage, streaming, and hierarchical organization of large numbers of multimodal robot experiments.
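The layout above can be resolved into concrete file paths with a few lines of Python. This is a minimal sketch, assuming five-digit zero padding (inferred from `episode_00001`) and two cameras named `left` and `right` as in the tree; `episode_dir` and `episode_paths` are hypothetical helper names, not part of the LeRobot API.

```python
from pathlib import PurePosixPath

# Assumption: five-digit zero padding, inferred from "episode_00001" in the tree.
EPISODE_PAD = 5


def episode_dir(root: str, episode_number: int) -> PurePosixPath:
    """Return the directory of one episode under <root>/episodes/."""
    if episode_number < 0:
        raise ValueError("episode_number must be non-negative")
    name = f"episode_{episode_number:0{EPISODE_PAD}d}"
    return PurePosixPath(root) / "episodes" / name


def episode_paths(root: str, episode_number: int, cameras=("left", "right")) -> dict:
    """Hypothetical helper: map logical stream names to per-episode file paths."""
    base = episode_dir(root, episode_number)
    paths = {
        name: base / f"{name}.parquet"
        for name in ("sensors", "actions", "observations")
    }
    for cam in cameras:
        paths[f"camera_{cam}"] = base / f"camera_{cam}.mp4"
    paths["camera_meta"] = base / "camera_meta.json"
    return paths


paths = episode_paths("dataset_root", 2)
print(paths["sensors"])
```

Using `PurePosixPath` keeps the sketch independent of the host file system, so the same logic applies whether the root is a local directory or the path component of a remote URI.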
2. Episode Schema and Data Modalities
Each episode aggregates temporally-aligned sensor, action, and observation streams as columnar Parquet tables, with optional compressed video files and metadata sidecars. Units are SI unless otherwise stated.
A) sensors.parquet
| Field | Type | Description |
|---|---|---|
| timestamp_ns | int64 | UNIX epoch in nanoseconds |
| joint_positions | float32[Nₖ] | radians (Nₖ = robot joint count) |
| joint_velocities | float32[Nₖ] | rad/s |
| joint_torques | float32[Nₖ] | Nm |
| tcp_pose_xyzq | float32[7] | [x, y, z in m, q_x, q_y, q_z, q_w] |
| imu_accel_m_s2 | float32[3] | m/s², body-frame linear acceleration |
| imu_gyro_rad_s | float32[3] | rad/s, body-frame angular velocity |
B) actions.parquet
| Field | Type | Description |
|---|---|---|
| timestamp_ns | int64 | Alignment with sensors/obs |
| commanded_joint_positions | float32[Nₖ] | Radians (if available) |
| commanded_joint_torques | float32[Nₖ] | Nm |
| action_chunk_length | int32 | Horizon H (for chunked policies) |
| action_chunk | float32[H, A] | Policy chunk (H = steps, A = act dim) |
C) observations.parquet
| Field | Type | Description |
|---|---|---|
| timestamp_ns | int64 | Synchronization |
| teleop_buttons | bool[B] | Discrete teleoperation input mask |
| teleop_axes | float32[A_joy] | Axis values ∈ [–1,1] |
| gripper_state | bool or float32 | Binary or proportional gripper status |
D) camera_*.mp4
- H.264 encoded video, per camera, at dataset-specified FPS, synchronized via nearest-timestamp matching with frame_index.parquet.
This schema ensures unambiguous mapping between sensorimotor states, asynchronous policy actions, low-latency visual streams, and operator interventions.
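Nearest-timestamp matching between a tabular stream and video frames can be illustrated with a binary search over sorted timestamps. The sketch below is an assumption about how such matching might be done, not the reference implementation; `nearest_index` is a hypothetical helper, and the 30 FPS frame times and sensor timestamps are made-up values.

```python
from bisect import bisect_left


def nearest_index(sorted_timestamps_ns, query_ns):
    """Index of the entry closest to query_ns (ties go to the earlier entry)."""
    if not sorted_timestamps_ns:
        raise ValueError("empty timestamp list")
    pos = bisect_left(sorted_timestamps_ns, query_ns)
    if pos == 0:
        return 0
    if pos == len(sorted_timestamps_ns):
        return pos - 1
    before = sorted_timestamps_ns[pos - 1]
    after = sorted_timestamps_ns[pos]
    # Pick whichever neighbour is closer in time.
    return pos - 1 if query_ns - before <= after - query_ns else pos


# Hypothetical example: five frames of a 30 FPS camera and three sensor samples.
frame_period_ns = 1_000_000_000 // 30
frame_ts = [i * frame_period_ns for i in range(5)]
sensor_ts = [0, 40_000_000, 70_000_000]

matches = [nearest_index(frame_ts, t) for t in sensor_ts]
print(matches)  # [0, 1, 2]
```

Integer nanosecond timestamps keep the comparison exact, which avoids the rounding issues that floating-point seconds would introduce at epoch scale.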
3. Metadata and Calibration Conventions
All dataset-level and episode-level metadata is encoded in JSON for transparency and extensibility.
- metadata.json contains:
  - "dataset_id": canonical identifier
  - "format_version": semantic version
  - "robot": object with name, dof, vendor, and nested calibration dicts (joint_offset, link_lengths_m, camera intrinsics/extrinsics)
  - "environment": scene identifier, object list, lighting conditions
  - "sensors": array of sensor specs, per type (e.g., {"name":"imu", "type":..., "rate_hz":..., "units":...})
  - "recording_config": video FPS, Parquet row group size
- episode_{id}/camera_meta.json provides per-camera parameters: name, resolution, intrinsics, extrinsics, compression, FPS.
- Episode-level Parquet metadata or JSON includes episode_id, task_label, description, ISO 8601 time bounds, and optional annotation labels.
These conventions are compatible with multi-episode data fusion, reproducible calibration, and experiment tracking.
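As an illustration of the metadata.json conventions, the following sketch builds a small metadata dictionary, round-trips it through JSON, and checks that the listed top-level keys are present. Treating every listed key as required is an assumption made here for illustration; all values and the nested key names (`calibration`, `scene`, `objects`, `lighting`, `video_fps`, `parquet_row_group_size`) are hypothetical placeholders.

```python
import json

# Assumption: every top-level key listed for metadata.json is treated as required.
REQUIRED_KEYS = {
    "dataset_id", "format_version", "robot",
    "environment", "sensors", "recording_config",
}


def missing_metadata_keys(metadata: dict) -> list:
    """Return the sorted list of top-level keys absent from a metadata dict."""
    return sorted(REQUIRED_KEYS - set(metadata))


# All values below are hypothetical placeholders, not taken from a real dataset.
example_metadata = {
    "dataset_id": "example_dataset",
    "format_version": "1.2.0",
    "robot": {
        "name": "example_arm",
        "dof": 7,
        "vendor": "example_vendor",
        "calibration": {"joint_offset": [0.0] * 7, "link_lengths_m": [0.3, 0.3]},
    },
    "environment": {"scene": "tabletop", "objects": ["cube"], "lighting": "indoor"},
    "sensors": [{"name": "imu", "type": "inertial", "rate_hz": 200, "units": "SI"}],
    "recording_config": {"video_fps": 30, "parquet_row_group_size": 1000},
}

# Round-trip through JSON text, as would happen when writing and reading the file.
loaded = json.loads(json.dumps(example_metadata, indent=2))
print(missing_metadata_keys(loaded))               # []
print(missing_metadata_keys({"dataset_id": "x"}))
```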
4. Formal Structure and Synchronization
The dataset formalism is specified via Protobuf, modeling each recorded Frame as a tuple:
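syntax = "proto3";

// One recorded time step. The Image type is referenced here but is not
// defined in this excerpt.
message Frame {
  int64 timestamp_ns = 1;                // UNIX epoch in nanoseconds
  repeated float joint_positions = 2;    // rad
  repeated float joint_velocities = 3;   // rad/s
  repeated float joint_torques = 4;      // Nm
  repeated float tcp_pose_xyzq = 5;      // [x, y, z, q_x, q_y, q_z, q_w]
  optional Image rgb_image = 6;
  optional Image depth_image = 7;
  map<string, bytes> extra_sensors = 8;  // additional modalities, keyed by name
}

// A time-ordered sequence of frames for one task execution.
message Episode {
  string episode_id = 1;
  repeated Frame frames = 2;
  string task_label = 3;
  map<string, string> meta = 4;
}

// Top-level container for all episodes.
message Dataset {
  string format_version = 1;
  repeated Episode episodes = 2;
  map<string, bytes> dataset_meta = 3;
}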
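In LaTeX, a frame at time step $t$ can be written as

$$x_t = (s_t, o_t, a_{t:t+H}),$$

where $s_t$ is the sensor state, $o_t$ is the observation, and $a_{t:t+H}$ is the chunked action over horizon $H$. The inference stack computes

$$\hat{a}_t = f\big(\{\, a_{t':t'+H} : t' \le t < t' + H \,\}\big),$$

where $f$ aggregates overlapping action chunks for robust asynchronous policy execution.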
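The text does not say which aggregation function is used for overlapping action chunks. As one possible choice, the sketch below averages every prediction made for a given time step across all chunks that cover it. Uniform averaging, the function name `aggregate_chunks`, and the example values are all assumptions for illustration.

```python
def aggregate_chunks(chunks, t):
    """Average every predicted action for step t across overlapping chunks.

    chunks: dict mapping a chunk's start step to a list of H actions,
            each action being a list of A floats.
    """
    candidates = []
    for start, chunk in chunks.items():
        offset = t - start
        # A chunk starting at `start` covers steps start .. start + H - 1.
        if 0 <= offset < len(chunk):
            candidates.append(chunk[offset])
    if not candidates:
        raise KeyError(f"no chunk covers step {t}")
    dim = len(candidates[0])
    return [sum(c[i] for c in candidates) / len(candidates) for i in range(dim)]


# Hypothetical example: two chunks with horizon H = 3 and action dimension A = 2.
chunks = {
    0: [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],  # predicted at step 0
    1: [[3.0, 1.0], [4.0, 2.0], [5.0, 3.0]],  # predicted at step 1
}
print(aggregate_chunks(chunks, 1))  # [2.0, 1.0]
```

Other choices, such as weighting newer chunks more heavily or simply taking the latest prediction, fit the same interface by changing only the final reduction.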
5. Storage, Indexing, and Performance
- Parquet Serialization: All tabular streams (sensors, actions, observations) are stored as columnar Parquet tables. Snappy (default) or Zstd compression is employed. Row group size is configurable (default 1,000 for fast batch iteration).
- Video Encoding: All camera streams use H.264 in MP4 containers, with tune=zerolatency for streaming.
- Indexing:
  - index/episode_index.parquet gives global episode lookup (episode_id → file, time, count).
  - index/frame_index.parquet maps sub-episode timestamp offsets to Parquet row indices and video timecodes for direct random access.
- Streaming Access: The APIs enable sequential iteration, random-access retrieval, and batched ingest for ML pipelines. Data can be consumed via disk or remote URI using PyTorch-compatible or asynchronous streaming loaders.
This architectural design ensures high-throughput, low-latency access even for petascale datasets, supporting both offline and online analytics.
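The role of the frame index can be sketched with an in-memory stand-in: given a timestamp offset within an episode, find the Parquet row, the row group containing it, and the matching video timecode. This is an illustrative model only. It assumes that the video starts at the episode's first sample and that the default row group size of 1,000 applies; `build_frame_index` and `lookup` are hypothetical names.

```python
from bisect import bisect_right

ROW_GROUP_SIZE = 1000  # default row group size stated above


def build_frame_index(offsets_ns):
    """Hypothetical index: one (offset_ns, row_index, video_timecode_s) per row.

    Assumes the video starts at offset 0, so the timecode is the offset in seconds.
    """
    return [(off, row, off / 1e9) for row, off in enumerate(offsets_ns)]


def lookup(index, query_offset_ns):
    """Return the latest indexed sample at or before query_offset_ns."""
    offsets = [entry[0] for entry in index]
    pos = bisect_right(offsets, query_offset_ns) - 1
    if pos < 0:
        raise KeyError("query precedes the first indexed sample")
    _, row, timecode_s = index[pos]
    return {
        "row_index": row,
        "row_group": row // ROW_GROUP_SIZE,  # which row group to read
        "video_timecode_s": timecode_s,      # where to seek in the MP4
    }


# Hypothetical episode: 2,500 rows sampled every 10 ms (100 Hz).
index = build_frame_index([i * 10_000_000 for i in range(2500)])
hit = lookup(index, 12_345_000_000)
print(hit["row_index"], hit["row_group"])  # 1234 1
```

Knowing the row group up front lets a reader fetch only the relevant slice of a Parquet file, which is what makes random access cheap on remote storage.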
6. Extensibility and Backward Compatibility
LeRobotDataset is versioned with top-level format_version fields in both manifest and metadata. The versioning logic follows:
- MINOR version bumps: May add optional fields (new JSON keys, additional Parquet columns, new sensor modalities), which are ignored by old readers.
- MAJOR version bumps: May change field names, types, or required layout.
- Adding modalities:
  - New column (e.g., lidar_ranges) in sensors.parquet
  - Register in "sensors" metadata
  - Optionally add new modality files (video, point-cloud, etc.) in episode directory
Code samples for instantiation, population, batch loading, and streaming are provided by the reference implementation, demonstrating both direct access and streaming workflows.
Protobuf and code APIs handle unrecognized fields gracefully, preserving forward compatibility.
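The versioning rules can be sketched as a reader-side check: a dataset is readable when its MAJOR version matches the reader's, and fields the reader does not know are dropped rather than rejected. The function names and the example record are hypothetical, and the exact policy of the reference implementation may differ.

```python
def parse_version(text: str):
    """Parse a 'MAJOR.MINOR.PATCH' string, e.g. the single line in version.txt."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return major, minor, patch


def can_read(reader_version: str, dataset_version: str) -> bool:
    """Assumed policy: same MAJOR means readable; MINOR additions are optional."""
    return parse_version(reader_version)[0] == parse_version(dataset_version)[0]


def select_known_fields(record: dict, known: set) -> dict:
    """Drop fields added by newer MINOR versions instead of failing on them."""
    return {key: value for key, value in record.items() if key in known}


# Hypothetical reader built against 1.2.0 that knows only two sensor columns.
known_columns = {"timestamp_ns", "joint_positions"}
row = {"timestamp_ns": 0, "joint_positions": [0.0], "lidar_ranges": [1.5, 2.0]}

print(can_read("1.2.0", "1.3.0"))   # True
print(can_read("1.2.0", "2.0.0"))   # False
print(select_known_fields(row, known_columns))
```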
7. Relation to Other Datasets and Standards
LeRobotDataset generalizes and subsumes design principles from multi-sensor datasets such as JRDB (Martín-Martín et al., 2019) (modular per-sensor directories, explicit calibration, annotation schema), and aligns with modern high-throughput manipulation datasets like DROID (Khazatsky et al., 2024) (per-trajectory folders, compressed state archives, direct ML-friendly ingest).
Significant departures include:
- Universal storage in columnar Parquet (rather than per-frame CSV or .npy), allowing for efficient SQL-style analytics and scalable ML ingest.
- Fully explicit calibration, with episode-level and global metadata encoded in JSON.
- Standard protocols for both high-frequency proprioception and low-frequency imagery under unified synchronization.
- Random-access indices for sub-episode and frame-level access, not present in most legacy formats.
A plausible implication is that LeRobotDataset provides a blueprint for future interoperability and benchmarking across real-world robot learning platforms, with scalable support for increasingly multimodal and asynchronous experimental regimes.
References:
- (Cadene et al., 26 Feb 2026): "LeRobot: An Open-Source Library for End-to-End Robot Learning"
- (Martín-Martín et al., 2019): "JRDB: A Dataset and Benchmark of Egocentric Robot Visual Perception of Humans in Built Environments"
- (Khazatsky et al., 2024): "DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset"