Robot Data Formats: HDF5, RLDS & LeRobot
Which episode format should you use for imitation learning and VLA training? A practical comparison.
Why the Format Matters
Robot learning datasets contain multi-modal time-series data: joint positions, images, actions, language instructions, and metadata. How you structure and serialize that data determines training speed, portability, and whether your dataset can be reused by the wider community. Three formats dominate the ecosystem today.
Format Overview
| Feature | HDF5 | RLDS (TFDS) | LeRobot |
|---|---|---|---|
| File format | .hdf5 (binary, hierarchical) | TFRecord shards | Parquet + video files |
| Episode structure | Groups per episode, datasets per modality | Nested dict per step, episodes as sequences | Flat table with episode index column |
| Image storage | Embedded as byte arrays or compressed datasets | Embedded in TFRecord as encoded bytes | Separate MP4 video files (one per camera) |
| Random access | Excellent (native HDF5 chunking) | Poor (sequential TFRecord reads) | Good (Parquet row-group seeking) |
| Streaming | Requires full download | Supports streaming via TFDS | Supports streaming via Hugging Face Hub |
| Framework | Framework-agnostic (h5py, PyTables) | TensorFlow-native; usable in JAX | PyTorch-native; Hugging Face ecosystem |
| Community adoption | robomimic, MimicGen, RoboCasa | Open X-Embodiment, RT-X, Bridge V2 | LeRobot, Hugging Face Hub datasets |
HDF5
HDF5 is the legacy standard in robotics research. Each episode is a group containing datasets for observations, actions, and rewards. The format supports compression (gzip, lzf) and chunked I/O, making it efficient for random-access training loops. It is framework-agnostic, with bindings for nearly every major programming language (h5py and PyTables in Python, plus C, C++, Java, and more).
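To make the layout concrete, here is a minimal h5py sketch of the episode-as-group structure described above. The group and dataset names (`episode_0`, `observations/joint_pos`, `actions`) are illustrative, not a fixed schema; chunking and gzip compression are what enable cheap random access to individual timesteps.

```python
import h5py
import numpy as np

# Hypothetical schema: one group per episode, one dataset per modality.
with h5py.File("/tmp/demo_episodes.hdf5", "w") as f:
    ep = f.create_group("episode_0")
    # Chunked, gzip-compressed datasets: reading one timestep only
    # decompresses the chunk that contains it.
    ep.create_dataset(
        "observations/joint_pos",
        data=np.zeros((100, 7), dtype=np.float32),
        chunks=(10, 7),
        compression="gzip",
    )
    ep.create_dataset(
        "actions",
        data=np.zeros((100, 7), dtype=np.float32),
        chunks=(10, 7),
        compression="gzip",
    )

with h5py.File("/tmp/demo_episodes.hdf5", "r") as f:
    # Random access to a single timestep in the middle of an episode.
    step_action = f["episode_0/actions"][42]
    print(step_action.shape)  # (7,)
```

This random-access pattern is exactly what a shuffled imitation-learning dataloader does thousands of times per epoch, which is why HDF5 remains popular for robomimic-style pipelines.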
Best for: Single-lab projects, robomimic-compatible pipelines, and when you need fast random access to individual timesteps.
RLDS (TFDS)
RLDS (Reinforcement Learning Datasets) is a specification built on TensorFlow Datasets. It represents episodes as sequences of steps, each step being a nested dictionary. The Open X-Embodiment dataset—the largest multi-robot dataset to date—uses RLDS, making it the de facto format for large-scale cross-embodiment training.
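The nested-dictionary step layout can be sketched in plain Python, without TensorFlow. Field names here follow the RLDS specification (`observation`, `action`, `reward`, plus the `is_first`/`is_last`/`is_terminal` boundary flags); the observation keys and metadata are illustrative assumptions.

```python
# Sketch of the RLDS episode/step layout as plain Python dicts.
def make_step(obs, action, reward, *, is_first=False, is_last=False,
              is_terminal=False):
    return {
        "observation": obs,        # nested dict: images, proprioception, ...
        "action": action,
        "reward": reward,
        "is_first": is_first,      # episode-boundary flags per the RLDS spec
        "is_last": is_last,
        "is_terminal": is_terminal,
    }

episode = {
    "steps": [
        make_step({"image": b"<jpeg bytes>", "state": [0.0] * 7},
                  [0.0] * 7, 0.0, is_first=True),
        make_step({"image": b"<jpeg bytes>", "state": [0.1] * 7},
                  [0.0] * 7, 1.0, is_last=True, is_terminal=True),
    ],
    "episode_metadata": {"language_instruction": "pick up the block"},
}
```

In a real RLDS dataset these episodes are serialized into TFRecord shards by a TFDS builder and read back as a `tf.data.Dataset`, which is why sequential streaming is cheap but mid-episode random access is not.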
Best for: Contributing to or training on Open X-Embodiment, using JAX/TensorFlow pipelines, and datasets that need to be streamed from cloud storage.
LeRobot Format
LeRobot stores episodes as Parquet tables (one row per timestep) with images saved as separate MP4 video files. This design optimizes for the Hugging Face Hub: datasets can be streamed, versioned, and previewed in the browser. The LeRobot Python library handles recording, replay, and training in a unified workflow.
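A minimal pandas sketch of the flat-table layout, assuming illustrative column names (`episode_index`, `frame_index`, `observation.state`): one row per timestep, with camera frames living in separate MP4 files keyed by frame index rather than embedded in the table.

```python
import pandas as pd

# One row per timestep; the episode_index column replaces HDF5's
# per-episode groups. Images are NOT stored here -- they sit in
# per-camera MP4 files and are looked up by frame_index at load time.
rows = []
for ep in range(2):
    for t in range(3):
        rows.append({
            "episode_index": ep,
            "frame_index": t,
            "timestamp": t / 30.0,           # assumes 30 fps recording
            "action": [0.0] * 7,
            "observation.state": [0.0] * 7,
        })
df = pd.DataFrame(rows)

# Parquet row groups make per-episode slicing cheap:
episode_1 = df[df["episode_index"] == 1]
print(len(episode_1))  # 3
```

Storing frames as video rather than raw arrays is the main reason LeRobot datasets are small enough to stream and preview directly from the Hub.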
Best for: PyTorch-based training, sharing datasets on Hugging Face Hub, projects using SO-100/SO-101/OpenArm hardware, and teams that want an end-to-end record-train-deploy workflow.
Converting Between Formats
Conversion scripts exist for all three pairings:
- HDF5 to RLDS: The `rlds_dataset_builder` tool from Google converts HDF5 episodes into TFDS-compatible RLDS shards.
- RLDS to LeRobot: The `lerobot` CLI includes `lerobot convert` to import RLDS datasets into Parquet + video format.
- HDF5 to LeRobot: Use `lerobot convert --from-hdf5` for a direct path, or go through RLDS as an intermediate step.
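Whichever tool you use, the core transform from HDF5 to the LeRobot layout is the same: per-episode arrays become a flat list of per-timestep rows tagged with an episode index. A hedged sketch (real converters additionally re-encode image arrays into MP4 and write Parquet; names here are illustrative):

```python
import numpy as np

def flatten_episodes(episodes):
    """Turn a list of per-episode arrays into flat per-timestep rows."""
    rows = []
    for ep_idx, ep in enumerate(episodes):
        for t in range(len(ep["actions"])):
            rows.append({
                "episode_index": ep_idx,
                "frame_index": t,
                "action": ep["actions"][t],
                "observation.state": ep["observations"][t],
            })
    return rows

# Two toy episodes of different lengths, mimicking HDF5 episode groups.
episodes = [
    {"actions": np.zeros((5, 7)), "observations": np.zeros((5, 7))},
    {"actions": np.zeros((3, 7)), "observations": np.zeros((3, 7))},
]
rows = flatten_episodes(episodes)
print(len(rows))  # 8
```

The reverse direction (flat table back to episode groups) is just a group-by on `episode_index`, which is why round-tripping between the formats is lossless for low-dimensional data; only image re-encoding is lossy.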
SVRC Recommendations
For new projects starting from scratch, we recommend the LeRobot format for its ease of use, Hub integration, and growing community. If you need to participate in the Open X-Embodiment ecosystem, export a parallel copy in RLDS. For legacy compatibility with robomimic, keep an HDF5 version.
SVRC's data collection services can deliver datasets in any of these three formats. Our dataset catalog lists publicly available datasets with format metadata, and the SVRC Platform provides tools for browsing, converting, and annotating robot data.