Robot Data Formats: HDF5, RLDS & LeRobot

Which episode format should you use for imitation learning and VLA training? A practical comparison.

Why the Format Matters

Robot learning datasets contain multi-modal time-series data: joint positions, images, actions, language instructions, and metadata. How you structure and serialize that data determines training speed, portability, and whether your dataset can be reused by the wider community. Three formats dominate the ecosystem today.

Format Overview

| Feature            | HDF5                                           | RLDS (TFDS)                                  | LeRobot                                  |
|--------------------|------------------------------------------------|----------------------------------------------|------------------------------------------|
| File format        | .hdf5 (binary, hierarchical)                   | TFRecord shards                              | Parquet + video files                    |
| Episode structure  | Groups per episode, datasets per modality      | Nested dict per step, episodes as sequences  | Flat table with episode index column     |
| Image storage      | Embedded as byte arrays or compressed datasets | Embedded in TFRecord as encoded bytes        | Separate MP4 video files (one per camera)|
| Random access      | Excellent (native HDF5 chunking)               | Poor (sequential TFRecord reads)             | Good (Parquet row-group seeking)         |
| Streaming          | Requires full download                         | Supported via TFDS                           | Supported via Hugging Face Hub           |
| Framework          | Framework-agnostic (h5py, PyTables)            | TensorFlow-native; usable in JAX             | PyTorch-native; Hugging Face ecosystem   |
| Community adoption | robomimic, MimicGen, RoboCasa                  | Open X-Embodiment, RT-X, Bridge V2           | LeRobot, Hugging Face Hub datasets       |

HDF5

HDF5 is the legacy standard in robotics research. Each episode is a group containing datasets for observations, actions, and rewards. The format supports compression (gzip, lzf) and chunked I/O, making it efficient for random-access training loops. It is framework-agnostic, with mature bindings for most major programming languages (h5py and PyTables in Python among them).
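As a sketch of this layout, here is how an episode might be written and then randomly accessed with h5py. The group and dataset names (`episode_0`, `obs/joint_pos`, `actions`) are illustrative, not a fixed schema:

```python
import h5py
import numpy as np

# Write one episode: a group holding one dataset per modality.
# Names like "obs/joint_pos" are illustrative, not a required schema.
with h5py.File("demo.hdf5", "w") as f:
    ep = f.create_group("episode_0")
    ep.create_dataset("obs/joint_pos",
                      data=np.zeros((100, 7), dtype=np.float32),
                      compression="gzip", chunks=True)
    ep.create_dataset("actions",
                      data=np.zeros((100, 7), dtype=np.float32),
                      compression="gzip", chunks=True)

# Random access: chunked I/O lets a training loop fetch a single
# timestep without loading the whole episode into memory.
with h5py.File("demo.hdf5", "r") as f:
    action_t = f["episode_0/actions"][42]  # one timestep
```

Robomimic-style datasets follow the same pattern, with one group per demonstration and per-modality datasets underneath.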

Best for: Single-lab projects, robomimic-compatible pipelines, and when you need fast random access to individual timesteps.

RLDS (TFDS)

RLDS (Reinforcement Learning Datasets) is a specification built on TensorFlow Datasets. It represents episodes as sequences of steps, each step being a nested dictionary. The Open X-Embodiment dataset—the largest multi-robot dataset to date—uses RLDS, making it the de facto format for large-scale cross-embodiment training.
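The per-step layout can be sketched in plain Python without TensorFlow installed. The top-level fields (`steps`, `is_first`, `is_last`, `is_terminal`) follow the RLDS specification; the observation keys and metadata values below are illustrative:

```python
import numpy as np

def make_episode(n_steps: int) -> dict:
    """Build an RLDS-style episode: a dict with a "steps" sequence,
    each step a nested dict. Observation keys here are illustrative."""
    steps = []
    for t in range(n_steps):
        steps.append({
            "observation": {
                "image": np.zeros((224, 224, 3), dtype=np.uint8),
                "state": np.zeros(7, dtype=np.float32),
            },
            "action": np.zeros(7, dtype=np.float32),
            "reward": 0.0,
            "discount": 1.0,
            "is_first": t == 0,          # episode boundary markers
            "is_last": t == n_steps - 1,
            "is_terminal": False,
        })
    return {"steps": steps}

episode = make_episode(50)
# With TensorFlow installed, real RLDS datasets expose the same nested
# layout via tfds.load(...) on a registered dataset name.
```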

Best for: Contributing to or training on Open X-Embodiment, using JAX/TensorFlow pipelines, and datasets that need to be streamed from cloud storage.

LeRobot Format

LeRobot stores episodes as Parquet tables (one row per timestep) with images saved as separate MP4 video files. This design optimizes for the Hugging Face Hub: datasets can be streamed, versioned, and previewed in the browser. The LeRobot Python library handles recording, replay, and training in a unified workflow.
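A minimal sketch of that flat-table idea, using plain Python rows instead of a real Parquet file. The column names mirror LeRobot's convention (`episode_index`, `frame_index`, `timestamp`); image pixels live in the separate MP4 files, so the table carries only indices and low-dimensional data:

```python
# One row per timestep; episode boundaries are encoded in a column,
# not in the file structure. Values below are toy data.
rows = [
    {"episode_index": 0, "frame_index": 0, "timestamp": 0.00, "action": [0.1] * 6},
    {"episode_index": 0, "frame_index": 1, "timestamp": 0.03, "action": [0.2] * 6},
    {"episode_index": 1, "frame_index": 0, "timestamp": 0.00, "action": [0.0] * 6},
]

def episode_slice(rows: list[dict], ep: int) -> list[dict]:
    """Recover one episode from the flat table by filtering the index column."""
    return [r for r in rows if r["episode_index"] == ep]

ep0 = episode_slice(rows, 0)
```

In practice the table is a Parquet file read through the lerobot library, which also resolves each row to its corresponding video frame; the sketch above only shows the schema.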

Best for: PyTorch-based training, sharing datasets on Hugging Face Hub, projects using SO-100/SO-101/OpenArm hardware, and teams that want an end-to-end record-train-deploy workflow.

Converting Between Formats

Conversion scripts exist for all three pairings:

  • HDF5 to RLDS: The rlds_dataset_builder tool from Google converts HDF5 episodes into TFDS-compatible RLDS shards.
  • RLDS to LeRobot: The lerobot CLI includes lerobot convert to import RLDS datasets into Parquet + video format.
  • HDF5 to LeRobot: Use lerobot convert --from-hdf5 for a direct path, or go through RLDS as an intermediate step.
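Whichever tool you use, the core transformation is the same: per-modality arrays (HDF5-style columns) become a per-step sequence (RLDS-style dicts). A library-free sketch of that pivot, with illustrative field names:

```python
import numpy as np

def arrays_to_steps(obs: np.ndarray, actions: np.ndarray) -> list[dict]:
    """Pivot columnar episode arrays (shape T x D) into per-step dicts.
    Field names follow the RLDS convention; "state" is illustrative."""
    assert len(obs) == len(actions), "modalities must share the time axis"
    n = len(obs)
    return [
        {
            "observation": {"state": obs[t]},
            "action": actions[t],
            "is_first": t == 0,
            "is_last": t == n - 1,
        }
        for t in range(n)
    ]

steps = arrays_to_steps(np.zeros((10, 7)), np.ones((10, 7)))
```

Real converters add image re-encoding and dataset metadata on top, but this pivot between columnar and per-step layouts is the piece every pairing shares.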

SVRC Recommendations

For new projects, we recommend the LeRobot format for its ease of use, Hub integration, and growing community. If you need to participate in the Open X-Embodiment ecosystem, export a parallel copy in RLDS. For legacy compatibility with robomimic, keep an HDF5 version.

SVRC's data collection services can deliver datasets in any of these three formats. Our dataset catalog lists publicly available datasets with format metadata, and the SVRC Platform provides tools for browsing, converting, and annotating robot data.

Further Reading