
---
license: cc-by-4.0
task_categories:
- robotics
tags:
- unitree-g1
- pick-and-place
- simulation
- curobo
- depth-perception
- rgbd
size_categories:
- 100K<n<1M
language:
- en
pretty_name: Unitree G1 Apple Pick and Place with Depth Dataset
---

# Unitree G1 Apple Pick and Place with Depth Dataset

[Figure: Front View (Global), Side View (Profile), Top-Down View (Bird's Eye), Ego-Centric View (Robot POV)]

Multi-view perspectives of the Unitree G1 performing the pick-and-place task.

## Depth Data Visualization

To help interpret the raw depth values, the figure below shows a normalized depth image (for visual clarity) side by side with its corresponding RGB frame.

[Figure: Normalized Depth Image (for visualization), Corresponding RGB Image (from `rs_view` camera)]

A sample depth frame (normalized to 0-255 for grayscale visualization) and its synchronized RGB counterpart.
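A grayscale preview like the one above can be produced with a min-max normalization. This is a minimal sketch: the exact normalization used for the figure is not documented, so scaling over the finite pixels of each frame is an assumption.

```python
import numpy as np

def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max normalize a float depth map (meters) to uint8 [0, 255], for display only."""
    finite = np.isfinite(depth)
    if not finite.any():
        return np.zeros(depth.shape, dtype=np.uint8)
    d_min = float(depth[finite].min())
    d_max = float(depth[finite].max())
    scale = d_max - d_min
    if scale == 0:
        # Constant-depth frame: avoid division by zero
        return np.zeros(depth.shape, dtype=np.uint8)
    # Non-finite pixels are filled with d_min, so they map to 0 (black)
    filled = np.where(finite, depth, d_min)
    normalized = (filled - d_min) / scale
    return np.round(normalized * 255.0).astype(np.uint8)

# Example with a synthetic 256x256 ramp from 0.3 m to 5.0 m
demo = np.linspace(0.3, 5.0, 256 * 256, dtype=np.float32).reshape(256, 256)
img = depth_to_uint8(demo)
print(img.dtype, img.shape, img.min(), img.max())
```

The result can be passed to `plt.imshow(img, cmap="gray")` or `cv2.imwrite`. Keep the raw float32 values for training, since per-frame normalization discards the metric scale.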

## Dataset Description

The Unitree G1 Apple Pick and Place with Depth Dataset contains 963 high-quality trajectories with per-frame depth images for RGB-D manipulation research. In each trajectory, the robot picks up a red apple and places it into a bowl using both arms and its tri-finger hands. All trajectories include synchronized depth measurements from a head-mounted camera.

Key Features:

  • 963 successful trajectories with depth data
  • 256×256 depth images per timestep (277,592 total frames)
  • 28-DOF control: bilateral arms (7+7) + dexterous hands (7+7)
  • 256×256 RGB video at 20 FPS (ego view)
  • CuRobo motion planning (collision-free trajectories)
  • MuJoCo + RoboCasa simulation with realistic depth rendering

This dataset extends the base dataset with depth perception for RGB-D manipulation and 3D scene understanding research.

Note on Data Availability: To stay within storage limits, the depth data hosted here is a subset of 10 sample episodes, enough to verify the data structure and quality. The full dataset with depth maps for all 963 trajectories is archived separately. If you need it for training large-scale models, see the Contact section below.

## Dataset Owner

Junsung Park (@jnsungp)

## License

Creative Commons Attribution 4.0 International (CC BY 4.0)

## Dataset Format

| Modality | Type | Shape | Description |
|---|---|---|---|
| Observation State | float32 | (28,) | Joint positions (radians) for arms + hands |
| Observation Depth | float32 | (256, 256) | Depth image (meters) from `rs_view` camera |
| Action | float32 | (28,) | Target joint positions |
| Video | RGB | (256, 256, 3) | Ego view, 20 FPS, H.264 |
| Language | string | - | "Pick up the red apple and place it on the bowl" |

### Depth Image Specification

Depth measurements captured from the robot's head-mounted camera:

| Property | Value | Description |
|---|---|---|
| Resolution | 256 × 256 | Width × Height in pixels |
| Data Type | float32 | 32-bit floating point |
| Units | Meters (m) | Distance from camera to surface |
| Camera | `rs_view` | Head-mounted RGB-D camera |
| Format | `.npy` | NumPy binary format |
| Range | ~0.3 m to 5.0 m | Typical depth range in scene |
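Before training, it can help to check that a frame matches the specification above. The sketch below treats the ~0.3 m to 5.0 m range as typical rather than a hard clip, so it reports the fraction of pixels inside that range instead of rejecting frames; the `near`/`far` thresholds are adjustable assumptions.

```python
import numpy as np

def check_depth_frame(depth, near=0.3, far=5.0, expected_shape=(256, 256)):
    """Return simple diagnostics for one depth frame (float32, meters).

    Pixels outside [near, far] are counted, not treated as errors.
    """
    if depth.shape != expected_shape:
        raise ValueError(f"unexpected shape {depth.shape}, expected {expected_shape}")
    if depth.dtype != np.float32:
        raise ValueError(f"unexpected dtype {depth.dtype}, expected float32")
    finite = np.isfinite(depth)
    in_range = finite & (depth >= near) & (depth <= far)
    return {
        "finite_fraction": float(finite.mean()),
        "in_range_fraction": float(in_range.mean()),
        "min_m": float(depth[finite].min()) if finite.any() else float("nan"),
        "max_m": float(depth[finite].max()) if finite.any() else float("nan"),
    }

# Synthetic frame: 1.5 m everywhere except one row of far background at 8 m
frame = np.full((256, 256), 1.5, dtype=np.float32)
frame[0, :] = 8.0
report = check_depth_frame(frame)
print(report)
```

For a real frame, pass the array returned by `np.load(...)` from the loading snippet below.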

Loading Depth Data:

```python
import numpy as np

# Load single depth frame
depth = np.load("depth/chunk-000/episode_000000/frame_000050.npy")
print(depth.shape)  # (256, 256)
print(f"Min depth: {depth.min():.2f}m, Max depth: {depth.max():.2f}m")
```

Path Template:

```text
depth/chunk-{episode_chunk:03d}/episode_{episode_index:06d}/frame_{frame_index:06d}.npy
```
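The template can be filled in with a small helper. The directory layout appears to follow the LeRobot v2 convention, so the `chunks_size=1000` default (episodes per chunk) is an assumption borrowed from that convention; check `meta/info.json`. With 963 episodes, every episode would then fall in `chunk-000`.

```python
from pathlib import Path

DEPTH_TEMPLATE = "depth/chunk-{episode_chunk:03d}/episode_{episode_index:06d}/frame_{frame_index:06d}.npy"

def depth_path(root, episode_index, frame_index, chunks_size=1000):
    """Build the path to one depth frame under the dataset root.

    chunks_size=1000 is an ASSUMED LeRobot-style default; verify it in meta/info.json.
    """
    episode_chunk = episode_index // chunks_size
    rel = DEPTH_TEMPLATE.format(
        episode_chunk=episode_chunk,
        episode_index=episode_index,
        frame_index=frame_index,
    )
    return Path(root) / rel

p = depth_path("./datasets/g1-depth", episode_index=0, frame_index=50)
print(p.as_posix())
```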

## Joint Configuration (28-DOF)

| Body Part | DOF | Description |
|---|---|---|
| Left Arm | 7 | Shoulder (3) + Elbow (1) + Wrist (3) |
| Right Arm | 7 | Shoulder (3) + Elbow (1) + Wrist (3) |
| Left Hand | 7 | Index (2) + Middle (2) + Thumb (3) |
| Right Hand | 7 | Index (2) + Middle (2) + Thumb (3) |
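A 28-D state or action vector can be split by body part. This sketch assumes the vector is ordered as in the table (left arm, right arm, left hand, right hand); that ordering is not confirmed on this card, so check the joint names in `meta/info.json` before relying on it.

```python
import numpy as np

# ASSUMED index ranges, following the table order; verify against meta/info.json
JOINT_GROUPS = {
    "left_arm": slice(0, 7),
    "right_arm": slice(7, 14),
    "left_hand": slice(14, 21),
    "right_hand": slice(21, 28),
}

def split_joints(vec):
    """Split a (..., 28) joint vector into named (..., 7) groups."""
    vec = np.asarray(vec)
    if vec.shape[-1] != 28:
        raise ValueError(f"expected last dimension 28, got {vec.shape[-1]}")
    return {name: vec[..., s] for name, s in JOINT_GROUPS.items()}

state = np.arange(28, dtype=np.float32)
parts = split_joints(state)
print({k: v.tolist() for k, v in parts.items()})
```

Because the slicing uses `...`, the same helper also works on a whole episode of shape `(N, 28)`.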

## Dataset Statistics

  • Trajectories: 963
  • Total Frames: 277,592
  • Avg Episode Length: ~288 frames (~14.4 seconds)
  • Episode Length Range: 180-400 frames
  • Storage Size: ~2.5 GB (data + videos + depth)
  • Success Rate: 100%
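These figures are consistent with each other: 277,592 / 963 ≈ 288.3 frames per episode, and 288 frames at 20 FPS is 14.4 s. They can be recomputed from the metadata with a sketch like the one below, which assumes each line of `meta/episodes.jsonl` is a JSON object with a `length` field (frames per episode), as in the LeRobot v2 format.

```python
import json

FPS = 20  # frame rate stated on this card

def episode_length_stats(jsonl_lines, fps=FPS):
    """Summarize per-episode frame counts from JSON Lines records.

    Assumes each record has a "length" field (LeRobot v2-style
    meta/episodes.jsonl); verify the field name in your copy.
    """
    lengths = [int(json.loads(line)["length"]) for line in jsonl_lines if line.strip()]
    if not lengths:
        raise ValueError("no episode records found")
    mean = sum(lengths) / len(lengths)
    return {
        "episodes": len(lengths),
        "total_frames": sum(lengths),
        "mean_frames": mean,
        "mean_seconds": mean / fps,
        "min_frames": min(lengths),
        "max_frames": max(lengths),
    }

# Toy records; in practice: with open("meta/episodes.jsonl") as f: episode_length_stats(f)
toy = ['{"episode_index": 0, "length": 180}', '{"episode_index": 1, "length": 400}']
stats = episode_length_stats(toy)
print(stats)
```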

## Download

```bash
huggingface-cli download jnsungp/unitree-g1-robocasa-pick-apple-bowl-depth-1k \
    --repo-type dataset \
    --local-dir ./datasets/g1-depth
```

### Using Python

```python
from datasets import load_dataset

dataset = load_dataset("jnsungp/unitree-g1-robocasa-pick-apple-bowl-depth-1k")
```
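To fetch only part of the repository (for example, a single sample episode) rather than the full snapshot, `huggingface_hub.snapshot_download` accepts `allow_patterns` glob filters. This is a sketch: the helper names are hypothetical, and since only 10 sample episodes ship with depth, check on the Hub which episode folders exist under `depth/` before picking an index.

```python
def episode_patterns(episode_index, chunk=0, camera="observation.images.ego_view"):
    """Glob patterns for one episode's metadata, parquet, video, and depth frames (hypothetical helper)."""
    ep = f"episode_{episode_index:06d}"
    ch = f"chunk-{chunk:03d}"
    return [
        "meta/*",
        f"data/{ch}/{ep}.parquet",
        f"videos/{ch}/{camera}/{ep}.mp4",
        f"depth/{ch}/{ep}/*.npy",
    ]

def download_episode(episode_index, local_dir="./datasets/g1-depth", chunk=0):
    """Download a single episode; requires `pip install huggingface_hub`."""
    from huggingface_hub import snapshot_download  # imported lazily, only when downloading
    return snapshot_download(
        repo_id="jnsungp/unitree-g1-robocasa-pick-apple-bowl-depth-1k",
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=episode_patterns(episode_index, chunk),
    )
```

Calling `download_episode(0)` returns the local folder path as a string.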

## Dataset Structure

```text
dataset_depth_1k/
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       ├── episode_000001.parquet
│       └── ...
├── videos/
│   └── chunk-000/
│       └── observation.images.ego_view/
│           ├── episode_000000.mp4
│           ├── episode_000001.mp4
│           └── ...
├── depth/
│   └── chunk-000/
│       ├── episode_000000/
│       │   ├── frame_000000.npy
│       │   ├── frame_000001.npy
│       │   └── ...
│       ├── episode_000001/
│       └── ...
├── meta/
│   ├── info.json          # Dataset metadata
│   ├── stats.json         # Statistics (mean, std, min, max)
│   ├── tasks.jsonl        # Task descriptions
│   └── episodes.jsonl     # Episode information
└── README.md
```
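Because only a subset of episodes ships with depth, it is useful to list which episodes actually have depth folders locally. A sketch assuming the directory layout shown above; the demo builds a throwaway tree in a temporary directory instead of touching real data.

```python
import re
import tempfile
from pathlib import Path

EPISODE_DIR = re.compile(r"episode_(\d{6})")

def episodes_with_depth(root):
    """Return {episode_index: n_frames} for every depth episode folder under root."""
    found = {}
    for ep_dir in sorted(Path(root).glob("depth/chunk-*/episode_*")):
        m = EPISODE_DIR.fullmatch(ep_dir.name)
        if m and ep_dir.is_dir():
            found[int(m.group(1))] = len(list(ep_dir.glob("frame_*.npy")))
    return found

# Demo on a temporary tree: episodes 0 and 7 with 3 and 2 empty frame files
with tempfile.TemporaryDirectory() as tmp:
    for ep, n in [(0, 3), (7, 2)]:
        d = Path(tmp, "depth", "chunk-000", f"episode_{ep:06d}")
        d.mkdir(parents=True)
        for i in range(n):
            (d / f"frame_{i:06d}.npy").touch()
    demo = episodes_with_depth(tmp)

print(demo)
```

On a downloaded copy, `episodes_with_depth("./datasets/g1-depth")` gives the episode indices to use in the loading example.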

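## Loading Data Example

```python
import pandas as pd
import numpy as np
import cv2
import matplotlib.pyplot as plt

# Paths are relative to the dataset root (e.g. ./datasets/g1-depth).
# Depth is hosted for 10 sample episodes only; pick one that has a depth folder.
episode_idx = 0
frame_idx = 100

# Load trajectory data for one episode
df = pd.read_parquet(f"data/chunk-000/episode_{episode_idx:06d}.parquet")

# Each cell holds a length-28 array; stack them into (N, 28) matrices
observations = np.stack(df["observation.state"].to_numpy())  # (N, 28) joint positions
actions = np.stack(df["action"].to_numpy())                  # (N, 28) target positions

# Load the depth frame (float32, meters)
depth = np.load(f"depth/chunk-000/episode_{episode_idx:06d}/frame_{frame_idx:06d}.npy")

# Read the corresponding RGB frame from the ego-view video.
# Seeking in compressed video can be imprecise on some backends;
# decoding sequentially from frame 0 is the safer fallback.
cap = cv2.VideoCapture(f"videos/chunk-000/observation.images.ego_view/episode_{episode_idx:06d}.mp4")
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
ret, bgr = cap.read()
cap.release()
if not ret:
    raise RuntimeError(f"could not read frame {frame_idx} from the video")

# Visualize depth and RGB side by side
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(depth, cmap="turbo")
plt.colorbar(label="Depth (m)")
plt.title("Depth Image")

plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))  # OpenCV decodes frames as BGR
plt.title("RGB Image")
plt.show()
```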
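To feed a model a whole episode at once, the per-frame `.npy` files can be stacked into a `(T, 256, 256)` array. A minimal sketch: it relies on the zero-padded frame index in the file name for ordering and assumes frames are contiguous from 0, which it checks. At 256 × 256 × 4 bytes per frame, an average 288-frame episode takes about 75 MB in memory. The demo writes a tiny synthetic episode to a temporary directory.

```python
import tempfile
from pathlib import Path
import numpy as np

def load_depth_episode(episode_dir):
    """Stack frame_XXXXXX.npy files in one episode folder into a (T, H, W) array."""
    files = sorted(Path(episode_dir).glob("frame_*.npy"))  # zero-padded, so lexical = numeric order
    if not files:
        raise FileNotFoundError(f"no depth frames in {episode_dir}")
    indices = [int(f.stem.split("_")[1]) for f in files]
    if indices != list(range(len(files))):
        raise ValueError("frame indices are not contiguous from 0")
    return np.stack([np.load(f) for f in files])

with tempfile.TemporaryDirectory() as tmp:
    for i in range(4):
        np.save(Path(tmp, f"frame_{i:06d}.npy"), np.full((256, 256), 1.0 + i, dtype=np.float32))
    stack = load_depth_episode(tmp)

print(stack.shape, stack.dtype)
```

On real data, `load_depth_episode("depth/chunk-000/episode_000000")` returns one array per episode; pairing row `t` with row `t` of the parquet file assumes the two are indexed by the same frame index.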

## Use Cases

### 1. RGB-D Manipulation

Train policies that leverage depth information for:

  • Precise 3D localization of objects
  • Distance-aware grasping
  • Occlusion-robust perception

### 2. 3D Scene Understanding

  • Point cloud generation from RGB-D pairs
  • 3D object detection and segmentation
  • Spatial reasoning for manipulation
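Point clouds can be obtained from a metric depth map by pinhole back-projection. The `rs_view` intrinsics are not documented on this card, so the sketch derives the focal length from a hypothetical vertical field of view `fovy_deg` (MuJoCo cameras are parameterized by `fovy`), assumes square pixels with the principal point at the image center, and treats depth as distance along the optical axis (z-depth) rather than Euclidean ray length. Points are returned in an OpenCV-style camera frame (x right, y down, z forward).

```python
import numpy as np

def depth_to_points(depth, fovy_deg, near=0.0, far=np.inf):
    """Back-project an (H, W) z-depth map in meters to an (N, 3) point cloud.

    fovy_deg is a HYPOTHETICAL input: look up the rs_view camera's fovy in the
    MuJoCo scene. Square pixels and a centered principal point are assumed.
    """
    h, w = depth.shape
    f = (h / 2.0) / np.tan(np.deg2rad(fovy_deg) / 2.0)  # focal length in pixels
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    v, u = np.indices((h, w))  # row (v) and column (u) pixel coordinates
    z = depth.astype(np.float64)
    valid = np.isfinite(z) & (z > near) & (z < far)
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return np.stack([x[valid], y[valid], z[valid]], axis=1)

demo_depth = np.full((256, 256), 2.0, dtype=np.float32)
pts = depth_to_points(demo_depth, fovy_deg=60.0)
print(pts.shape)
```

To color the points, index the matching RGB frame with the same `valid` mask; transforming into the world frame additionally needs the camera pose, which is not part of the listed modalities.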

### 3. Depth-Aware Policy Learning

  • Multi-modal learning (RGB + Depth)
  • Improved generalization with geometric cues
  • Robustness to lighting variations
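One simple way to combine the two modalities is early fusion: stack RGB and depth into a single 4-channel input. This is one common choice among several (separate encoders with late fusion is another), and the clipping range below reuses the typical 0.3 m to 5.0 m range from the depth specification as an assumption, not as a property of the data.

```python
import numpy as np

def make_rgbd(rgb, depth, near=0.3, far=5.0):
    """Stack an (H, W, 3) uint8 RGB frame and an (H, W) depth map into (4, H, W) float32.

    RGB is scaled to [0, 1]; depth is clipped to [near, far] meters and rescaled
    to [0, 1]. NaN and +inf depth become `far`, -inf becomes `near`.
    """
    if rgb.shape[:2] != depth.shape:
        raise ValueError(f"size mismatch: {rgb.shape[:2]} vs {depth.shape}")
    rgb01 = rgb.astype(np.float32) / 255.0
    d = np.nan_to_num(depth.astype(np.float32), nan=far, posinf=far, neginf=near)
    d01 = (np.clip(d, near, far) - near) / (far - near)
    # Channels-first layout (C, H, W), as most PyTorch/JAX conv models expect
    return np.concatenate([rgb01.transpose(2, 0, 1), d01[None]], axis=0)

rgb = np.zeros((256, 256, 3), dtype=np.uint8)
depth = np.full((256, 256), 5.0, dtype=np.float32)
x = make_rgbd(rgb, depth)
print(x.shape, x.dtype)
```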

### 4. Sim-to-Real Transfer

  • Fine-tune models with realistic depth sensing
  • Domain adaptation with geometric constraints
  • Depth-based safety checks

## Technical Details

Simulation:

  • Platform: MuJoCo + RoboCasa
  • Robot: Unitree G1 (upper body)
  • Hands: Dex3-1 tri-finger hands
  • Depth Rendering: MuJoCo native depth rendering

Motion Planning:

  • CuRobo (GPU-accelerated)
  • Collision-free trajectories
  • Smooth cubic interpolation
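To illustrate what cubic interpolation between joint configurations means, the sketch below uses the cubic time scaling s(τ) = 3τ^2 - 2τ^3, which has zero velocity at both ends. This is a generic example of the idea: the trajectory parameterization CuRobo uses and the exact interpolation applied when generating this dataset are not described here and may differ. The 20 FPS default matches the dataset's frame rate.

```python
import numpy as np

def cubic_interpolate(q_start, q_goal, duration_s, fps=20):
    """Interpolate joint vectors with a cubic profile (zero start/end velocity).

    Returns an array of shape (T, DOF) that includes both endpoints.
    """
    q_start = np.asarray(q_start, dtype=np.float64)
    q_goal = np.asarray(q_goal, dtype=np.float64)
    n = int(round(duration_s * fps)) + 1
    tau = np.linspace(0.0, 1.0, n)[:, None]  # normalized time in [0, 1]
    s = 3.0 * tau**2 - 2.0 * tau**3           # s(0)=0, s(1)=1, s'(0)=s'(1)=0
    return q_start + s * (q_goal - q_start)

traj = cubic_interpolate(np.zeros(28), np.full(28, 0.5), duration_s=2.0)
print(traj.shape)
```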

Depth Sensing:

  • Camera: Head-mounted RGB-D sensor (rs_view)
  • Resolution: 256×256 pixels
  • Format: 32-bit float, meters
  • Per-frame depth synchronized with RGB

## Comparison with Base Dataset

| Feature | Base Dataset | Depth Dataset |
|---|---|---|
| Trajectories | 957 | 963 |
| Joint State | ✓ (28D) | ✓ (28D) |
| RGB Video | ✓ | ✓ |
| Depth Images | ✗ | ✓ (256×256) |
| Use Case | Vision-based manipulation | RGB-D 3D manipulation |

## Citation

```bibtex
@dataset{park2025unitree_g1_depth,
  title={Unitree G1 Apple Pick and Place with Depth Dataset},
  author={Park, Junsung},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/jnsungp/unitree-g1-robocasa-pick-apple-bowl-depth-1k}
}
```

## Acknowledgments

Built with CuRobo, RoboCasa, MuJoCo, and Unitree G1.


Version: 1.0 | Last Updated: November 19, 2025

## Contact & Full Dataset Access

For questions, issues, or to request the full depth dataset (963 episodes):

Please include your affiliation when requesting the full dataset.