---
license: cc-by-4.0
task_categories:
- robotics
tags:
- unitree-g1
- pick-and-place
- simulation
- curobo
- depth-perception
- rgbd
size_categories:
- 100K<n<1M
language:
- en
pretty_name: Unitree G1 Apple Pick and Place with Depth Dataset
---
# Unitree G1 Apple Pick and Place with Depth Dataset
Figure: Multi-view perspectives of the Unitree G1 performing the pick-and-place task. Panels: Front View (Global), Side View (Profile), Top-Down View (Bird's Eye), Ego-Centric View (Robot POV).
## Depth Data Visualization
To help interpret the raw depth values, a normalized depth image (for visual clarity) is shown side by side with its corresponding RGB frame.
Figure: A sample depth frame (normalized to 0-255 for grayscale visualization) and its synchronized RGB counterpart. Panels: Normalized Depth Image (for visualization), Corresponding RGB Image (from `rs_view` camera).
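The card does not say which normalization produced the grayscale preview. A minimal sketch, assuming plain per-frame min-max scaling, might look like the following; the function name `normalize_depth_for_display` is hypothetical and not part of the dataset tooling.

```python
import numpy as np


def normalize_depth_for_display(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a depth map in meters to uint8 grayscale (0-255).

    Assumption: per-frame min-max scaling. The dataset card does not state
    which normalization was used for its preview image.
    """
    depth = np.asarray(depth, dtype=np.float32)
    finite = np.isfinite(depth)
    if not finite.any():
        return np.zeros(depth.shape, dtype=np.uint8)

    d_min = depth[finite].min()
    d_max = depth[finite].max()
    span = d_max - d_min
    if span <= 0:
        # A constant frame carries no contrast; return black.
        return np.zeros(depth.shape, dtype=np.uint8)

    scaled = (np.clip(depth, d_min, d_max) - d_min) / span
    # Non-finite pixels (NaN or inf) are mapped to 0.
    scaled = np.where(finite, scaled, 0.0)
    return np.round(scaled * 255.0).astype(np.uint8)


# Synthetic frame spanning the typical range quoted in this card.
demo = np.linspace(0.3, 5.0, 256 * 256, dtype=np.float32).reshape(256, 256)
gray = normalize_depth_for_display(demo)
```

Per-frame scaling maximizes contrast in a single image but makes brightness incomparable across frames; a fixed range would be preferable for video previews.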
## Dataset Description
The Unitree G1 Apple Pick and Place with Depth Dataset contains 963 successful trajectories with per-frame depth images for RGB-D manipulation research. In each trajectory the robot picks up a red apple and places it into a bowl using both arms and tri-finger hands. Every trajectory includes depth measurements from a head-mounted camera, synchronized with the RGB video.
Key Features:
- 963 successful trajectories with depth data
- 256×256 depth images per timestep (277,592 total frames)
- 28-DOF control: bilateral arms (7+7) + dexterous hands (7+7)
- 256×256 RGB video at 20 FPS (ego view)
- CuRobo motion planning (collision-free trajectories)
- MuJoCo + RoboCasa simulation with realistic depth rendering
This dataset extends the base dataset with depth perception for RGB-D manipulation and 3D scene understanding research.
Note on Data Availability: To maintain accessibility within storage limits, the depth data hosted here is a subset containing 10 sample episodes. This allows users to verify the data structure and quality. The full dataset containing depth maps for all 963 trajectories is archived separately. If you need the complete dataset for training large-scale models, please refer to the Contact section below.
## Dataset Owner
Junsung Park (@jnsungp)
## License
Creative Commons Attribution 4.0 International (CC BY 4.0)
## Dataset Format
| Modality | Type | Shape | Description |
|---|---|---|---|
| Observation State | float32 | (28,) | Joint positions (radians) for arms + hands |
| Observation Depth | float32 | (256, 256) | Depth image (meters) from `rs_view` camera |
| Action | float32 | (28,) | Target joint positions |
| Video | RGB | (256, 256, 3) | Ego view, 20 FPS, H.264 |
| Language | string | - | "Pick up the red apple and place it on the bowl" |
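As a sanity check after loading, the shapes and dtypes listed in the table can be asserted per timestep. The sketch below is illustrative only: `check_timestep` is a hypothetical helper, and the key `observation.depth` is an assumed name (depth is stored as separate `.npy` files, not as a parquet column).

```python
import numpy as np

# Expected per-timestep shapes, copied from the format table.
# "observation.depth" is an assumed key name used only in this sketch.
EXPECTED_SHAPES = {
    "observation.state": (28,),
    "observation.depth": (256, 256),
    "action": (28,),
}


def check_timestep(sample: dict) -> None:
    """Raise ValueError if any array has an unexpected shape or dtype."""
    for key, shape in EXPECTED_SHAPES.items():
        arr = np.asarray(sample[key])
        if arr.shape != shape:
            raise ValueError(f"{key}: expected shape {shape}, got {arr.shape}")
        if arr.dtype != np.float32:
            raise ValueError(f"{key}: expected float32, got {arr.dtype}")


example = {
    "observation.state": np.zeros(28, dtype=np.float32),
    "observation.depth": np.ones((256, 256), dtype=np.float32),
    "action": np.zeros(28, dtype=np.float32),
}
check_timestep(example)  # passes silently
```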
### Depth Image Specification
Depth measurements captured from the robot's head-mounted camera:
| Property | Value | Description |
|---|---|---|
| Resolution | 256 × 256 | Width × Height in pixels |
| Data Type | float32 | 32-bit floating point |
| Units | Meters (m) | Distance from camera to surface |
| Camera | `rs_view` | Head-mounted RGB-D camera |
| Format | `.npy` | NumPy binary format |
| Range | ~0.3 m to 5.0 m | Typical depth range in scene |
Loading Depth Data:
import numpy as np
# Load single depth frame
depth = np.load("depth/chunk-000/episode_000000/frame_000050.npy")
print(depth.shape) # (256, 256)
print(f"Min depth: {depth.min():.2f}m, Max depth: {depth.max():.2f}m")
Path Template:
depth/chunk-{episode_chunk:03d}/episode_{episode_index:06d}/frame_{frame_index:06d}.npy
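The path template can be wrapped in a small helper so that a whole episode loads as one array. This is a sketch under stated assumptions: `depth_frame_path` and `load_episode_depth` are hypothetical names, and the default `episode_chunk=0` simply mirrors the `chunk-000` directory shown in this card.

```python
from pathlib import Path

import numpy as np


def depth_frame_path(root, episode_index: int, frame_index: int,
                     episode_chunk: int = 0) -> Path:
    """Build the path of one depth frame from the template above."""
    return (
        Path(root)
        / "depth"
        / f"chunk-{episode_chunk:03d}"
        / f"episode_{episode_index:06d}"
        / f"frame_{frame_index:06d}.npy"
    )


def load_episode_depth(root, episode_index: int,
                       episode_chunk: int = 0) -> np.ndarray:
    """Load every depth frame of one episode into a (T, H, W) array."""
    episode_dir = depth_frame_path(root, episode_index, 0, episode_chunk).parent
    # Zero-padded names sort correctly in lexicographic order.
    frame_files = sorted(episode_dir.glob("frame_*.npy"))
    if not frame_files:
        raise FileNotFoundError(f"No depth frames found in {episode_dir}")
    return np.stack([np.load(f) for f in frame_files])
```

For example, `load_episode_depth("./datasets/g1-depth", 0)` would return all frames of episode 0, provided that episode is among the hosted depth samples.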
### Joint Configuration (28-DOF)
| Body Part | DOF | Description |
|---|---|---|
| Left Arm | 7 | Shoulder (3) + Elbow (1) + Wrist (3) |
| Right Arm | 7 | Shoulder (3) + Elbow (1) + Wrist (3) |
| Left Hand | 7 | Index (2) + Middle (2) + Thumb (3) |
| Right Hand | 7 | Index (2) + Middle (2) + Thumb (3) |
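To work with one limb at a time, the 28-D state or action vector can be sliced into its four groups. Note the assumption: this card does not document the index order inside the vector, so the sketch below assumes it follows the row order of the table (left arm, right arm, left hand, right hand). Verify against `meta/info.json` before relying on it. The names `JOINT_SLICES` and `split_joint_vector` are hypothetical.

```python
import numpy as np

# ASSUMED ordering: follows the table rows. Not confirmed by the dataset card.
JOINT_SLICES = {
    "left_arm": slice(0, 7),
    "right_arm": slice(7, 14),
    "left_hand": slice(14, 21),
    "right_hand": slice(21, 28),
}


def split_joint_vector(q) -> dict:
    """Split a (..., 28) joint array into per-limb (..., 7) arrays."""
    q = np.asarray(q)
    if q.shape[-1] != 28:
        raise ValueError(f"Expected last dimension 28, got {q.shape[-1]}")
    return {name: q[..., part] for name, part in JOINT_SLICES.items()}


parts = split_joint_vector(np.arange(28, dtype=np.float32))
```

Because slicing is applied on the last axis, the same helper works on a full `(N, 28)` trajectory.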
## Dataset Statistics
- Trajectories: 963
- Total Frames: 277,592
- Avg Episode Length: ~288 frames (~14.4 seconds)
- Episode Length Range: 180-400 frames
- Storage Size: ~2.5 GB (data + videos + depth)
- Success Rate: 100%
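The average episode length follows directly from the totals above, and the same numbers give a rough size for the full raw depth archive. The depth figure is an estimate that assumes uncompressed float32 frames and ignores the small `.npy` header of each file; it is not a measured size.

```python
# Figures taken from this card.
TRAJECTORIES = 963
TOTAL_FRAMES = 277_592
FPS = 20
DEPTH_HEIGHT, DEPTH_WIDTH = 256, 256
BYTES_PER_VALUE = 4  # float32

avg_frames = TOTAL_FRAMES / TRAJECTORIES   # about 288.26 frames
avg_seconds = avg_frames / FPS             # about 14.41 seconds

# Estimate only: uncompressed payload, .npy headers ignored.
bytes_per_frame = DEPTH_HEIGHT * DEPTH_WIDTH * BYTES_PER_VALUE  # 262,144 bytes
raw_depth_gib = TOTAL_FRAMES * bytes_per_frame / 2**30          # about 67.77 GiB

print(f"{avg_frames:.2f} frames, {avg_seconds:.2f} s, {raw_depth_gib:.2f} GiB")
```

The estimate is far larger than the ~2.5 GB listed above, which is consistent with the note that only a 10-episode depth sample is hosted here.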
## Download
huggingface-cli download \
--repo-type dataset jnsungp/unitree-g1-robocasa-pick-apple-bowl-depth-1k \
--local-dir ./datasets/g1-depth
### Using Python
from datasets import load_dataset
dataset = load_dataset("jnsungp/unitree-g1-robocasa-pick-apple-bowl-depth-1k")
## Dataset Structure
dataset_depth_1k/
├── data/
│ └── chunk-000/
│ ├── episode_000000.parquet
│ ├── episode_000001.parquet
│ └── ...
├── videos/
│ └── chunk-000/
│ └── observation.images.ego_view/
│ ├── episode_000000.mp4
│ ├── episode_000001.mp4
│ └── ...
├── depth/
│ └── chunk-000/
│ ├── episode_000000/
│ │ ├── frame_000000.npy
│ │ ├── frame_000001.npy
│ │ └── ...
│ ├── episode_000001/
│ └── ...
├── meta/
│ ├── info.json # Dataset metadata
│ ├── stats.json # Statistics (mean, std, min, max)
│ ├── tasks.jsonl # Task descriptions
│ └── episodes.jsonl # Episode information
└── README.md
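Since only a sample of episodes ships with depth data, it can help to check which episodes actually have a depth directory before pairing them with parquet files. A minimal sketch based on the layout above; `list_depth_episodes` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def list_depth_episodes(root) -> Dict[int, int]:
    """Map episode index to the number of depth frames found on disk."""
    counts: Dict[int, int] = {}
    for episode_dir in sorted(Path(root).glob("depth/chunk-*/episode_*")):
        if not episode_dir.is_dir():
            continue
        # Directory names look like "episode_000012".
        index = int(episode_dir.name.split("_")[1])
        counts[index] = sum(1 for _ in episode_dir.glob("frame_*.npy"))
    return counts
```

Calling `list_depth_episodes("./datasets/g1-depth")` on a local download returns a dictionary such as `{episode_index: frame_count, ...}` covering only the episodes present locally.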
## Loading Data Example
import pandas as pd
import numpy as np
import cv2
import matplotlib.pyplot as plt
# Load trajectory data
df = pd.read_parquet("data/chunk-000/episode_000000.parquet")
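# Access data: each row stores one 28-D vector, so stack the rows into (N, 28) arrays
observations = np.stack(df['observation.state'].to_numpy())  # (N, 28) - joint positions
actions = np.stack(df['action'].to_numpy())  # (N, 28) - target positions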
# Load RGB video
cap = cv2.VideoCapture("videos/chunk-000/observation.images.ego_view/episode_000000.mp4")
# Load depth images
episode_idx = 0
frame_idx = 100
depth = np.load(f"depth/chunk-000/episode_{episode_idx:06d}/frame_{frame_idx:06d}.npy")
# Visualize depth
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(depth, cmap='turbo')
plt.colorbar(label='Depth (m)')
plt.title('Depth Image')
plt.subplot(1, 2, 2)
# Read corresponding RGB frame
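cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
ret, rgb = cap.read()
if not ret:
    raise RuntimeError(f"Could not read frame {frame_idx} from the video")
plt.imshow(cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB))  # OpenCV decodes frames as BGR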
plt.title('RGB Image')
plt.show()
cap.release()
## Use Cases
### 1. RGB-D Manipulation
Train policies that leverage depth information for:
- Precise 3D localization of objects
- Distance-aware grasping
- Occlusion-robust perception
### 2. 3D Scene Understanding
- Point cloud generation from RGB-D pairs
- 3D object detection and segmentation
- Spatial reasoning for manipulation
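Point cloud generation amounts to back-projecting each depth pixel through a pinhole camera model. The sketch below rests on assumptions not stated in this card: the camera intrinsics are unknown, so the vertical field of view `fovy_deg` must be supplied by the user (for example from the simulator's camera definition), pixels are assumed square, and depth is assumed to be measured along the optical axis. `depth_to_point_cloud` is a hypothetical helper.

```python
import numpy as np


def depth_to_point_cloud(depth, fovy_deg: float) -> np.ndarray:
    """Back-project a depth map (meters) to an (M, 3) camera-frame cloud.

    Assumptions: pinhole model, square pixels, principal point at the image
    center, depth measured along the optical axis (z). Output axes follow
    the common convention x right, y down, z forward.
    """
    depth = np.asarray(depth, dtype=np.float32)
    height, width = depth.shape

    # Focal length in pixels derived from the vertical field of view.
    focal = 0.5 * height / np.tan(0.5 * np.deg2rad(fovy_deg))
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0

    u, v = np.meshgrid(np.arange(width), np.arange(height))
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal

    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Drop pixels with invalid (NaN or inf) depth.
    return points[np.isfinite(points).all(axis=1)]
```

Colors from the synchronized RGB frame can be attached by flattening the image in the same row-major order, as long as no pixels were dropped.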
### 3. Depth-Aware Policy Learning
- Multi-modal learning (RGB + Depth)
- Improved generalization with geometric cues
- Robustness to lighting variations
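One simple way to feed both modalities to a policy is early fusion: concatenate RGB and depth into a four-channel image. This is a sketch of one possible input pipeline, not the method used with this dataset. `make_rgbd` is a hypothetical helper, and the 5.0 m clipping distance is an assumption taken from the typical depth range quoted in this card.

```python
import numpy as np


def make_rgbd(rgb, depth, max_depth_m: float = 5.0) -> np.ndarray:
    """Fuse an RGB frame and a depth map into an (H, W, 4) float32 array.

    RGB is scaled from [0, 255] to [0, 1]; depth is divided by max_depth_m
    and clipped to [0, 1]. max_depth_m = 5.0 is an assumed default.
    """
    rgb = np.asarray(rgb)
    depth = np.asarray(depth, dtype=np.float32)
    if rgb.shape[:2] != depth.shape:
        raise ValueError(f"Size mismatch: rgb {rgb.shape[:2]} vs depth {depth.shape}")

    rgb01 = rgb.astype(np.float32) / 255.0
    depth01 = np.clip(depth / max_depth_m, 0.0, 1.0)
    return np.concatenate([rgb01, depth01[..., None]], axis=-1)


rgbd = make_rgbd(
    np.full((256, 256, 3), 255, dtype=np.uint8),
    np.full((256, 256), 2.5, dtype=np.float32),
)
```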
### 4. Sim-to-Real Transfer
- Fine-tune models with realistic depth sensing
- Domain adaptation with geometric constraints
- Depth-based safety checks
## Technical Details
Simulation:
- Platform: MuJoCo + RoboCasa
- Robot: Unitree G1 (upper body)
- Hands: Dex31 tri-finger hands
- Depth Rendering: MuJoCo native depth rendering
Motion Planning:
- CuRobo (GPU-accelerated)
- Collision-free trajectories
- Smooth cubic interpolation
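The card does not detail how the cubic interpolation is applied. For intuition only, the sketch below shows a generic cubic blend between two joint configurations with zero velocity at both ends. It is not CuRobo's API and not necessarily how these trajectories were generated; `cubic_interpolate` is a hypothetical helper.

```python
import numpy as np


def cubic_interpolate(q_start, q_goal, num_steps: int) -> np.ndarray:
    """Interpolate between two joint vectors with a cubic ease profile.

    Uses the blend 3s^2 - 2s^3 for s in [0, 1], which starts and ends with
    zero velocity. Returns an array of shape (num_steps, dof).
    """
    q_start = np.asarray(q_start, dtype=np.float64)
    q_goal = np.asarray(q_goal, dtype=np.float64)
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")

    s = np.linspace(0.0, 1.0, num_steps)
    blend = 3.0 * s**2 - 2.0 * s**3
    return q_start + blend[:, None] * (q_goal - q_start)


trajectory = cubic_interpolate(np.zeros(28), np.ones(28), num_steps=5)
```

The first and last rows equal the start and goal configurations, and intermediate rows move slowly near the ends and fastest in the middle.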
Depth Sensing:
- Camera: Head-mounted RGB-D sensor (`rs_view`)
- Resolution: 256×256 pixels
- Format: 32-bit float, meters
- Per-frame depth synchronized with RGB
## Comparison with Base Dataset
| Feature | Base Dataset | Depth Dataset |
|---|---|---|
| Trajectories | 957 | 963 |
| Joint State | ✓ (28D) | ✓ (28D) |
| RGB Video | ✓ | ✓ |
| Depth Images | ✗ | ✓ (256×256) |
| Use Case | Vision-based manipulation | RGB-D 3D manipulation |
## Citation
@dataset{park2025unitree_g1_depth,
title={Unitree G1 Apple Pick and Place with Depth Dataset},
author={Park, Junsung},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/datasets/jnsungp/unitree-g1-robocasa-pick-apple-bowl-depth-1k}
}
## Acknowledgments
Built with CuRobo, RoboCasa, MuJoCo, and Unitree G1.
Version: 1.0 | Last Updated: November 19, 2025
## Contact & Full Dataset Access
For questions, issues, or to request the full depth dataset (963 episodes):
- Email: night1115@snu.ac.kr
- Hugging Face: @jnsungp
- Institution: Seoul National University
Please include your affiliation when requesting the full dataset.