
---
license: cc-by-4.0
task_categories:
- robotics
tags:
- unitree-g1
- pick-and-place
- simulation
- curobo
- depth-perception
- rgbd
size_categories:
- 100K<n<1M
language:
- en
pretty_name: Unitree G1 Apple Pick and Place with Depth Dataset
---

# Unitree G1 Apple Pick and Place with Depth Dataset

Front View (Global)	Side View (Profile)
Top-Down View (Bird's Eye)	Ego-Centric View (Robot POV)
Multi-view perspectives of the Unitree G1 performing the pick-and-place task.

## Depth Data Visualization

To help interpret the raw depth values, we provide a side-by-side comparison of a normalized depth image (normalized for visual clarity only) and its corresponding RGB frame.

Normalized Depth Image (for visualization)	Corresponding RGB Image (from `rs_view` camera)
A sample depth frame (normalized to 0-255 for grayscale visualization) and its synchronized RGB counterpart.
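The normalization mentioned above can be sketched as a linear min-max mapping. This is a minimal illustration, not necessarily the procedure used to produce the sample image; the function name `depth_to_uint8` and its optional `d_min` / `d_max` arguments are hypothetical.

```python
import numpy as np


def depth_to_uint8(depth, d_min=None, d_max=None):
    """Linearly map a metric depth image to 0-255 grayscale (for display only).

    d_min / d_max default to the finite min/max of the frame itself.
    Non-finite pixels are rendered as 0.
    """
    depth = np.asarray(depth, dtype=np.float32)
    finite = np.isfinite(depth)
    if not finite.any():
        return np.zeros(depth.shape, dtype=np.uint8)

    lo = float(depth[finite].min()) if d_min is None else float(d_min)
    hi = float(depth[finite].max()) if d_max is None else float(d_max)
    if hi <= lo:
        # A constant image carries no contrast to display
        return np.zeros(depth.shape, dtype=np.uint8)

    scaled = (np.clip(depth, lo, hi) - lo) / (hi - lo)
    scaled = np.where(finite, scaled, 0.0)
    return np.round(scaled * 255.0).astype(np.uint8)
```

Per-frame normalization discards absolute scale, so it is suitable for inspection but not as a training input when metric depth matters.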

## Dataset Description

The Unitree G1 Apple Pick and Place with Depth Dataset contains 963 successful trajectories, each with per-frame depth images, for RGB-D manipulation research. In every trajectory the robot picks up a red apple and places it into a bowl using both arms and tri-finger hands. All trajectories include depth measurements from a head-mounted camera, synchronized with the RGB video.

Key Features:

- 963 successful trajectories with depth data
- 256×256 depth images per timestep (277,592 total frames)
- 28-DOF control: bilateral arms (7+7) + dexterous hands (7+7)
- 256×256 RGB video at 20 FPS (ego view)
- CuRobo motion planning (collision-free trajectories)
- MuJoCo + RoboCasa simulation with realistic depth rendering

This dataset extends the base dataset with depth perception for RGB-D manipulation and 3D scene understanding research.

Note on Data Availability: To maintain accessibility within storage limits, the depth data hosted here is a subset containing 10 sample episodes. This allows users to verify the data structure and quality. The full dataset containing depth maps for all 963 trajectories is archived separately. If you need the complete dataset for training large-scale models, please refer to the Contact section below.

## Dataset Owner

Junsung Park (@jnsungp)

## License

Creative Commons Attribution 4.0 International (CC BY 4.0)

## Dataset Format

| Modality | Type | Shape | Description |
|---|---|---|---|
| Observation State | `float32` | `(28,)` | Joint positions (radians) for arms + hands |
| Observation Depth | `float32` | `(256, 256)` | Depth image (meters) from `rs_view` camera |
| Action | `float32` | `(28,)` | Target joint positions |
| Video | RGB | `(256, 256, 3)` | Ego view, 20 FPS, H.264 |
| Language | `string` | - | "Pick up the red apple and place it on the bowl" |

### Depth Image Specification

Depth measurements captured from the robot's head-mounted camera:

| Property | Value | Description |
|---|---|---|
| Resolution | 256 × 256 | Width × Height in pixels |
| Data Type | `float32` | 32-bit floating point |
| Units | Meters (m) | Distance from camera to surface |
| Camera | `rs_view` | Head-mounted RGB-D camera |
| Format | `.npy` | NumPy binary format |
| Range | ~0.3 m to 5.0 m | Typical depth range in scene |

Loading Depth Data:

import numpy as np

# Load single depth frame
depth = np.load("depth/chunk-000/episode_000000/frame_000050.npy")
print(depth.shape)  # (256, 256)
print(f"Min depth: {depth.min():.2f}m, Max depth: {depth.max():.2f}m")
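After loading a frame, it can be useful to confirm that it matches the specification table. The sketch below checks shape, dtype, and finiteness; the helper name `check_depth_frame` and the loose bounds `min_m` / `max_m` are illustrative assumptions, not values defined by the dataset.

```python
import numpy as np

EXPECTED_SHAPE = (256, 256)   # from the specification table
EXPECTED_DTYPE = np.float32   # from the specification table


def check_depth_frame(depth, min_m=0.0, max_m=10.0):
    """Return a list of problems found in a depth frame (empty list = looks sane).

    min_m / max_m are deliberately loose, hypothetical bounds; the card only
    states a *typical* range of roughly 0.3 m to 5.0 m.
    """
    problems = []
    if depth.shape != EXPECTED_SHAPE:
        problems.append(f"shape {depth.shape} != {EXPECTED_SHAPE}")
    if depth.dtype != EXPECTED_DTYPE:
        problems.append(f"dtype {depth.dtype} != float32")
    if not np.isfinite(depth).all():
        problems.append("contains NaN or inf values")
    elif depth.min() < min_m or depth.max() > max_m:
        problems.append(
            f"values outside [{min_m}, {max_m}] m: "
            f"min={depth.min():.3f}, max={depth.max():.3f}"
        )
    return problems
```

Calling `check_depth_frame(depth)` on the frame loaded above should return an empty list if the file follows the specification.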

Path Template:

depth/chunk-{episode_chunk:03d}/episode_{episode_index:06d}/frame_{frame_index:06d}.npy
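The template can be wrapped in a small helper. How `episode_chunk` is derived is not stated here; the sketch assumes it is `episode_index // chunk_size` with a hypothetical `chunk_size` of 1000, which places all 963 episodes in `chunk-000` as the directory listing suggests. Check `meta/info.json` for the actual value.

```python
from pathlib import Path


def depth_frame_path(root, episode_index, frame_index, chunk_size=1000):
    """Build the path of one depth frame from the template above.

    ASSUMPTION: episode_chunk = episode_index // chunk_size, and
    chunk_size=1000 is a guess, not a value confirmed by the dataset card.
    """
    episode_chunk = episode_index // chunk_size
    return (
        Path(root)
        / "depth"
        / f"chunk-{episode_chunk:03d}"
        / f"episode_{episode_index:06d}"
        / f"frame_{frame_index:06d}.npy"
    )
```

For example, `depth_frame_path("./datasets/g1-depth", 0, 50)` points at the same file as the loading snippet above, relative to the download directory.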

### Joint Configuration (28-DOF)

| Body Part | DOF | Description |
|---|---|---|
| Left Arm | 7 | Shoulder (3) + Elbow (1) + Wrist (3) |
| Right Arm | 7 | Shoulder (3) + Elbow (1) + Wrist (3) |
| Left Hand | 7 | Index (2) + Middle (2) + Thumb (3) |
| Right Hand | 7 | Index (2) + Middle (2) + Thumb (3) |
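A 28-element state or action vector can be split into these four groups by slicing. The table gives the group sizes but not the index order inside the vector, so the ordering below (left arm, right arm, left hand, right hand) is an assumption that should be verified against `meta/info.json` before use.

```python
import numpy as np

# ASSUMPTION: index order follows the table rows; this is not confirmed
# by the dataset card and must be checked against the dataset metadata.
JOINT_GROUPS = {
    "left_arm": slice(0, 7),
    "right_arm": slice(7, 14),
    "left_hand": slice(14, 21),
    "right_hand": slice(21, 28),
}


def split_joints(vec):
    """Split a (..., 28) joint vector into named 7-DOF groups."""
    vec = np.asarray(vec)
    if vec.shape[-1] != 28:
        raise ValueError(f"expected last dimension 28, got {vec.shape[-1]}")
    return {name: vec[..., idx] for name, idx in JOINT_GROUPS.items()}
```

Because the slicing acts on the last axis, the same helper works for a single timestep `(28,)` and for a whole episode `(N, 28)`.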

## Dataset Statistics

- Trajectories: 963
- Total Frames: 277,592
- Avg Episode Length: ~288 frames (~14.4 seconds)
- Episode Length Range: 180-400 frames
- Storage Size: ~2.5 GB (data + videos + depth)
- Success Rate: 100%
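The averages above follow directly from the frame count and frame rate. The short calculation below reproduces them and also estimates the raw, uncompressed size of the depth frames (256 × 256 `float32` values each, ignoring the small `.npy` header). The 10-episode estimate assumes the sample episodes have average length, which is not stated on this card.

```python
TRAJECTORIES = 963
TOTAL_FRAMES = 277_592
FPS = 20
HEIGHT = WIDTH = 256
BYTES_PER_PIXEL = 4  # float32

avg_frames = TOTAL_FRAMES / TRAJECTORIES        # about 288 frames
avg_seconds = avg_frames / FPS                  # about 14.4 s

bytes_per_frame = HEIGHT * WIDTH * BYTES_PER_PIXEL   # 262,144 bytes
full_depth_gib = TOTAL_FRAMES * bytes_per_frame / 2**30

# ASSUMPTION: the 10 hosted sample episodes are of average length
sample_depth_mib = 10 * avg_frames * bytes_per_frame / 2**20

print(f"avg episode: {avg_frames:.1f} frames, {avg_seconds:.1f} s")
print(f"raw depth, all episodes: {full_depth_gib:.1f} GiB")
print(f"raw depth, 10 episodes:  {sample_depth_mib:.0f} MiB")
```

The raw depth for all 963 trajectories comes to tens of GiB, which is consistent with only a sample subset being hosted here.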

## Download

huggingface-cli download \
    --repo-type dataset jnsungp/unitree-g1-robocasa-pick-apple-bowl-depth-1k \
    --local-dir ./datasets/g1-depth

### Using Python

from datasets import load_dataset

dataset = load_dataset("jnsungp/unitree-g1-robocasa-pick-apple-bowl-depth-1k")

## Dataset Structure

dataset_depth_1k/
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       ├── episode_000001.parquet
│       └── ...
├── videos/
│   └── chunk-000/
│       └── observation.images.ego_view/
│           ├── episode_000000.mp4
│           ├── episode_000001.mp4
│           └── ...
├── depth/
│   └── chunk-000/
│       ├── episode_000000/
│       │   ├── frame_000000.npy
│       │   ├── frame_000001.npy
│       │   └── ...
│       ├── episode_000001/
│       └── ...
├── meta/
│   ├── info.json          # Dataset metadata
│   ├── stats.json         # Statistics (mean, std, min, max)
│   ├── tasks.jsonl        # Task descriptions
│   └── episodes.jsonl     # Episode information
└── README.md
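Because only a subset of episodes ships with depth in this repository, it can help to scan the `depth/` tree and see which episodes are actually present. The following is a minimal sketch based on the layout above; the function name `episodes_with_depth` is hypothetical.

```python
from pathlib import Path


def episodes_with_depth(root):
    """Map episode index -> number of depth frames found under root/depth/."""
    counts = {}
    for ep_dir in sorted(Path(root).glob("depth/chunk-*/episode_*")):
        if not ep_dir.is_dir():
            continue
        # Directory names look like "episode_000012"
        episode_index = int(ep_dir.name.split("_")[1])
        counts[episode_index] = sum(1 for _ in ep_dir.glob("frame_*.npy"))
    return counts
```

Running `episodes_with_depth("./datasets/g1-depth")` after the download step returns a dictionary whose keys are the episode indices that have depth frames on disk.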

## Loading Data Example

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load trajectory data for one episode
df = pd.read_parquet("data/chunk-000/episode_000000.parquet")

# Each parquet cell holds one 28-element vector, so stack the rows into (N, 28) arrays
observations = np.stack(df["observation.state"].to_numpy())  # (N, 28) - joint positions
actions = np.stack(df["action"].to_numpy())                  # (N, 28) - target positions

# Select the frame to inspect
episode_idx = 0
frame_idx = 100

# Load the depth image for that frame
depth = np.load(f"depth/chunk-000/episode_{episode_idx:06d}/frame_{frame_idx:06d}.npy")

# Read the corresponding RGB frame from the ego-view video
cap = cv2.VideoCapture(
    f"videos/chunk-000/observation.images.ego_view/episode_{episode_idx:06d}.mp4"
)
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
ret, bgr = cap.read()
cap.release()
if not ret:
    raise RuntimeError(f"Could not read frame {frame_idx} from the video")

# Visualize depth and RGB side by side
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(depth, cmap="turbo")
plt.colorbar(label="Depth (m)")
plt.title("Depth Image")

plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))  # OpenCV decodes frames as BGR
plt.title("RGB Image")
plt.show()
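To work with a whole episode rather than a single frame, the per-frame `.npy` files can be stacked into one array. This is a minimal sketch assuming every frame in an episode has the same shape; the helper name `load_episode_depth` is hypothetical. A 288-frame episode at 256 × 256 `float32` occupies roughly 72 MiB in memory.

```python
from pathlib import Path

import numpy as np


def load_episode_depth(episode_dir):
    """Load all depth frames of one episode into a (T, H, W) array.

    Zero-padded file names mean lexicographic sorting equals frame order.
    """
    frame_paths = sorted(Path(episode_dir).glob("frame_*.npy"))
    if not frame_paths:
        raise FileNotFoundError(f"no depth frames found in {episode_dir}")
    return np.stack([np.load(p) for p in frame_paths])
```

For example, `load_episode_depth("depth/chunk-000/episode_000000")` returns an array whose first axis can be compared against the number of rows in the matching parquet file as a synchronization check.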

## Use Cases

### 1. RGB-D Manipulation

Train policies that leverage depth information for:

- Precise 3D localization of objects
- Distance-aware grasping
- Occlusion-robust perception

### 2. 3D Scene Understanding

- Point cloud generation from RGB-D pairs
- 3D object detection and segmentation
- Spatial reasoning for manipulation
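Point cloud generation can be sketched with a standard pinhole back-projection. Several things here are assumptions, since this card does not list camera intrinsics: the vertical field of view must be supplied by the user (for example from the simulation's camera definition), pixels are assumed square, the principal point is assumed to be the image center, and each depth value is assumed to be the distance along the optical axis rather than along the viewing ray. The output uses a camera frame with x right, y down, z forward.

```python
import numpy as np


def intrinsics_from_fovy(fovy_deg, width, height):
    """Pinhole intrinsics from a vertical field of view (square pixels assumed)."""
    fy = 0.5 * height / np.tan(0.5 * np.deg2rad(fovy_deg))
    fx = fy
    # ASSUMPTION: principal point at the image center (pixel-center convention)
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    return fx, fy, cx, cy


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) z-depth image to an (H*W, 3) point array in meters."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column, v: row
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Colors for each point can be taken from the RGB frame reshaped to `(H*W, 3)`, since the depth and RGB images share the same resolution.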

### 3. Depth-Aware Policy Learning

- Multi-modal learning (RGB + Depth)
- Improved generalization with geometric cues
- Robustness to lighting variations
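One simple way to feed RGB and depth to the same network is to stack them into a four-channel image. The sketch below is one possible preprocessing choice, not a recommendation from the dataset authors; the name `make_rgbd` is hypothetical, and the 5.0 m normalization constant is borrowed from the typical depth range quoted earlier.

```python
import numpy as np


def make_rgbd(rgb_uint8, depth_m, max_depth_m=5.0):
    """Combine an (H, W, 3) uint8 RGB image and an (H, W) metric depth image
    into one (H, W, 4) float32 array with all channels scaled to [0, 1]."""
    if rgb_uint8.shape[:2] != depth_m.shape:
        raise ValueError("RGB and depth must have the same height and width")
    rgb = rgb_uint8.astype(np.float32) / 255.0
    # Depth beyond max_depth_m saturates at 1.0 (an illustrative choice)
    depth = np.clip(depth_m, 0.0, max_depth_m) / max_depth_m
    return np.concatenate([rgb, depth[..., None].astype(np.float32)], axis=-1)
```

Frameworks that expect channels-first tensors can transpose the result with `np.transpose(rgbd, (2, 0, 1))`.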

### 4. Sim-to-Real Transfer

- Fine-tune models with realistic depth sensing
- Domain adaptation with geometric constraints
- Depth-based safety checks
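As an illustration of a depth-based safety check, the sketch below flags a frame when enough pixels in a region of interest are closer than a clearance threshold. The function name, the 0.3 m clearance, and the 20-pixel minimum are all hypothetical values chosen for the example; a real system would need thresholds tuned to its sensor and workspace.

```python
import numpy as np


def obstacle_too_close(depth, min_clearance_m=0.3, roi=None, min_pixels=20):
    """Return True if at least min_pixels valid pixels are nearer than min_clearance_m.

    roi is an optional (row_slice, col_slice) tuple restricting the check.
    Non-finite and non-positive depth values are treated as invalid and ignored.
    """
    region = depth if roi is None else depth[roi]
    valid = np.isfinite(region) & (region > 0)
    n_close = int(np.count_nonzero(valid & (region < min_clearance_m)))
    return n_close >= min_pixels
```

Requiring a minimum pixel count rather than a single pixel makes the check less sensitive to isolated depth noise.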

## Technical Details

Simulation:

- Platform: MuJoCo + RoboCasa
- Robot: Unitree G1 (upper body)
- Hands: Dex31 tri-finger hands
- Depth Rendering: MuJoCo native depth rendering

Motion Planning:

- CuRobo (GPU-accelerated)
- Collision-free trajectories
- Smooth cubic interpolation
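For readers unfamiliar with cubic interpolation, the sketch below blends between two joint configurations with the cubic polynomial 3s² − 2s³, which starts and ends with zero velocity. It is a generic illustration only; this card does not specify which cubic scheme the planning pipeline uses, and `cubic_blend` is a hypothetical name.

```python
import numpy as np


def cubic_blend(q_start, q_end, num_steps):
    """Interpolate between two joint vectors with a zero-end-velocity cubic.

    Returns an array of shape (num_steps, D) whose first row is q_start
    and last row is q_end.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    q_start = np.asarray(q_start, dtype=np.float64)
    q_end = np.asarray(q_end, dtype=np.float64)
    s = np.linspace(0.0, 1.0, num_steps)      # normalized time
    w = 3.0 * s**2 - 2.0 * s**3               # cubic weight, dw/ds = 0 at s = 0 and s = 1
    return q_start + w[:, None] * (q_end - q_start)
```

Compared with linear interpolation, the joint velocity ramps up and down smoothly instead of jumping at the endpoints.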

Depth Sensing:

- Camera: Head-mounted RGB-D sensor (`rs_view`)
- Resolution: 256×256 pixels
- Format: 32-bit float, meters
- Per-frame depth synchronized with RGB

## Comparison with Base Dataset

| Feature | Base Dataset | Depth Dataset |
|---|---|---|
| Trajectories | 957 | 963 |
| Joint State | ✓ (28D) | ✓ (28D) |
| RGB Video | ✓ | ✓ |
| Depth Images | ✗ | ✓ (256×256) |
| Use Case | Vision-based manipulation | RGB-D 3D manipulation |

## Citation

@dataset{park2025unitree_g1_depth,
  title={Unitree G1 Apple Pick and Place with Depth Dataset},
  author={Park, Junsung},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/jnsungp/unitree-g1-robocasa-pick-apple-bowl-depth-1k}
}

## Acknowledgments

Built with CuRobo, RoboCasa, MuJoCo, and Unitree G1.

Version: 1.0 | Last Updated: November 19, 2025

## Contact & Full Dataset Access

For questions, issues, or to request the full depth dataset (963 episodes):

- Email: night1115@snu.ac.kr
- Hugging Face: @jnsungp
- Institution: Seoul National University

Please include your affiliation when requesting the full dataset.