license: cc-by-nc-4.0
tags:
  - depth-estimation
  - computer-vision
  - monocular-depth
  - multi-view-geometry
  - pose-estimation
library_name: depth-anything-3
pipeline_tag: depth-estimation

Depth Anything 3: DA3NESTED-GIANT-LARGE

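Model Description

The DA3 Nested model combines the any-view Giant model with the metric Large model for metric-scale visual geometry reconstruction. It is our recommended model, since it brings all capabilities together.

Property	Value
Model series	Nested
Parameters	1.40B
License	CC BY-NC 4.0

⚠️ Non-commercial use only (under the CC BY-NC 4.0 license).

Features

✅ Relative depth
✅ Pose estimation
✅ Pose conditioning
✅ 3D Gaussians
✅ Metric depth
✅ Sky segmentation

Quick Start

Installation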

git clone https://github.com/ByteDance-Seed/depth-anything-3
cd depth-anything-3
pip install -e .

Basic Example

import torch
from depth_anything_3.api import DepthAnything3

# Load model from Hugging Face Hub
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DepthAnything3.from_pretrained("depth-anything/da3nested-giant-large")
model = model.to(device=device)

# Run inference on images
images = ["image1.jpg", "image2.jpg"]  # List of image paths, PIL Images, or numpy arrays
prediction = model.inference(
    images,
    export_dir="output",
    export_format="glb"  # Options: glb, npz, ply, mini_npz, gs_ply, gs_video
)

# Access results
print(prediction.depth.shape)        # Depth maps: [N, H, W] float32
print(prediction.conf.shape)         # Confidence maps: [N, H, W] float32
print(prediction.extrinsics.shape)   # Camera poses (w2c): [N, 3, 4] float32
print(prediction.intrinsics.shape)   # Camera intrinsics: [N, 3, 3] float32
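
The depth maps, intrinsics, and world-to-camera extrinsics above are enough to lift a pixel into 3D. The following is a minimal sketch, not part of the depth-anything-3 API: the helper names `camera_center` and `unproject_pixel` are hypothetical, and it assumes a pinhole camera, depth measured along the optical axis (z-depth), and the convention `x_cam = R @ x_world + t` for the `[R | t]` matrix. Check these conventions against the repository before relying on the result.

```python
def camera_center(w2c):
    """Camera center in world coordinates for a 3x4 world-to-camera matrix [R | t].

    Assumes x_cam = R @ x_world + t, so the center (x_cam = 0) is -R^T @ t.
    """
    return [-sum(w2c[r][c] * w2c[r][3] for r in range(3)) for c in range(3)]


def unproject_pixel(u, v, depth, K, w2c):
    """Lift pixel (u, v) with the given depth to a 3D point in world coordinates.

    K is a 3x3 pinhole intrinsics matrix; w2c is a 3x4 matrix [R | t].
    Assumes `depth` is z-depth, i.e. the z coordinate in the camera frame.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    # Point in the camera frame: depth * K^-1 @ [u, v, 1]
    x_cam = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Invert the rigid transform: x_world = R^T @ (x_cam - t)
    diff = [x_cam[r] - w2c[r][3] for r in range(3)]
    return [sum(w2c[r][c] * diff[r] for r in range(3)) for c in range(3)]


if __name__ == "__main__":
    # Synthetic values for illustration only; real values would come from
    # prediction.intrinsics[i], prediction.extrinsics[i], prediction.depth[i][v][u].
    K = [[100.0, 0.0, 50.0], [0.0, 100.0, 40.0], [0.0, 0.0, 1.0]]
    w2c = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 2.0], [0.0, 0.0, 1.0, 3.0]]
    print(camera_center(w2c))
    print(unproject_pixel(150, 40, 2.0, K, w2c))
```

Both helpers only index into their inputs, so they should accept nested lists as well as per-view NumPy slices such as `prediction.extrinsics[0]`. For full images, a vectorized NumPy version would be far faster.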

Command Line Interface

# Process images with auto mode
da3 auto path/to/images \
    --export-format glb \
    --export-dir output \
    --model-dir depth-anything/da3nested-giant-large

# Use backend for faster repeated inference
da3 backend --model-dir depth-anything/da3nested-giant-large
da3 auto path/to/images --export-format glb --use-backend
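
When the CLI is driven from a script, it can help to assemble the command as an argument list rather than a shell string. The sketch below uses only the flags shown above (`--export-format`, `--export-dir`, `--model-dir`, `--use-backend`); the helper name `build_da3_auto_command` is hypothetical and not part of the package.

```python
import shlex


def build_da3_auto_command(image_dir, export_format="glb", export_dir=None,
                           model_dir=None, use_backend=False):
    """Build the argument list for `da3 auto` from the flags shown in this README."""
    cmd = ["da3", "auto", str(image_dir), "--export-format", export_format]
    if export_dir is not None:
        cmd += ["--export-dir", str(export_dir)]
    if model_dir is not None:
        cmd += ["--model-dir", str(model_dir)]
    if use_backend:
        cmd.append("--use-backend")
    return cmd


if __name__ == "__main__":
    cmd = build_da3_auto_command(
        "path/to/images",
        export_dir="output",
        model_dir="depth-anything/da3nested-giant-large",
    )
    # Print the shell-quoted command; to run it, pass `cmd` to
    # subprocess.run(cmd, check=True) in an environment where da3 is installed.
    print(shlex.join(cmd))
```

Passing a list to `subprocess.run` avoids shell quoting problems when image paths contain spaces.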

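Model Details

Developed by: ByteDance Seed Team
Model type: Vision Transformer for visual geometry
Architecture: Plain Transformer with a unified depth-ray representation
Training data: Public academic datasets only

Key Insights

💎 A single plain Transformer (e.g., a vanilla DINO encoder) is sufficient as the backbone, with no specialized architectural design.

✨ A single depth-ray representation removes the need for complex multi-task learning.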
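
One way to see why a depth map plus a per-pixel ray can carry both scene geometry and camera pose is the pinhole relation below. This is an illustrative sketch under assumed conventions (intrinsics $K$, world-to-camera rotation $R$ and translation $\mathbf{t}$ with $\mathbf{x}_{\text{cam}} = R\,\mathbf{x}_{\text{world}} + \mathbf{t}$, and z-depth $D(u,v)$), not necessarily the exact parameterization used in the paper.

```latex
\mathbf{o} = -R^{\top}\mathbf{t}, \qquad
\mathbf{d}(u,v) = R^{\top} K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad
\mathbf{P}(u,v) = \mathbf{o} + D(u,v)\,\mathbf{d}(u,v)
```

Here $\mathbf{o}$ is the ray origin (the camera center), $\mathbf{d}(u,v)$ is the unnormalized ray direction through pixel $(u,v)$, and $\mathbf{P}(u,v)$ is the resulting 3D point in world coordinates.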

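Performance

🏆 Depth Anything 3 significantly outperforms:

Depth Anything 2 (monocular depth estimation)
VGGT (multi-view depth estimation and pose estimation)

For detailed benchmark results, please refer to our paper.

Limitations

The model is trained on academic datasets and may have limitations on images from certain domains.
Performance may vary with image quality, lighting conditions, and scene complexity.
⚠️ Non-commercial use only, under the CC BY-NC 4.0 license.

Citation

If you find Depth Anything 3 useful in your research or projects, please cite: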

@article{depthanything3,
  title={Depth Anything 3: Recovering the visual space from any views},
  author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}