ascend-robotfix:fix code/documentation formatting issues using pre-commit

文件	最后提交记录	最后更新时间
test	fix:fix code/documentation formatting issues using pre-commit Co-authored-by: xiaohuizhu1037<zhuxiaohui13@huawei.com> # message auto-generated for no-merge-commit merge: !2043 merge master into master fix:fix code/documentation formatting issues using pre-commit Created-by: xiaohuizhu1037 Commit-by: xiaohuizhu1037 Merged-by: ascend-robot Description: ## What this PR does / why we need it? 使用pre-commit扫描仓库的全量代码/文档并进行格式修复。 ## Does this PR introduce any user-facing change? 会对代码/文档做格式修改。 ## How was this patch tested? 工具自动修复和人工审视。 See merge request: Ascend/DrivingSDK!2043	6 天前
README.md	fix:fix code/documentation formatting issues using pre-commit Co-authored-by: xiaohuizhu1037<zhuxiaohui13@huawei.com> # message auto-generated for no-merge-commit merge: !2043 merge master into master fix:fix code/documentation formatting issues using pre-commit Created-by: xiaohuizhu1037 Commit-by: xiaohuizhu1037 Merged-by: ascend-robot Description: ## What this PR does / why we need it? 使用pre-commit扫描仓库的全量代码/文档并进行格式修复。 ## Does this PR introduce any user-facing change? 会对代码/文档做格式修改。 ## How was this patch tested? 工具自动修复和人工审视。 See merge request: Ascend/DrivingSDK!2043	6 天前
UniAD.patch	!1581 optimize UniAD stage 1 track loss Merge pull request !1581 from zhangchenrui/master	8 个月前
mmcv.patch	!1152 UniAD多进程修复 Merge pull request !1152 from wangyixuan30/master	1 年前
mmdet3d.patch	!986 Add UniAD Merge pull request !986 from wangyixuan30/master	1 年前

UniAD for PyTorch

简介

模型介绍

自动驾驶是一项高度复杂的技术，需要多个学科领域的知识和技能，包括传感器技术、机器学习、路径规划等方面。自动驾驶还需要适应不同的道路规则和交通文化，与其他车辆和行人进行良好的交互，以实现高度可靠和安全的自动驾驶系统。面对这种复杂的场景，大部分自动驾驶相关的工作都聚焦在具体的某个模块，关于框架性的研讨则相对匮乏。自动驾驶通用算法框架——Unified Autonomous Driving（UniAD）首次将检测、跟踪、建图、轨迹预测，占据栅格预测以及规划整合到一个基于Transformer的端到端网络框架下，完美契合了”多任务”和“高性能”的特点，是自动驾驶中的重大技术突破。

代码实现

参考实现：

url=https://github.com/OpenDriveLab/UniAD
commit_id=7b5bf15e0e49522b6553ddc48e67833e8f5f0f52

适配昇腾 AI 处理器的实现：

url=https://gitcode.com/Ascend/DrivingSDK.git
code_path=model_examples/UniAD

UniAD

准备训练环境

安装环境

表 1 三方库版本支持表

三方库	支持版本
PyTorch	2.1.0

安装昇腾环境

请参考昇腾社区中《Pytorch框架训练环境准备》文档搭建昇腾环境，本仓已支持表2中软件版本。

表 2 昇腾软件版本支持表

软件类型	首次支持版本
FrameworkPTAdapter	7.0.0
CANN	8.1.RC1

激活 CANN 环境
安装Driving SDK加速库，安装方法参考原仓。

安装mmcv

git clone -b 1.x https://github.com/open-mmlab/mmcv.git
cd mmcv
cp -f ../mmcv.patch ./
git apply --reject --whitespace=fix mmcv.patch
pip install -r requirements/runtime.txt
MMCV_WITH_OPS=1 FORCE_NPU=1 python setup.py install

源码安装mmdet3d

git clone -b v1.0.0rc6 https://github.com/open-mmlab/mmdetection3d.git
cp -f ../mmdet3d.patch mmdetection3d
cd mmdetection3d
git apply --reject --whitespace=fix mmdet3d.patch
pip install -r requirements/runtime.txt
pip install -e .

准备模型源码

git clone https://github.com/OpenDriveLab/UniAD.git
cp -f UniAD.patch UniAD
cp -r test UniAD
cd UniAD
git checkout 7b5bf15e0e49522b6553ddc48e67833e8f5f0f52
git apply --reject --whitespace=fix UniAD.patch
pip install -r requirements.txt

根据操作系统，安装tcmalloc动态库。

OpenEuler系统

在当前python环境和路径下执行以下命令，安装并使用tcmalloc动态库。

mkdir gperftools
cd gperftools
wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.16/gperftools-2.16.tar.gz
tar -zvxf gperftools-2.16.tar.gz
cd gperftools-2.16
./configure --prefix=/usr/local/lib --with-tcmalloc-pagesize=64
make
make install
echo '/usr/local/lib/lib/' >> /etc/ld.so.conf
ldconfig
export LD_LIBRARY_PATH=/usr/local/lib/lib/:$LD_LIBRARY_PATH
export PATH=/usr/local/lib/bin:$PATH
export LD_PRELOAD=/usr/local/lib/lib/libtcmalloc.so.4

Ubuntu系统

在当前python环境和路径下执行以下命令，安装并使用tcmalloc动态库。在安装tcmalloc前，需确保环境中含有autoconf和libtool依赖包。

安装libunwind依赖：

git clone https://github.com/libunwind/libunwind.git
cd libunwind
autoreconf -i
./configure --prefix=/usr/local
make -j128
make install

安装tcmalloc动态库：

wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.16/gperftools-2.16.tar.gz
tar -xf gperftools-2.16.tar.gz && cd gperftools-2.16
./configure --prefix=/usr/local/lib --with-tcmalloc-pagesize=64
make -j128
make install
export LD_PRELOAD="$LD_PRELOAD:/usr/local/lib/lib/libtcmalloc.so"

准备数据集

根据原仓Prepare Dataset章节准备数据集，数据集目录及结构如下：

UniAD
├── data/
│   ├── nuscenes/
│   │   ├── can_bus/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── sweeps/
│   │   ├── v1.0-test/
│   │   ├── v1.0-trainval/
│   ├── infos/
│   │   ├── nuscenes_infos_temporal_train.pkl
│   │   ├── nuscenes_infos_temporal_val.pkl
│   ├── others/
│   │   ├── motion_anchor_infos_mode6.pkl

说明： 该数据集的训练过程脚本只作为一种参考示例。

准备预训练权重

根据原仓Installation章节下载预训练权重bevformer_r101_dcn_24ep.pth和uniad_base_track_map.pth，并放在模型根目录ckpts下：

UniAD
├── ckpts/
│   ├── bevformer_r101_dcn_24ep.pth
│   ├── uniad_base_track_map.pth

（可选）可通过修改config文件中的load_from值来更改预训练权重

# projects/configs/stage1_track_map.py
load_from = "ckpts/bevformer_r101_dcn_24ep.pth"

快速开始

训练任务

本任务主要提供单机的8卡训练脚本。

开始训练

在模型根目录下，运行训练脚本。

该模型支持单机8卡训练。

单机8卡性能训练

bash test/train_stage1_performance_8p.sh # stage1
bash test/train_stage2_performance_8p.sh # stage2

单机8卡精度训练

bash test/train_stage1_full_8p.sh # stage1
bash test/train_stage2_full_8p.sh # stage2

该模型支持双机多卡训练。

# 'XX.XX.XX.XX'为主节点的IP地址；端口号可以换成未被占用的可用端口

# stage1
bash test/train_stage1_multi_server.sh 2 0 ‘xx.xx.xx.xx’ '3389' #主节点
bash test/train_stage1_multi_server.sh 2 1 ‘xx.xx.xx.xx’ '3389' #副节点

# stage2
bash test/train_stage2_multi_server.sh 2 0 ‘xx.xx.xx.xx’ '3389' #主节点
bash test/train_stage2_multi_server.sh 2 1 ‘xx.xx.xx.xx’ '3389' #副节点

训练结果

单机八卡

阶段	芯片	卡数	global batch size	Precision	性能-单步迭代耗时(ms)	FPS	amota ↑	L2 ↓
stage1	竞品A	8p	8	fp32	5883	1.359	0.380	-
stage1	Atlas 800T A2	8p	8	fp32	7979	1.002	0.376	-
stage2	竞品A	8p	8	fp32	3990	2.000	-	0.9127
stage2	Atlas 800T A2	8p	8	fp32	5145	1.554	-	0.9014

多机多卡线性度

阶段	芯片	卡数	global batch size	Precision	性能-单步迭代耗时(ms)	FPS	线性度
stage1	Atlas 800T A2*2	16p	16	fp32	8333	1.920	96.30%
stage2	Atlas 800T A2*2	16p	16	fp32	5412	2.956	95.12%

变更说明

2025.02.19: 代码上仓，stage1性能0.6倍竞品，stage2性能0.5倍竞品。

2025.02.26: stage2性能优化，消除部分free time, FA替换优化, stage2性能达到0.54倍竞品。

2025.04.25: 增加多机多卡训练脚本，增加多机多卡训练性能数据。

2025.08.04: 模型性能优化，更新性能数据。

2025.09.01: 优化模型stage 1性能，更新性能数据。

FAQ

tcmalloc的动态库文件位置可能因环境配置会有所不同，找不到文件时可以进行搜索，一般安装在/usr/lib64或者/usr/local目录下：
```
find /usr -name libtcmalloc.so*
```
找到对应路径下的动态库文件，libtcmalloc.so或者libtcmalloc.so.版本号都可以使用。
Shapely 2.0+ 不再支持直接迭代 MultiPolygon，如果遇到报错 'MultiPolygon' object is not iterable，可参考issue： https://gitcode.com/Ascend/DrivingSDK/issues/184，降级 Shapely 到 1.8.x 版本。