
D-FINE (Full Series): Training and Inference Guide

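Overview

D-FINE (Fine-grained Distribution Refinement) is a real-time object detection model proposed by the University of Science and Technology of China (USTC) and published as an ICLR 2025 Spotlight. It is an important evolution of the DETR line: it replaces conventional coordinate regression with refinement of probability distributions, and it uses self-distillation so that shallow and deep layers are optimized together. This significantly improves localization accuracy without adding inference or training cost.

This guide covers the complete D-FINE family, from Nano to X-Large, with parameter counts ranging from 3.79M to 63.35M.

Reference implementation: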

url=https://github.com/Peterande/D-FINE
commit_id=d6694750683b0c7e9f523ba6953d16f112a376ae
model_name=Peterande/D-FINE

Input and output data

Input data

Input	Data type	Size	Data layout
input	RGB_FP32	batchsize x 3 x 640 x 640	NCHW

Output data

Output	Data type	Size	Data layout
feature_map_1	FLOAT32	batchsize x 255 x 76 x 76	NCHW
feature_map_2	FLOAT32	batchsize x 255 x 38 x 38	NCHW
feature_map_3	FLOAT32	batchsize x 255 x 38 x 38	NCHW

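Inference environment setup

The model requires the following components and drivers.

Table 1 Version compatibility

Component	Version	Environment setup guide
Firmware and driver	Version matching CANN 8.5.1	PyTorch framework inference environment setup
CANN	8.5.1	-
Python	3.11.14	-
PyTorch	2.9.0	-

Note: Ascend910B and Ascend910C support both training and inference for this model, and inference is also supported on Ascend310P and Ascend310B. Choose the firmware and driver version that matches your CANN version.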

Quick start

Get the source code

Clone the source code and check out the reference commit.

git clone https://github.com/Peterande/D-FINE.git
cd ./D-FINE/
git reset --hard d6694750683b0c7e9f523ba6953d16f112a376ae

Apply the patch (copy the patch into the model directory first) and load the weights.

git clone https://gitcode.com/Ascend/ModelZoo-PyTorch.git
cp ModelZoo-PyTorch/ACL_PyTorch/built-in/cv/D-FINE/D-FINE_NPU.patch {D-FINE code directory}
cp ModelZoo-PyTorch/ACL_PyTorch/built-in/cv/D-FINE/om_inf.py {D-FINE code directory}/tools/inference/om_inf.py
cd {D-FINE code directory}
git apply D-FINE_NPU.patch

Install the dependencies.

pip install -r requirements.txt
pip install onnxscript

Prepare the dataset

Get the COCO dataset.

The dataset is the open-source COCO dataset. If you have your own data, annotate it in the COCO annotation format.
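
For readers preparing their own data, the following is a minimal sketch of what a COCO-style detection annotation looks like, together with a small consistency check. The file name, ids, and the "defect" category are hypothetical placeholders, and the checker is an illustration rather than part of D-FINE or the patch.

```python
# Minimal COCO-style detection annotation.
# File name, ids, and the "defect" category are hypothetical placeholders.
coco = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 120.0, 200.0, 150.0],  # [x, y, width, height] in pixels
            "area": 200.0 * 150.0,
            "iscrowd": 0,
        },
    ],
    "categories": [{"id": 1, "name": "defect"}],
}


def check_coco(data):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = []
    images = {img["id"]: img for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    for ann in data["annotations"]:
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"annotation {ann['id']}: unknown image_id")
            continue
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        x, y, w, h = ann["bbox"]
        inside = x >= 0 and y >= 0 and x + w <= img["width"] and y + h <= img["height"]
        if w <= 0 or h <= 0 or not inside:
            problems.append(f"annotation {ann['id']}: bbox outside image")
    return problems


print(check_coco(coco))
```

Running a check like this before training catches the most common labeling mistakes, such as boxes stored as corner pairs instead of [x, y, width, height].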

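Prepare the model weights

Download the weight file.

Download the weight file for your model and copy it to the corresponding path.

 cd {D-FINE code directory}
 # Download the weight file; on a slow network, download it elsewhere and copy it to the machine
 wget https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_coco.pth
 # Set up the training output directory
 mkdir -p output/dfine_hgnetv2_s_coco
 mv ./dfine_s_coco.pth output/dfine_hgnetv2_s_coco/last.pth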
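
The commands above cover only the S model. As a sketch of how the same layout might be derived for other sizes, the helper below builds the download URL, config path, and checkpoint path from a size letter. It assumes that the other sizes follow the same naming pattern as the S example and that the size letters are n, s, m, l, and x; neither assumption is stated in this guide, so check the actual file names before relying on them. `weight_layout` is a hypothetical helper, not part of D-FINE.

```python
from pathlib import Path

# Release URL prefix taken from the wget command for the S model.
RELEASE_URL = "https://github.com/Peterande/storage/releases/download/dfinev1.0"

# Assumption: size letters for Nano, Small, Medium, Large, X-Large.
KNOWN_SIZES = ("n", "s", "m", "l", "x")


def weight_layout(size, root="output"):
    """Return the URL, config path, and checkpoint path for one model size.

    Assumes every size is named like the S model shown in this guide.
    """
    size = size.lower()
    if size not in KNOWN_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    run_dir = Path(root) / f"dfine_hgnetv2_{size}_coco"
    return {
        "url": f"{RELEASE_URL}/dfine_{size}_coco.pth",
        "config": f"configs/dfine/dfine_hgnetv2_{size}_coco.yml",
        "checkpoint": run_dir / "last.pth",
    }


layout = weight_layout("s")
print(layout["url"])
print(layout["checkpoint"].as_posix())
```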

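Model inference

Direct inference (poor performance; use it only to validate results).

Run torch_inf.py, with a test image prepared in advance.

 python tools/inference/torch_inf.py -c {training config file, e.g. configs/dfine/dfine_hgnetv2_s_coco.yml} -r ./output/dfine_hgnetv2_s_coco/last.pth --input {path to the test image}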

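OM inference (better performance; this is the recommended approach).

Use PyTorch to convert the .pth weight file to an .onnx file, then use the ATC tool to convert the .onnx file to an offline inference model (.om file).
1. Export the ONNX file.
  1. Export the ONNX file with tools/deployment/export_onnx.py.
    
    Run the export_onnx.py script.
```
  python tools/deployment/export_onnx.py --check -c {training config file, e.g. configs/dfine/dfine_hgnetv2_s_coco.yml} -r ./output/dfine_hgnetv2_s_coco/last.pth
```
    Note: on success, an ONNX file with the same name is generated in the directory that holds the weights.
  2. Simplify the model with onnx-simplifier.
    
    Download onnx-simplifier by following the link [onnx-simplifier link]. Alternatively, copy the ONNX file to a machine that already has onnx-simplifier installed, run this step there, and copy the result back; this avoids the dependency problems that installation can cause. Run the following commands: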
```
python -m onnxsim --overwrite-input-shape="images:1,3,640,640" last.onnx last.onnx
python -m onnxsim --overwrite-input-shape="orig_target_sizes:1,2" last.onnx last.onnx
```
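  3. Generate the OM file (using an Ascend910B server as the example).
    
    Copy the ONNX file back to the training machine, or to an inference device with the same software configuration as the training machine.
```
atc \
--model={path to the ONNX file} \
--output=om_infer \
--framework=5 \
--input_shape="images:1,3,640,640;orig_target_sizes:1,2" \
--soc_version=Ascend910B
```
    - Parameters:
      - --model: the ONNX model file.
      - --framework: 5 indicates an ONNX model.
      - --output: the output OM model.
      - --input_shape: the shapes of the input data.
      - --soc_version: the processor model. To look up the SoC information, see https://www.hiascend.com/document/detail/zh/canncommercial/80RC3/devaids/devtools/atc/atlasatcparam_16_0036.html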
  4. Run OM inference.
```
python tools/inference/om_inf.py -m ./om_infer.om -i {path to the test image}
```
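
When OM files are needed for several batch sizes, the ATC invocation in the conversion step can be generated instead of edited by hand. The sketch below assembles the same flags used in this guide for a given batch size. `build_atc_command` is a hypothetical helper, and it assumes the ONNX file was exported and simplified with a matching batch size, since the onnx-simplifier step in this guide fixes the batch dimension to 1.

```python
import shlex


def build_atc_command(onnx_path, batch_size=1, output="om_infer",
                      soc_version="Ascend910B", image_size=640):
    """Build the atc argument list used in this guide for one batch size.

    Only the flags shown in the guide are used; the helper itself is
    illustrative and not part of the patch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    input_shape = (
        f"images:{batch_size},3,{image_size},{image_size};"
        f"orig_target_sizes:{batch_size},2"
    )
    return [
        "atc",
        f"--model={onnx_path}",
        f"--output={output}",
        "--framework=5",  # 5 selects an ONNX model
        f"--input_shape={input_shape}",
        f"--soc_version={soc_version}",
    ]


cmd = build_atc_command(
    "output/dfine_hgnetv2_s_coco/last.onnx", batch_size=4, output="om_infer_bs4"
)
# shlex.join quotes the --input_shape argument because it contains ';'.
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` on a machine where the CANN toolkit is installed.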
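Performance verification.
1. Install the ais_bench tool.
  
  Go to the ais_bench inference tool repository and install the tool by following its README.
2. Run the ais_bench tool.
  
  Run the following command: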
```
python -m ais_bench --model ./om_infer.om --output ./ais_output --outfmt BIN --loop 100
```

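Model inference performance & accuracy

Inference is run through the ACL interface. Refer to the following performance data (using the S model as the example, input size 640 x 640).

Chip model	Model size	Batch Size	Latency (ms)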
Ascend910C	S	1	4.1ms
Ascend910C	S	2	7.1ms
Ascend910C	S	4	10.24ms
Ascend910C	S	8	17.33ms
Ascend910C	S	16	36.1ms
Ascend910B	S	1	4.52ms
Ascend910B	S	2	6.79ms
Ascend910B	S	4	9.60ms
Ascend910B	S	8	15.96ms
Ascend910B	S	16	32.65ms
Ascend310P	S	1	59.12ms
Ascend310P	S	2	113.18ms
Ascend310P	S	4	218.33ms
Ascend310P	S	8	434.04ms
Ascend310P	S	16	866.14ms
Ascend310B	S	1	36ms
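
As a worked example, the latency figures above can be converted to throughput. The sketch assumes that each figure is the time for one whole batch, which the table does not state explicitly; under that assumption, throughput in images per second is batch size x 1000 / latency in ms.

```python
# Latency in ms per chip and batch size, copied from the table above.
latency_ms = {
    "Ascend910C": {1: 4.1, 2: 7.1, 4: 10.24, 8: 17.33, 16: 36.1},
    "Ascend910B": {1: 4.52, 2: 6.79, 4: 9.60, 8: 15.96, 16: 32.65},
    "Ascend310P": {1: 59.12, 2: 113.18, 4: 218.33, 8: 434.04, 16: 866.14},
    "Ascend310B": {1: 36.0},
}


def throughput(batch_size, latency):
    """Images per second, assuming `latency` (ms) covers the whole batch."""
    return batch_size * 1000.0 / latency


def best_batch(chip):
    """Batch size with the highest throughput for one chip."""
    table = latency_ms[chip]
    return max(table, key=lambda bs: throughput(bs, table[bs]))


for chip, table in latency_ms.items():
    for bs, ms in table.items():
        print(f"{chip} bs={bs}: {throughput(bs, ms):.1f} images/s")
```

Under the same assumption, throughput on Ascend910B and Ascend910C peaks at batch size 8 and drops slightly at 16.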

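If the only target device is Ascend310P, run git restore src/zoo/dfine/utils.py (which reverts the patch's changes to that file) and then re-export the ONNX file. This gives better performance, reaching the following levels:

Chip model	Model size	Batch Size	Latency (ms)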
Ascend310P	S	1	5.98ms
Ascend310P	S	2	10.52ms
Ascend310P	S	4	21.71ms
Ascend310P	S	8	44.75ms
Ascend310P	S	16	90.93ms
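
To put the two Ascend310P tables side by side, the short calculation below divides the latency with the patched utils.py by the latency after restoring it. All numbers are copied from the tables in this section; nothing here is newly measured.

```python
# Ascend310P latency in ms by batch size, copied from the two tables.
patched_ms = {1: 59.12, 2: 113.18, 4: 218.33, 8: 434.04, 16: 866.14}
restored_ms = {1: 5.98, 2: 10.52, 4: 21.71, 8: 44.75, 16: 90.93}

# Speedup factor from restoring src/zoo/dfine/utils.py before export.
speedup = {bs: patched_ms[bs] / restored_ms[bs] for bs in patched_ms}

for bs, factor in speedup.items():
    print(f"bs={bs}: {factor:.1f}x faster")
```

The ratio stays between roughly 9.5x and 10.8x across all listed batch sizes.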

Accuracy comparison on in-house business data

A100 precision	A100 recall	Ascend910B precision	Ascend910B recall
97.3%	58.34%	96.92%	55.88%
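
The gap between the two devices can be read off as a difference in percentage points. The snippet below only does that subtraction on the figures from the table; `gap_in_points` is a hypothetical helper name.

```python
# Precision and recall in percent, copied from the comparison table.
results = {
    "A100": {"precision": 97.3, "recall": 58.34},
    "Ascend910B": {"precision": 96.92, "recall": 55.88},
}


def gap_in_points(metric, baseline="A100", target="Ascend910B"):
    """Difference target - baseline in percentage points, rounded to 2 places."""
    return round(results[target][metric] - results[baseline][metric], 2)


for metric in ("precision", "recall"):
    print(f"{metric}: {gap_in_points(metric):+.2f} percentage points")
```

On this data, Ascend910B is 0.38 points lower in precision and 2.46 points lower in recall than A100.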