文件最后提交记录最后更新时间
!6851 [Clean Code] 修改芯片型号描述 Merge pull request !6851 from 施康/master 1 年前
init 4 年前
fix link validity Co-authored-by: frozenleaves<914814442@qq.com> # message auto-generated for no-merge-commit merge: !7517 merge master into master fix link validity Created-by: frozenn Commit-by: frozenleaves Merged-by: ascend-robot Description: ## Motivation Please describe the motivation of this PR and the goal you want to achieve through this PR. ## Modification Please briefly describe what modification is made in this PR. ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist **Before PR**: - [ ] The new code needs to comply with the Clean Code specification. - [ ] The PR content is self-checked, and the expression can be clear and the writing standardized **After PR**: - [ ] CLA has been signed and all committers have signed the CLA in this PR. - [ ] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/ModelZoo-PyTorch!75171 个月前
init 4 年前
init 4 年前
init 4 年前
init 4 年前
init 4 年前
README.md

3DUnet Onnx模型端到端推理指导

1 模型概述

1.1 论文地址

3DUnet论文

1.2 代码地址

3DUnet代码

2 环境说明

2.1 深度学习框架

CANN 5.0.4

pytorch >= 1.4.0
torchvision >= 0.6.0
onnx >= 1.7.0

2.2 python第三方库

numpy == 1.18.5
Pillow == 7.2.0
opencv-python == 4.5.1.48

说明:

X86架构:pytorch,torchvision和onnx可以通过官方下载whl包安装,其它可以通过pip3.7 install 包名 安装

Arm架构:pytorch,torchvision和onnx可以通过源码编译安装,其它可以通过pip3.7 install 包名 安装

3 模型转换

3.1 pth转onnx模型

1.下载pth权重文件
3DUnet预训练pth权重文件

wget https://ascend-pytorch-model-file.obs.cn-north-4.myhuaweicloud.com/%E9%AA%8C%E6%94%B6-%E6%8E%A8%E7%90%86/cv/segmentation/3DUnet/UNET3D.pth

2.安装3DUnet模型代码

git clone https://github.com/black0017/MedicalZooPytorch.git

3.编写pth2onnx脚本UNet3D_pth2onnx.py

说明:

注意目前ATC支持的onnx算子版本为11

4.执行pth2onnx脚本,生成onnx模型文件

python3 UNet3D_pth2onnx.py  --input UNET3D.pth --output unet_1.onnx

5、使用sim简化转换的onnx文件 python3.7 -m onnxsim --input-shape="1,4,64,64,64" unet_1.onnx unet_1_sim.onnx

6、对生成的onnx进行改图脚本修改(因为resize算子不支持5HD的输入)生成act输入的onnx(第一个参数是输入的简化后的onnx文件,第二个参数是输出的改图之后的onnx文件, 第三个参数是要转的onnx对应的batchsize大小)

python3 modify.py unet_1_sim.onnx unet_1_final.onnx 1

模型转换要点:

此模型转换为onnx不需要修改开源代码仓代码,故不需要特殊说明

3.2 onnx转om模型

1.设置环境变量

source /home/zl/Ascend503/ascend-toolkit/set_env.sh

2.使用atc将onnx模型转换为om模型文件,工具使用方法可以参考CANN V100R020C10 开发辅助工具指南 (推理) 01

atc --framework=5 --model=./unet_1_final.onnx --output=UNet3D_bs1 --input_format=NCDHW --input_shape="image:1,4,64,64,64" --log=error --soc_version=Ascend310

4 数据集预处理

4.1 数据集获取

该模型使用[Brats2018数据集]验证集进行测试,图片与标签分别存放在 https://ascend-pytorch-one-datasets.obs.cn-north-4.myhuaweicloud.com/train/zip/MICCAI_BraTS_2018_Data_Training.zip

4.2 数据集预处理

1.预处理脚本preprocess.py

2.执行预处理脚本,生成数据集预处理后的bin文件(两个目录都要存在)

python3.7 preprocess.py --pretrained ./UNET3D.pth --output_bin ./syf_outBin1 --output_label ./syf_targetBin1/ --batchSz 1

5 离线推理

5.1 msame工具概述

因为benchmark工具不支持3D的输入,所以采用msame的方式进行离线推理。 msame工具为华为自研的模型推理工具,支持多种模型的离线推理,获取工具及使用方法可以参考(https://gitee.com/ascend/tools/tree/master/msame)

5.2 离线推理

2.执行离线推理

./main --model "./UNet3D_bs1.om" --input ./syf_outBin1 --output ./syfmsamebinout1 --outfmt BIN

main是msame工具生成的可执行文件。第一个输入是经过预处理脚本处理之后生成的bin文件。output参数是经过msame处理之后输出的bin文件目录(需要注意,输出目录必须是事先存在的,否则会报错)

6 精度对比

6.1 离线推理TopN精度统计

后处理统计DSC score分数

调用postprocess.py脚本推理结果与label比对,可以获得DSC score分数

python3.7 postprocess.py --pretrained ./UNET3D.pth --input_bin ./syfmsamebinout1/20211217_11_36_6_921476/ --input_label ./syf_targetBin1/ --batchSz 1

查看输出结果:

--------score.avg------------ 0.25602152897045016

经过对bs1与bs16的om测试,本模型batch1的精度与batch16的精度没有差别,精度数据均如上

6.2 开源TopN精度

3DUnet代码仓精度

Model        score     
3DUnet       0.2543128919787705	   

6.3 精度对比

将得到的om离线模型推理score精度比直接跑源码仓的推理脚本的精度要高,故精度达标。
精度调试:

没有遇到精度不达标的问题,故不需要进行精度调试

7 性能对比

7.1 npu性能数据

1.msame工具在整个数据集上推理获得性能数据
./main --model "./UNet3D_bs1.om" --input ./syf_outBin1 --output ./syfmsamebinout1 --outfmt BIN batch1的性能


Inference average time : 92.93 ms
Inference average time without first time: 92.92 ms
[INFO] unload model success, model Id is 1
[INFO] Execute sample success
[INFO] end to destroy stream
[INFO] end to destroy context
[INFO] end to reset device is 0
[INFO] end to finalize acl



fps = 1 / 0.09293 = 10.75615790039798

batch16的性能 ./main --model "./UNet3D_bs16.om" --input ./syf_outBin16 --output ./syfmsamebinout16 --outfmt BIN

Inference average time : 93.14 ms
Inference average time without first time: 93.12 ms
[INFO] unload model success, model Id is 1
[INFO] Execute sample success
[INFO] end to destroy stream
[INFO] end to destroy context
[INFO] end to reset device is 0
[INFO] end to finalize acl

fps = 1 / 0.09314 = 10.73652566029633

batch4性能: ./main --model "./UNet3D_bs4.om" --input ./syf_outBin4 --output ./syfmsamebinout4 --outfmt BIN

Inference average time : 252.54 ms
Inference average time without first time: 252.53 ms
[INFO] unload model success, model Id is 1
[INFO] Execute sample success
[INFO] end to destroy stream
[INFO] end to destroy context
[INFO] end to reset device is 0
[INFO] end to finalize acl


fps = 1 / 0.25254 = 3.959768749505029

batch8性能: ./main --model "./UNet3D_bs8.om" --input ./syf_outBin8 --output ./syfmsamebinout8 --outfmt BIN


Inference average time : 468.58 ms
Inference average time without first time: 468.56 ms
[INFO] unload model success, model Id is 1
[INFO] Execute sample success
[INFO] end to destroy stream
[INFO] end to destroy context
[INFO] end to reset device is 0
[INFO] end to finalize acl

fps = 1 / 0.46858 = 2.134107302915191

batch32性能: ./main --model "./UNet3D_bs32.om" --input ./syf_outBin32 --output ./syfmsamebinout32 --outfmt BIN

Inference average time : 1835.80 ms
Inference average time without first time: 1836.05 ms
[INFO] unload model success, model Id is 1
[INFO] Execute sample success
[INFO] end to destroy stream
[INFO] end to destroy context
[INFO] end to reset device is 0
[INFO] end to finalize acl

fps = 1 / 1.83580 = 0.5447216472382612

7.2 T4性能数据

跑的是源码推理不涉及batchsize,在装有T4卡的服务器上测试gpu性能,测试过程请确保卡没有运行其他任务,TensorRT版本:7.2.3.4,cuda版本:11.0,cudnn版本:8.2

python3 inference.py(训练生成的pth文件对应的batchsize是4)

gpu T4是4个device并行执行的结果,

----------score.avg----------- 0.2543128919787705
endtime 0.20030641555786133
fps 19.96940531764717