whisper-base.en OM Inference Guide

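[Overview]
[Inference Environment Preparation]
[Quick Start]
- [Get the Source Code]
- [Data Preparation]
- [Model Inference]
[Model Inference Performance & Accuracy]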

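Overview

Whisper is a general-purpose speech recognition model trained on a large, diverse audio dataset. It uses a Transformer architecture and is a sequence-to-sequence multitask model that handles a range of speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens predicted by the decoder, so a single model can replace the multiple stages of a traditional speech processing pipeline. The multitask training format uses a set of special tokens to specify the task.

Reference implementation: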

url=https://github.com/openai/whisper

Input and Output Data

Encoder input data

| Input | Data type | Size | Data layout |
| --- | --- | --- | --- |
| mel | FLOAT32 | batchsize x 80 x mel_len | ND |

Encoder output data

| Output | Data type | Size | Data layout |
| --- | --- | --- | --- |
| n_layer_cross_k | FLOAT32 | batchsize x mel_len | ND |
| n_layer_cross_v | FLOAT32 | 6 x batchsize x mel_len x 512 | ND |

Decoder input data

| Input | Data type | Size | Data layout |
| --- | --- | --- | --- |
| tokens | INT64 | batchsize x n_tokens | ND |
| in_n_layer_self_k_cache | FLOAT32 | 6 x batchsize x 448 x 512 | ND |
| in_n_layer_self_v_cache | FLOAT32 | 6 x batchsize x 448 x 512 | ND |
| n_layer_cross_k | FLOAT32 | 6 x batchsize x mel_len/2 x 512 | ND |
| n_layer_cross_v | FLOAT32 | 6 x batchsize x mel_len/2 x 512 | ND |
| offset | INT64 | 1 | ND |
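As a quick cross-check of the decoder input table, the sketch below computes each input's shape and its size in bytes for a given batch size, token count, and mel_len. The constants 6, 448, and 512 are the values listed in the table for base.en; the function names and the integer division used for mel_len/2 are assumptions made for illustration.

```python
from math import prod

# Values taken from the decoder input table (whisper base.en).
N_TEXT_LAYER = 6
N_TEXT_CTX = 448
N_TEXT_STATE = 512

# Bytes per element for the two data types used in the table.
BYTES_PER_ELEMENT = {"int64": 8, "float32": 4}


def decoder_input_shapes(batchsize, n_tokens, mel_len):
    """Return {input name: (dtype, shape)} following the decoder input table."""
    self_cache = (N_TEXT_LAYER, batchsize, N_TEXT_CTX, N_TEXT_STATE)
    # Assumption: mel_len/2 is rounded down when mel_len is odd.
    cross = (N_TEXT_LAYER, batchsize, mel_len // 2, N_TEXT_STATE)
    return {
        "tokens": ("int64", (batchsize, n_tokens)),
        "in_n_layer_self_k_cache": ("float32", self_cache),
        "in_n_layer_self_v_cache": ("float32", self_cache),
        "n_layer_cross_k": ("float32", cross),
        "n_layer_cross_v": ("float32", cross),
        "offset": ("int64", (1,)),
    }


def nbytes(dtype, shape):
    """Size in bytes of a dense tensor with the given dtype and shape."""
    return BYTES_PER_ELEMENT[dtype] * prod(shape)
```

For example, `decoder_input_shapes(1, 1, 3000)` gives a cross-attention shape of `(6, 1, 1500, 512)` for a single-token step.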

Decoder output data

| Output | Data type | Size | Data layout |
| --- | --- | --- | --- |
| logits | FLOAT32 | batchsize x mel_len | ND |
| out_n_layer_self_k_cache | FLOAT32 | 6 x batchsize x mel_len x 512 | ND |
| out_n_layer_self_v_cache | FLOAT32 | 6 x batchsize x mel_len x 512 | ND |

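Inference Environment Preparation

This model requires the following components and drivers.

Table 1 Version compatibility

| Component | Version | Environment setup guide |
| --- | --- | --- |
| Firmware and driver | 24.1.rc3 | PyTorch framework inference environment preparation |
| CANN | 8.3.RC1 | - |
| Python | 3.11.10 | - |
| PyTorch | 2.5.1 | - |

Note: For the Atlas 300I Duo inference card, choose the actual firmware and driver versions according to the CANN version.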

Quick Start

Get the Source Code

Get the OM inference code from this repository:

   git clone https://gitcode.com/ascend/ModelZoo-PyTorch.git
   cd ModelZoo-PyTorch/ACL_PyTorch/built-in/audio/whisper/whisper_om

Install the dependencies:

pip install -r requirements.txt

Install ffmpeg:

sudo apt-get install ffmpeg

Data Preparation

This model takes an audio file as input. Download the [test data](https://pan.baidu.com/s/1xiHW7tmJe3lfAdQABWqsFA?pwd=gya6) (extraction code: gya6) and place the audio file in the project's data directory:

Whisper
    └── data
        └── test.wav
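Before converting the model, it can help to know roughly how many mel frames a given audio file produces. The sketch below estimates mel_len from a PCM WAV file using Python's standard `wave` module. It relies on Whisper's audio constants (16 kHz sample rate, hop length of 160 samples, so 100 frames per second); the function name is hypothetical, and whether om_val.py pads or trims the audio before inference is not covered here. Under this estimate, a mel_len range of 250 to 3000 corresponds to roughly 2.5 to 30 seconds of audio.

```python
import wave

SAMPLE_RATE = 16000  # sample rate Whisper resamples audio to
HOP_LENGTH = 160     # Whisper's STFT hop, i.e. 100 mel frames per second


def estimate_mel_len(wav_path):
    """Estimate the number of mel frames for a PCM WAV file.

    This is only an estimate: it ignores any padding or trimming that
    the inference script may apply to the audio.
    """
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()
        n_frames = wav.getnframes()
    # Number of samples once the audio is resampled to 16 kHz.
    n_samples = n_frames * SAMPLE_RATE // rate
    return n_samples // HOP_LENGTH
```

Calling `estimate_mel_len("data/test.wav")` on the downloaded test file gives a value to compare against the shape range used when building the OM model.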

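Model Inference

1 Model conversion

Convert the model weight file (.pt) to an .onnx file, then use the ATC tool to convert the .onnx file to an offline inference model (.om file).

Get the weight file

Download the weight file. Taking base.en as an example, place the downloaded model file base.en.pt in the project directory.

Export the ONNX model

First apply the patch to whisper: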

python3 patch_apply.py

Run pth2onnx.py to export the ONNX models:

python3 pth2onnx.py --model base.en.pt --output_dir whisper_base_en

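The configuration of the original model and its tokenizer are saved to model_cfg.json and tokens.txt respectively, so that they can be read later during OM model inference. Because the Whisper model consists of an encoder and a decoder that interact through cross-attention, the model has to be modified, and the script therefore exports two ONNX models: encoder.onnx and decoder.onnx.

After this step, the project directory looks like this: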

📁 whisper_om
    ├── pth2onnx.py
    ├── atc.sh
    ├── om_val.py
    ├── patch_apply.py
    ├── whisper_model.patch
    ├── 📁 whisper_base_en
    |   ├── encoder.onnx
    |   ├── decoder.onnx
    |   ├── model_cfg.json
    |   ├── tokens.txt
    ├── requirements.txt
    ├── modelzoo_level.txt
    ├── README.md
    └── LICENSE

Use the ATC tool to convert the ONNX models to OM models.

Configure the environment variables:
```
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```
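Note:
The environment variables in this script are for reference only. Configure them according to your actual installation. For details, see the "CANN Auxiliary Development Tool Guide (Inference)".

Run the following command to check the chip name (used as the soc_version parameter of the atc command):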

npu-smi info
# The chip name of this device is Ascend310P3 (replace it with your own)
The output looks like this:
+-------------------+-----------------+------------------------------------------------------+
| NPU     Name      | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
| Chip    Device    | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
+===================+=================+======================================================+
| 0       310P3     | OK              | 15.8         42                0    / 0              |
| 0       0         | 0000:82:00.0    | 0            1074 / 21534                            |
+===================+=================+======================================================+
| 1       310P3     | OK              | 15.4         43                0    / 0              |
| 0       1         | 0000:89:00.0    | 0            1070 / 21534                            |
+===================+=================+======================================================+
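The chip name can also be read from the output programmatically. The sketch below is a minimal parser for the table layout shown above. It assumes, following the example in this guide, that soc_version is the Name column prefixed with "Ascend" (310P3 becomes Ascend310P3); this may not hold for every device, and the function name is hypothetical.

```python
import re

# Matches table rows that start with a numeric first field, capturing the
# second field and the contents of the next column.
ROW = re.compile(r"^\|\s*\d+\s+(\S+)\s*\|\s*([^|]*?)\s*\|")


def soc_versions(npu_smi_output):
    """Return the distinct soc_version candidates found in `npu-smi info` output."""
    found = []
    for line in npu_smi_output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # border, header, or unrelated line
        name, next_column = match.groups()
        if ":" in next_column:
            continue  # Chip/Device row: the next column is a Bus-Id
        soc = "Ascend" + name  # assumption based on the 310P3 example
        if soc not in found:
            found.append(soc)
    return found
```

Feeding the sample output above into `soc_versions` yields a single candidate, which can then be passed to atc.sh through its --soc argument.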

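Run the ATC command

Run atc.sh to export the OM models.

# Example invocation; adjust the parameters for your environment. Note that the ONNX file path arguments must omit the ".onnx" suffix.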
bash atc.sh --encoder_model=path/to/encoder --decoder_model=path/to/decoder --bs=1 --config=path/to/model_cfg.json --output_dir=output --soc=Ascend310P3

# Actual atc conversion command executed for the encoder model
atc --framework=5 --input_format=ND --log=error --soc_version=${soc} \
    --model=${encoder_model}.onnx --output=${output_dir}/${encoder_model}_bs${bs} \
    --input_shape="mel:${bs},${n_mels},250~3000"

# Actual atc conversion command executed for the decoder model
atc --framework=5 --input_format=ND --log=error --soc_version=${soc} \
    --model=${decoder_model}.onnx --output=${output_dir}/${decoder_model}_bs${bs} \
    --input_shape="tokens:${bs},1~4;in_n_layer_self_k_cache:${n_text_layer},${bs},${n_text_ctx},${n_text_state}; \
    in_n_layer_self_v_cache:${n_text_layer},${bs},${n_text_ctx},${n_text_state};n_layer_cross_k:${n_text_layer},${bs},250~3000,${n_text_state};n_layer_cross_v:${n_text_layer},${bs},250~3000,${n_text_state};offset:1"
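To make the structure of the two --input_shape values easier to see, the sketch below assembles the same strings in Python. The parameter names mirror the shell variables in the commands above; how atc.sh actually reads them from model_cfg.json is not shown in this guide, so the values are passed in explicitly, and the function names are hypothetical.

```python
def encoder_input_shape(bs, n_mels, mel_range="250~3000"):
    """Build the encoder --input_shape value, e.g. 'mel:1,80,250~3000'."""
    return f"mel:{bs},{n_mels},{mel_range}"


def decoder_input_shape(bs, n_text_layer, n_text_ctx, n_text_state,
                        token_range="1~4", mel_range="250~3000"):
    """Build the decoder --input_shape value as a single ';'-separated string."""
    self_cache = f"{n_text_layer},{bs},{n_text_ctx},{n_text_state}"
    cross = f"{n_text_layer},{bs},{mel_range},{n_text_state}"
    parts = [
        f"tokens:{bs},{token_range}",
        f"in_n_layer_self_k_cache:{self_cache}",
        f"in_n_layer_self_v_cache:{self_cache}",
        f"n_layer_cross_k:{cross}",
        f"n_layer_cross_v:{cross}",
        "offset:1",
    ]
    return ";".join(parts)
```

With the base.en values from the input tables (6 layers, context 448, state 512), `decoder_input_shape(1, 6, 448, 512)` produces the decoder string for batch size 1.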

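Because the length of the audio data is not fixed, the OM conversion uses ~ to accept any input within a range. Alternatively, you can pad the spectrogram length so that the dynamic value falls on a fixed set of gears, and use the --dynamic_dims parameter to accept dynamic input instead.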
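The gear-based alternative can be sketched as follows: pick the smallest gear that fits the actual mel_len, then pad the spectrogram up to that length. The gear values in the usage note are hypothetical examples, not values taken from atc.sh, and the function names are illustrative.

```python
import bisect


def pick_gear(mel_len, gears):
    """Return the smallest gear that is >= mel_len.

    Raises ValueError when mel_len exceeds the largest gear.
    """
    ordered = sorted(gears)
    index = bisect.bisect_left(ordered, mel_len)
    if index == len(ordered):
        raise ValueError(f"mel_len {mel_len} exceeds the largest gear {ordered[-1]}")
    return ordered[index]


def padding_needed(mel_len, gears):
    """Number of frames to append so that mel_len lands exactly on a gear."""
    return pick_gear(mel_len, gears) - mel_len
```

For example, with hypothetical gears `[500, 1000, 2000, 3000]`, a spectrogram of 750 frames would be padded by 250 frames to reach the 1000 gear.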

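Parameter description:
- --model: the ONNX model file
- --framework: 5 indicates an ONNX model
- --output: the output OM model
- --input_shape: the shape of the input data
- --log: the log level
- --soc_version: the processor model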

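After a successful run, the model files encoder_bs${bs}_{os}_{arch}.om and decoder_bs${bs}_{os}_{arch}.om are generated.

2 Run inference and validation

Install the ais_bench inference tool

Visit the ais_bench repository and install the tool by following its README. Installing from the whl package is recommended.

Run inference

Run om_val.py to run inference with the OM models. The default task is transcription: the model converts the input speech file into the corresponding text. If you use a multilingual model, you can also run the translation task, which translates speech in other languages into English text.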
```
python3 om_val.py --encoder encoder_bs${bs}_linux_{arch}.om \
                  --decoder decoder_bs${bs}_linux_{arch}.om \
                  --tokens tokens.txt \
                  --model-cfg model_cfg.json \
                  --sound_file data/test.wav
```
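The decoder's inputs (tokens, the self-attention caches, and offset) suggest an autoregressive loop in which the first call feeds the prompt tokens and each later call feeds only the newest token. The sketch below illustrates such a greedy loop. It is not the code in om_val.py: `run_decoder` is a hypothetical callable standing in for one OM decoder invocation, and the cache is treated as an opaque value.

```python
def greedy_decode(run_decoder, prompt_tokens, eot_token, max_tokens=448):
    """Greedy decoding with a key/value cache.

    run_decoder(step_tokens, cache, offset) is a hypothetical callable that
    returns (logits for the last position, updated cache).
    """
    tokens = list(prompt_tokens)
    cache = None                # stands in for the self-attention k/v caches
    offset = 0                  # number of tokens already stored in the cache
    step_tokens = list(tokens)  # the first call feeds the whole prompt
    while len(tokens) < max_tokens:
        logits, cache = run_decoder(step_tokens, cache, offset)
        offset += len(step_tokens)
        # Greedy choice: index of the largest logit.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        if next_token == eot_token:
            break
        tokens.append(next_token)
        step_tokens = [next_token]  # later calls feed one token at a time
    return tokens
```

The default `max_tokens=448` matches the cache length in the decoder input table; the returned token ids would then be mapped back to text using tokens.txt.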

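Performance validation

You can use the pure inference mode of the ais_bench tool to measure the performance of the OM models. Reference commands: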

python3 -m ais_bench --model encoder_bs${bs}_linux_{arch}.om --loop 10 --dymShape "mel:${bs},80,3000" --outputSize "10000000,10000000"

python3 -m ais_bench --model decoder_bs${bs}_linux_{arch}.om --loop 10 --dymShape "tokens:${bs},1;in_n_layer_self_k_cache:6,${bs},448,512;in_n_layer_self_v_cache:6,${bs},448,512;n_layer_cross_k:6,${bs},351,512;n_layer_cross_v:6,${bs},351,512;offset:1" --outputSize "100000000,100000000,10000000"

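Model Inference Performance & Accuracy

| NPU chip model | Batch Size | mel_len | Dataset | Throughput (encoder/decoder) |
| --- | --- | --- | --- | --- |
| 300I Pro | 1 | 2000 | Random data | 90.96/45.70 |
| 300I Pro | 1 | 1000 | Random data | 204.26/50.12 |
| 300I Pro | 2 | 2000 | Random data | 93.09/77.79 |
| 300I Pro | 2 | 1000 | Random data | 252.22/83.18 |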