
whisper-base.en OM Inference Guide

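  • [Overview]
  • [Inference Environment Setup]
  • [Quick Start]
    • [Get the Source Code]
    • [Data Preparation]
    • [Model Inference]
  • [Model Inference Performance & Accuracy]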

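Overview

Whisper is a general-purpose speech recognition model trained on a large, diverse audio dataset. It uses a Transformer architecture and is a sequence-to-sequence multitask model that handles a range of speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are represented as a sequence of tokens predicted by the decoder, which lets a single model replace the traditional multi-stage speech processing pipeline. The multitask training format uses a set of special tokens as task specifiers.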
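As a rough illustration of the multitask format described above, the sketch below assembles the special-token prefix that conditions the decoder. The token spellings follow the openai/whisper reference implementation; the helper name `build_prompt` is hypothetical, and a real tokenizer would map these strings to integer ids.

```python
from typing import List, Optional


def build_prompt(language: Optional[str] = "en",
                 task: str = "transcribe",
                 timestamps: bool = False) -> List[str]:
    """Return the special-token prefix for one decoding request.

    language=None models an English-only checkpoint such as base.en,
    whose prefix carries no language or task token.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task}")
    prompt = ["<|startoftranscript|>"]
    if language is not None:
        # Multilingual checkpoints select language and task with two extra tokens.
        prompt += [f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        prompt.append("<|notimestamps|>")
    return prompt


print(build_prompt("zh", "translate"))
print(build_prompt(None))
```

The same decoder therefore serves transcription and translation; only the prefix changes.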

  • Reference implementation:

    url=https://github.com/openai/whisper

    Input and output data

    • Encoder input

      | Input | Data type | Shape | Layout |
      | --- | --- | --- | --- |
      | mel | FLOAT32 | batchsize x 80 x mel_len | ND |

    • Encoder output

      | Output | Data type | Shape | Layout |
      | --- | --- | --- | --- |
      | n_layer_cross_k | FLOAT32 | 6 x batchsize x mel_len x 512 | ND |
      | n_layer_cross_v | FLOAT32 | 6 x batchsize x mel_len x 512 | ND |

    • Decoder input

      | Input | Data type | Shape | Layout |
      | --- | --- | --- | --- |
      | tokens | INT64 | batchsize x n_tokens | ND |
      | in_n_layer_self_k_cache | FLOAT32 | 6 x batchsize x 448 x 512 | ND |
      | in_n_layer_self_v_cache | FLOAT32 | 6 x batchsize x 448 x 512 | ND |
      | n_layer_cross_k | FLOAT32 | 6 x batchsize x mel_len/2 x 512 | ND |
      | n_layer_cross_v | FLOAT32 | 6 x batchsize x mel_len/2 x 512 | ND |
      | offset | INT64 | 1 | ND |

    • Decoder output

      | Output | Data type | Shape | Layout |
      | --- | --- | --- | --- |
      | logits | FLOAT32 | batchsize x mel_len | ND |
      | out_n_layer_self_k_cache | FLOAT32 | 6 x batchsize x mel_len x 512 | ND |
      | out_n_layer_self_v_cache | FLOAT32 | 6 x batchsize x mel_len x 512 | ND |
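To make the shape tables easier to check against a concrete input, the sketch below lists the encoder and decoder input shapes for a given batch size and spectrogram length. The constants are read off the tables above (80 mel bins, 6 layers, 448 context positions, 512 state size); the function name `input_shapes` is hypothetical, and an odd `mel_len` is rounded down here as an assumption.

```python
from typing import Dict, Tuple

# Values taken from the input/output tables for whisper base.en.
N_MELS, N_LAYER, N_CTX, N_STATE = 80, 6, 448, 512

Spec = Tuple[str, Tuple[int, ...]]  # (data type, shape)


def input_shapes(batch: int, mel_len: int, n_tokens: int = 1) -> Dict[str, Spec]:
    """Return dtype and shape for every encoder and decoder input."""
    cross_len = mel_len // 2  # decoder table gives the cross length as mel_len/2
    cache = (N_LAYER, batch, N_CTX, N_STATE)
    cross = (N_LAYER, batch, cross_len, N_STATE)
    return {
        "mel": ("FLOAT32", (batch, N_MELS, mel_len)),
        "tokens": ("INT64", (batch, n_tokens)),
        "in_n_layer_self_k_cache": ("FLOAT32", cache),
        "in_n_layer_self_v_cache": ("FLOAT32", cache),
        "n_layer_cross_k": ("FLOAT32", cross),
        "n_layer_cross_v": ("FLOAT32", cross),
        "offset": ("INT64", (1,)),
    }


for name, (dtype, shape) in input_shapes(batch=1, mel_len=3000).items():
    print(f"{name:26s} {dtype:8s} {shape}")
```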

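Inference Environment Setup

  • The model requires the following plugins and drivers.

    Table 1 Version compatibility

    | Component | Version | Setup guide |
    | --- | --- | --- |
    | Firmware and driver | 24.1.rc3 | PyTorch Framework Inference Environment Setup |
    | CANN | 8.3.RC1 | - |
    | Python | 3.11.10 | - |
    | PyTorch | 2.5.1 | - |

    Note: For the Atlas 300I Duo inference card, choose the firmware and driver versions that match your CANN version.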

Quick Start

Get the Source Code

  1. Get the OM inference code from this repository
     git clone https://gitcode.com/ascend/ModelZoo-PyTorch.git
     cd ModelZoo-PyTorch/ACL_PyTorch/built-in/audio/whisper/whisper_om
  2. Install the dependencies
     pip install -r requirements.txt
  3. Install ffmpeg
     sudo apt-get install ffmpeg

Data Preparation

The model takes a single audio file as input. Download the test audio (link: https://pan.baidu.com/s/1xiHW7tmJe3lfAdQABWqsFA?pwd=gya6, extraction code: gya6) and place it in the project's data directory:

Whisper
    ├── data
        └── test.wav
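Before running inference it can help to confirm that the downloaded file is a readable WAV. The sketch below uses only the Python standard library; the helper name `describe_wav` is hypothetical. For context, the openai/whisper reference implementation decodes audio through ffmpeg to 16 kHz mono, so other sample rates are not necessarily a problem.

```python
import os
import wave


def describe_wav(path: str) -> dict:
    """Read channel count, sample rate and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    sample = os.path.join("data", "test.wav")  # location from the tree above
    if os.path.exists(sample):
        print(describe_wav(sample))
    else:
        print(f"{sample} not found; download the test audio first")
```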

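Model Inference

1 Model conversion

Convert the model weight file (.pt) to .onnx files, then use the ATC tool to convert the .onnx files into offline inference .om models.

  1. Get the weight file

    Download the weight file. Taking base.en as an example, place the downloaded model file base.en.pt in the project directory.

  2. Export the ONNX models

    First apply the patch to whisper:

    python3 patch_apply.py
    

    Then run pth2onnx.py to export the ONNX models.

    python3 pth2onnx.py --model base.en.pt --output_dir whisper_base_en
    

    The configuration of the original model and the corresponding tokenizer are saved to model_cfg.json and tokens.txt so that OM inference can read them later. Because the Whisper model consists of an encoder and a decoder that are linked by cross-attention, the model has to be modified, so the script exports two ONNX models: encoder.onnx and decoder.onnx.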

    After this step, the project directory looks like this:

    📁 whisper_om
        ├── pth2onnx.py
        ├── atc.sh
        ├── om_val.py
        ├── patch_apply.py
        ├── whisper_model.patch
        ├── 📁 whisper_base_en
        |   ├── encoder.onnx
        |   ├── decoder.onnx
        |   ├── model_cfg.json
        |   ├── tokens.txt
        ├── requirements.txt
        ├── modelzoo_level.txt
        ├── README.md
        └── LICENSE
    
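  3. Use the ATC tool to convert the ONNX models to OM models.

    1. Configure the environment variables

      source /usr/local/Ascend/ascend-toolkit/set_env.sh
      

      Note:
      The environment variables in this script are for reference only; configure them according to your actual installation. For details, see the "CANN Development Auxiliary Tools Guide (Inference)".

    2. Run the following command to look up the chip name (this gives the soc_version argument of the atc command)

      npu-smi info
      # The chip name of this device is Ascend310P3 (replace with your own)
      # Sample output: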
      +-------------------+-----------------+------------------------------------------------------+
      | NPU     Name      | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
      | Chip    Device    | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
      +===================+=================+======================================================+
      | 0       310P3     | OK              | 15.8         42                0    / 0              |
      | 0       0         | 0000:82:00.0    | 0            1074 / 21534                            |
      +===================+=================+======================================================+
      | 1       310P3     | OK              | 15.4         43                0    / 0              |
      | 0       1         | 0000:89:00.0    | 0            1070 / 21534                            |
      +===================+=================+======================================================+
      
    3. Run the ATC commands
      Run atc.sh to export the OM models.

      # Example invocation; adjust the parameters to your environment. Note that the ONNX path arguments omit the ".onnx" suffix
      bash atc.sh --encoder_model=path/to/encoder --decoder_model=path/to/decoder --bs=1 --config=path/to/model_cfg.json --output_dir=output --soc=Ascend310P3
      
      # atc command actually executed for the encoder model
      atc --framework=5 --input_format=ND --log=error --soc_version=${soc} \
          --model=${encoder_model}.onnx --output=${output_dir}/${encoder_model}_bs${bs} \
          --input_shape="mel:${bs},${n_mels},250~3000"
      
      # atc command actually executed for the decoder model
      atc --framework=5 --input_format=ND --log=error --soc_version=${soc} \
          --model=${decoder_model}.onnx --output=${output_dir}/${decoder_model}_bs${bs} \
          --input_shape="tokens:${bs},1~4;in_n_layer_self_k_cache:${n_text_layer},${bs},${n_text_ctx},${n_text_state};in_n_layer_self_v_cache:${n_text_layer},${bs},${n_text_ctx},${n_text_state};n_layer_cross_k:${n_text_layer},${bs},250~3000,${n_text_state};n_layer_cross_v:${n_text_layer},${bs},250~3000,${n_text_state};offset:1"
      

      Because the audio length is not fixed, the ~ range syntax is used when converting to OM so that the model accepts any input within the range. Alternatively, you can pad the spectrogram length to a fixed set of gears and use the --dynamic_dims parameter to accept dynamic input instead.
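The padding alternative mentioned above can be sketched as follows: pick the smallest gear that fits the spectrogram, then right-pad every mel row to that length. The gear values, the helper names `pick_gear` and `pad_mel`, and the pad value of 0.0 are all assumptions for illustration; the appropriate pad value depends on how the spectrogram is normalized, and real code would normally operate on arrays rather than nested lists.

```python
from bisect import bisect_left
from typing import List, Sequence

GEARS = (1000, 2000, 3000)  # hypothetical gear set within the 250~3000 range


def pick_gear(mel_len: int, gears: Sequence[int] = GEARS) -> int:
    """Return the smallest gear that can hold mel_len frames."""
    ordered = sorted(gears)
    idx = bisect_left(ordered, mel_len)
    if idx == len(ordered):
        raise ValueError(f"mel_len {mel_len} exceeds the largest gear {ordered[-1]}")
    return ordered[idx]


def pad_mel(mel: List[List[float]],
            gears: Sequence[int] = GEARS,
            pad_value: float = 0.0) -> List[List[float]]:
    """Right-pad each mel row (one row per mel bin) to the selected gear."""
    target = pick_gear(len(mel[0]), gears)
    return [row + [pad_value] * (target - len(row)) for row in mel]


mel = [[0.5] * 1234 for _ in range(80)]  # stand-in for an 80 x 1234 spectrogram
padded = pad_mel(mel)
print(len(padded), len(padded[0]))
```

With a small fixed set of lengths, each exported gear has a static shape, at the cost of some wasted computation on the padded frames.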

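      • Parameters:
        • --model: ONNX model file
        • --framework: 5 denotes an ONNX model
        • --output: output OM model
        • --input_shape: shape of the input data
        • --log: log level
        • --soc_version: processor model

      On success, the model files encoder_bs${bs}_{os}_{arch}.om and decoder_bs${bs}_{os}_{arch}.om are generated.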
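The `--input_shape` strings in the ATC commands above are built from a handful of model dimensions. The sketch below reproduces that string assembly in Python so the result can be inspected before invoking atc. It assumes the dimensions are available under the variable names used in the commands (`n_mels`, `n_text_layer`, `n_text_ctx`, `n_text_state`); the function names are hypothetical and this is not the code of atc.sh itself.

```python
def encoder_input_shape(bs: int, n_mels: int, lo: int = 250, hi: int = 3000) -> str:
    """Shape string for the encoder; lo~hi is the accepted mel length range."""
    return f"mel:{bs},{n_mels},{lo}~{hi}"


def decoder_input_shape(bs: int, n_text_layer: int, n_text_ctx: int,
                        n_text_state: int, lo: int = 250, hi: int = 3000) -> str:
    """Shape string for the decoder, one entry per input, joined with ';'."""
    cache = f"{n_text_layer},{bs},{n_text_ctx},{n_text_state}"
    cross = f"{n_text_layer},{bs},{lo}~{hi},{n_text_state}"
    parts = [
        f"tokens:{bs},1~4",
        f"in_n_layer_self_k_cache:{cache}",
        f"in_n_layer_self_v_cache:{cache}",
        f"n_layer_cross_k:{cross}",
        f"n_layer_cross_v:{cross}",
        "offset:1",
    ]
    return ";".join(parts)


# Dimensions of base.en as listed in the input/output tables.
print(encoder_input_shape(bs=1, n_mels=80))
print(decoder_input_shape(bs=1, n_text_layer=6, n_text_ctx=448, n_text_state=512))
```

Joining the parts without embedded whitespace avoids stray spaces inside the quoted argument.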

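2 Run inference and validation

  1. Install the ais_bench inference tool
     Visit ais_bench and install the tool following its readme; installing from the whl package is recommended.

  2. Run inference
     Run om_val.py to run inference with the OM models. The default task is transcription: the model converts the input speech file into the corresponding text. With a multilingual model, the translation task is also available, which translates speech in other languages into English text.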

    python3 om_val.py --encoder encoder_linux_{arch}.om \
                      --decoder decoder_linux_{arch}.om \
                      --tokens tokens.txt \
                      --model-cfg model_cfg.json \
                      --sound_file data/test.wav
    
  3. Performance validation
    Use the pure inference mode of the ais_bench tool to measure the performance of the OM models. Reference commands:

    python3 -m ais_bench --model encoder_bs${bs}_linux_{arch}.om --loop 10 --dymShape "mel:${bs},80,3000" --outputSize "10000000,10000000"
    
    python3 -m ais_bench --model decoder_bs${bs}_linux_{arch}.om --loop 10 --dymShape "tokens:${bs},1;in_n_layer_self_k_cache:6,${bs},448,512;in_n_layer_self_v_cache:6,${bs},448,512;n_layer_cross_k:6,${bs},351,512;n_layer_cross_v:6,${bs},351,512;offset:1" --outputSize "100000000,100000000,10000000"
    

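Model Inference Performance & Accuracy

| NPU model | Batch size | mel_len | Dataset | Throughput (encoder/decoder) |
| --- | --- | --- | --- | --- |
| 300I Pro | 1 | 2000 | Random data | 90.96/45.70 |
| 300I Pro | 1 | 1000 | Random data | 204.26/50.12 |
| 300I Pro | 2 | 2000 | Random data | 93.09/77.79 |
| 300I Pro | 2 | 1000 | Random data | 252.22/83.18 |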
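One way to read the table is to compare batch size 2 against batch size 1 at the same mel_len. The short sketch below computes that ratio from the figures above; the table does not state the throughput unit, so only the ratios are interpreted, and the helper name `batch_gain` is hypothetical.

```python
# (batch size, mel_len) -> (encoder, decoder) throughput, copied from the table above
RESULTS = {
    (1, 2000): (90.96, 45.70),
    (1, 1000): (204.26, 50.12),
    (2, 2000): (93.09, 77.79),
    (2, 1000): (252.22, 83.18),
}
ENCODER, DECODER = 0, 1


def batch_gain(mel_len: int, part: int) -> float:
    """Throughput at batch size 2 divided by throughput at batch size 1."""
    return RESULTS[(2, mel_len)][part] / RESULTS[(1, mel_len)][part]


for mel_len in (2000, 1000):
    print(f"mel_len={mel_len}: "
          f"encoder x{batch_gain(mel_len, ENCODER):.2f}, "
          f"decoder x{batch_gain(mel_len, DECODER):.2f}")
```

On these figures the decoder benefits noticeably more from the larger batch than the encoder does.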