Paraformer (TorchAir) Inference Guide

Overview
Inference environment preparation
Quick start
- Get the source code
- Model inference

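Overview

Paraformer is an efficient non-autoregressive end-to-end speech recognition framework proposed by the speech team at Alibaba DAMO Academy. This project provides the Paraformer general-purpose Chinese speech recognition model, trained on tens of thousands of hours of industrial-grade annotated audio to ensure general recognition quality. The model can be used in scenarios such as voice input methods, voice navigation, and intelligent meeting minutes.

Version information: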

url=https://github.com/modelscope/FunASR
commit_id=c4ac64fd5d24bb3fc8ccc441d36a07c83c8b9015
model_name=paraformer

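Inference environment preparation

This model requires the following plugins and drivers.
Table 1 Version compatibility

| Component | Version | Environment setup guide |
| --- | --- | --- |
| Firmware and driver | 25.3.rc1 | PyTorch framework inference environment preparation |
| CANN | 8.3.RC1 | - |
| Python | 3.11 | - |
| PyTorch | 2.1.0 | - |
| Ascend Extension PyTorch | 2.1.0.post13 | - |

Note: For Atlas 800I A2 / Atlas 300I Pro inference cards, select the actual firmware and driver version according to the CANN version.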

Quick start

Get the source code

Get the source code of this repository

git clone https://gitcode.com/ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/ACL_PyTorch/built-in/audio/Paraformer

Install dependencies
```
pip3 install -r requirements.txt
```

Get the model repository source code

git clone https://github.com/modelscope/FunASR.git
cd FunASR
git reset --hard c4ac64fd5d24bb3fc8ccc441d36a07c83c8b9015
git apply --ignore-whitespace ../diff_Funasr.patch
pip3 install -e ./
cd ..

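Download the model weights
- Paraformer-large, long-audio version
- Paraformer-large, hotword version
- VAD model, used as a preprocessing module to segment long audio
- punc model, used as a post-processing module to add punctuation to Paraformer transcripts
Download the dataset
- AISHELL-1

After everything is downloaded, the directory tree looks like this: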

Paraformer
├── FunASR      // folder downloaded from the open-source repository
├── data_aishell   // AISHELL-1 dataset
├── speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404   // Paraformer hotword version weights
├── speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch // Paraformer long-audio version weights
├── speech_fsmn_vad_zh-cn-16k-common-pytorch // VAD model weights
├── punc_ct-transformer_cn-en-common-vocab471067-large  // punc model weights
├── diff_Funasr.patch
├── infer.py           // custom inference script provided by this repository
├── test_performance.py  // performance test script provided by this repository
├── test_accuracy.py     // accuracy test script provided by this repository
├── torchair_auto_model.py
├── README.md
└── requirements.txt

Model inference

Sample test
```
# Hotword version
python3 infer.py \
  --model_path=./speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404 \
  --data=./speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/example/asr_example.wav \
  --hotwords="魔搭"
# Long-audio version
python3 infer.py \
  --model_path=./speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
  --model_vad=./speech_fsmn_vad_zh-cn-16k-common-pytorch \
  --model_punc=./punc_ct-transformer_cn-en-common-vocab471067-large \
  --data=./speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/example/asr_example.wav
```
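- Parameters:
  - model_path: path to the Paraformer model weights
  - model_vad: path to the VAD model weights; defaults to None, meaning the VAD model is not used
  - model_punc: path to the punc model weights; defaults to None, meaning the punc model is not used
  - data: input file for the model; defaults to asr_example.wav in the long-sequence Paraformer model directory
  - hotwords: speech hotwords; only takes effect with the hotword version of Paraformer; defaults to None
  - device: NPU chip ID; defaults to 0
  - batch_size: model input batch size; if the number of inputs is not an integer multiple of batch_size, the inputs are padded to a multiple of batch_size during preprocessing; defaults to 1
  - warmup: number of warm-up iterations
The inference script computes the output for a single sample audio file as an example and prints the result to the screen after inference.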
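The batch_size padding described in the parameter list can be illustrated with a minimal sketch. This is not the code in infer.py: the helper name `pad_to_batch_multiple` and the choice to repeat the last item as filler are assumptions made for illustration only.

```python
def pad_to_batch_multiple(items, batch_size, pad_item=None):
    """Return a copy of `items` whose length is a multiple of `batch_size`.

    Hypothetical helper: the real preprocessing may pad differently.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    remainder = len(items) % batch_size
    if remainder == 0:
        # Already a whole number of batches (this also covers an empty list).
        return list(items)
    # Assumption: repeat the last item as filler unless pad_item is given.
    filler = pad_item if pad_item is not None else items[-1]
    return list(items) + [filler] * (batch_size - remainder)


padded = pad_to_batch_multiple(["a.wav", "b.wav", "c.wav"], batch_size=2)
print(padded)  # ['a.wav', 'b.wav', 'c.wav', 'c.wav']
```

If padding works this way, results produced for the filler entries would have to be discarded after inference so that they are not counted twice.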
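Performance test
Run the following command to test the performance of the long-sequence Paraformer version on the full AISHELL-1 test set. The script tests only the Paraformer model and does not include the VAD and punc modules. The batch_size parameter controls the maximum number of audio files processed at once (for example, with a value of 64, the script reads 64 audio files at a time from data_path and combines them into a single input; if fewer remain, the batch is padded).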
```
python3 test_performance.py \
--model_path=./speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
--batch_size=64 \
--data_path=./path/to/AISHELL-1/wav/test \
--result_path=./aishell_test_result.txt \
--warm_up=3
```
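- Parameters:
  - --model_path: path to the Paraformer model weights
  - --batch_size: model input batch size; if the number of inputs is not an integer multiple of batch_size, the inputs are padded to a multiple of batch_size during preprocessing; defaults to 64
  - --data_path: path to the AISHELL-1 test set audio; the test script recursively searches this path for all audio files
  - --result_path: path where the inference results for the test audio are saved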
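The recursive search under data_path and the grouping into batches can be sketched as follows. This is an illustration under stated assumptions, not the logic of test_performance.py: the function names `find_audio_files` and `iter_batches`, the `.wav` suffix filter, and the sorted ordering are all hypothetical. The demo builds a throwaway directory so the sketch runs without the dataset.

```python
import tempfile
from pathlib import Path


def find_audio_files(root, suffixes=(".wav",)):
    """Recursively collect audio files under `root`, sorted for reproducibility."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


def iter_batches(paths, batch_size):
    """Yield consecutive slices of at most `batch_size` paths."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# Demo on a temporary tree with made-up speaker folders and file names.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("S0001/a.wav", "S0001/b.wav", "S0002/c.wav", "S0002/notes.txt"):
        f = Path(tmp, rel)
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()
    wavs = find_audio_files(tmp)
    names = [p.name for p in wavs]
    batch_sizes = [len(b) for b in iter_batches(wavs, batch_size=2)]

print(names, batch_sizes)  # ['a.wav', 'b.wav', 'c.wav'] [2, 1]
```

The last batch here holds a single file; according to the parameter description, the real script pads such a batch up to batch_size.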
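Accuracy test
This accuracy test applies only to the long-sequence Paraformer version. Run the performance test first, then verify accuracy using the results it saved to result_path by running the following command: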
```
python3 test_accuracy.py \
--result_path=./aishell_test_result.txt \
--ref_path=/path/to/AISHELL-1/transcript/aishell_transcript_v0.8.txt
```
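For readers unfamiliar with the metric, character error rate (CER) is commonly computed as the character-level edit distance between hypothesis and reference, summed over all utterances and divided by the total number of reference characters. The sketch below shows that common definition. It is an assumption that test_accuracy.py follows the same convention (for example, in how it handles punctuation or whitespace), and the sample sentences are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(pairs):
    """Corpus-level CER over (reference, hypothesis) pairs."""
    total_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    return total_errors / total_chars if total_chars else 0.0


# Made-up example: one substituted character out of eight reference characters.
pairs = [("今天天气很好", "今天天汽很好"), ("你好", "你好")]
print(cer(pairs))  # 0.125
```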

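Model inference performance & accuracy

| Model | Hardware | Dataset | Batch size | Paraformer inference performance (transcription ratio) | Competitor (A10) performance (transcription ratio) | Accuracy (CER) | Competitor (A10) CER |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Paraformer long-sequence version | Atlas 800I A2 | AISHELL-1 | 64 | 840 | 513 | 0.0198 | 0.019873 |
| Paraformer long-sequence version | Atlas 300I DUO (single chip) | AISHELL-1 | 16 | 180 | 513 | 0.0204 | 0.019873 |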
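
As a reading aid, the transcription ratios in the table can be expressed relative to the A10 figure. The sketch below only divides the numbers reported above. It assumes that a higher transcription ratio means better performance, which this document does not state explicitly, and it ignores that the two rows use different batch sizes while the A10 batch size is not given.

```python
A10_RATIO = 513  # competitor (A10) transcription ratio from the table
RESULTS = {
    "Atlas 800I A2": 840,
    "Atlas 300I DUO (single chip)": 180,
}


def relative_to_baseline(value, baseline):
    """Return `value` as a multiple of `baseline`."""
    return value / baseline


for hardware, ratio in RESULTS.items():
    print(f"{hardware}: {relative_to_baseline(ratio, A10_RATIO):.2f}x the A10 figure")
# Atlas 800I A2: 1.64x the A10 figure
# Atlas 300I DUO (single chip): 0.35x the A10 figure
```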