SenseVoice + VAD Inference Guide

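Overview

This document builds on the SenseVoice (ONNX) inference guide and adds a VAD (voice activity detection) model, which detects the valid speech segments in the audio. It also supports timestamp output (the time in the audio that corresponds to each recognized word).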
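
To make the two additions concrete, the sketch below shows one way VAD output could be used: speech segments given as (start_ms, end_ms) pairs are cut out of the waveform, and a word time measured inside a segment is shifted back onto the full recording. This is only an illustration, not the code in om_infer.py. The 16 kHz sample rate is inferred from the "16k" in the VAD model name, and the function names and segment values are hypothetical.

```python
from typing import List, Tuple

SAMPLE_RATE = 16000  # assumption: 16 kHz audio, inferred from "16k" in the VAD model name


def segment_to_samples(start_ms: int, end_ms: int,
                       sample_rate: int = SAMPLE_RATE) -> Tuple[int, int]:
    """Convert a (start_ms, end_ms) speech segment into sample indices."""
    return start_ms * sample_rate // 1000, end_ms * sample_rate // 1000


def cut_segments(waveform: List[float],
                 segments: List[Tuple[int, int]]) -> List[List[float]]:
    """Keep only the samples that fall inside the detected speech segments."""
    chunks = []
    for start_ms, end_ms in segments:
        begin, end = segment_to_samples(start_ms, end_ms)
        chunks.append(waveform[begin:end])
    return chunks


def to_absolute_ms(segment_start_ms: int, word_offset_ms: int) -> int:
    """Map a word time measured inside a segment back onto the full recording."""
    return segment_start_ms + word_offset_ms


# Hypothetical input: 3 seconds of samples and two detected speech segments.
waveform = [0.0] * (3 * SAMPLE_RATE)
segments = [(200, 1200), (1500, 2900)]  # illustrative values, not real model output
chunks = cut_segments(waveform, segments)
print([len(chunk) for chunk in chunks])
print(to_absolute_ms(segments[1][0], 340))
```

Only the speech chunks would be passed on to recognition, and the segment start is what lets per-word times refer to the original audio.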

Version information:

url=https://github.com/modelscope/FunASR
commit_id=c4ac64fd5d24bb3fc8ccc441d36a07c83c8b9015

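Inference environment setup

Table 1: Version compatibility

Component	Version	Setup guide
Firmware and driver	25.0.RC1	PyTorch framework inference environment setup
CANN	8.2.RC1	-
Python	3.11.10	-
PyTorch	2.1.0	-
Note: Atlas 300I Duo is supported	\	\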

Get the source code

Get the source code of this repository

git clone https://gitcode.com/ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/ACL_PyTorch/contrib/audio/SenseVoice

Install dependencies

pip3 install -r requirements.txt

Get the PyTorch source code (FunASR)

git clone https://github.com/modelscope/FunASR.git
cd FunASR
git reset c4ac64fd5d24bb3fc8ccc441d36a07c83c8b9015 --hard
git apply ../diff.patch
cp ../export_onnx.py ./
cp ../om_infer.py ./

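Install the aisbench tool by following the aisbench installation guide.
Install the surgeon component of the msit tool by following the msit installation guide.
Get the weights

Dataset preparation

Download link for the librispeech_asr_dummy dataset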

Model inference

Model conversion

Export the ONNX model

python3 export_onnx.py --model /path/to/SenseVoiceSmall

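Parameter description
--model: path to the SenseVoiceSmall model

After the script runs, a model.onnx file is generated in the weights directory.

Modify the ONNX model

# ${ModelZoo-PyTorch} is the path to the ModelZoo code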
cp ${ModelZoo-PyTorch}/ACL_PyTorch/built-in/audio/SenseVoice/modify_onnx.py ./
python3 modify_onnx.py \
--input_path=/path/to/SenseVoiceSmall/model.onnx \
--save_path=/path/to/SenseVoiceSmall/model_md.onnx

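This step modifies the original ONNX model: it removes the redundant domain and generates the new model_md.onnx.

Convert the ONNX model to an OM model with the ATC tool

Set the environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh

Run the ATC command. Use the npu-smi info command to obtain the chip model and fill it into the soc_version parameter.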

atc --framework=5 --soc_version=Ascend${soc_version} --model /path/to/SenseVoiceSmall/model_md.onnx --output SenseVoice --input_shape="speech:1,-1,560;speech_lengths:1;language:1;textnorm:1"

A dynamic-shape model, SenseVoice_{arch}.om, is generated in the current directory.
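
When the conversion is scripted, the same ATC invocation can be assembled programmatically. The sketch below builds the argument list from a chip name read off `npu-smi info`. The helper name `build_atc_command` is hypothetical, `310P3` is only an illustrative chip name, and the flags are written in the `--flag=value` form; the values themselves are the ones used in the command above.

```python
import shlex
from typing import List

# Input shapes from this guide; -1 marks the dynamic (variable-length) dimension.
INPUT_SHAPE = "speech:1,-1,560;speech_lengths:1;language:1;textnorm:1"


def build_atc_command(chip_name: str, onnx_path: str,
                      output_prefix: str = "SenseVoice") -> List[str]:
    """Assemble the atc argument list for a given chip name (hypothetical helper)."""
    return [
        "atc",
        "--framework=5",                     # 5 selects ONNX as the source framework
        f"--soc_version=Ascend{chip_name}",  # chip name as reported by npu-smi info
        f"--model={onnx_path}",
        f"--output={output_prefix}",
        f"--input_shape={INPUT_SHAPE}",
    ]


# "310P3" is an example value; replace it with what npu-smi info reports.
cmd = build_atc_command("310P3", "/path/to/SenseVoiceSmall/model_md.onnx")
print(shlex.join(cmd))
# To actually run the conversion: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems with the semicolons in `--input_shape`.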

Directory structure

📁 SenseVoice/
├── 📄 diff.patch
├── 📄 export_onnx.py
├── 📄 modify_onnx.py
├── 📄 requirements.txt
├── 📁 FunASR/
|   ├── 📄 ...
|   ├── 📁 funasr/
|   ├── 📁 speech_fsmn_vad_zh-cn-16k-common-pytorch/
|   ├── 📁 SenseVoiceSmall/
|   ├── 📁 librispeech_asr_dummy/
|   ├── 📄 SenseVoice_{arch}.om
|   ├── 📄 om_infer.py

Run the inference command

python3 om_infer.py \
--vad_path speech_fsmn_vad_zh-cn-16k-common-pytorch \
--model_path SenseVoiceSmall \
--om_path SenseVoice_{arch}.om \
--device 0 \
--input ./librispeech_asr_dummy/validation-00000-of-00001.parquet \
--output_timestamp

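Parameter description
vad_path: path to the VAD model weights
model_path: path to the SenseVoice model weights
om_path: path to the OM model
device: NPU chip ID; card 0 is used by default
input: path to the librispeech_asr_dummy dataset file
output_timestamp: output timestamps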
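
For reference, a command-line interface with these parameters could be declared as in the sketch below. This is an assumption about how such flags are typically parsed with `argparse`, not a copy of om_infer.py. The flag names come from the inference command above; the default of 0 for `--device` follows the parameter description, and treating `--output_timestamp` as a boolean switch is inferred from its use without a value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags used by the inference command (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="SenseVoice + VAD OM inference (sketch)")
    parser.add_argument("--vad_path", required=True, help="path to the VAD model weights")
    parser.add_argument("--model_path", required=True, help="path to the SenseVoice model weights")
    parser.add_argument("--om_path", required=True, help="path to the OM model")
    parser.add_argument("--device", type=int, default=0, help="NPU chip ID (default: 0)")
    parser.add_argument("--input", required=True, help="path to the dataset parquet file")
    parser.add_argument("--output_timestamp", action="store_true", help="output timestamps")
    return parser


# Parse the same arguments as the command above, leaving --device at its default.
args = build_parser().parse_args([
    "--vad_path", "speech_fsmn_vad_zh-cn-16k-common-pytorch",
    "--model_path", "SenseVoiceSmall",
    "--om_path", "SenseVoice_{arch}.om",
    "--input", "./librispeech_asr_dummy/validation-00000-of-00001.parquet",
    "--output_timestamp",
])
print(args.device, args.output_timestamp)
```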

After the run finishes, the script prints the transcription ratio and the WER (word error rate) for this dataset.
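
The two metrics can be illustrated with a small sketch. WER is computed here in the standard way, as the word-level edit distance divided by the number of reference words. The transcription ratio is assumed to be audio duration divided by processing time; that definition, the lowercase-and-split text normalization, and the function names are assumptions for illustration and may differ from what om_infer.py does.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance divided by the number of reference words."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def transcription_ratio(audio_seconds: float, processing_seconds: float) -> float:
    """Assumed definition: seconds of audio transcribed per second of processing."""
    return audio_seconds / processing_seconds


# One substitution (sat -> sit) and one deletion (the) against 6 reference words.
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 3))
print(transcription_ratio(46.0, 2.0))
```

Under this assumed definition, a higher transcription ratio means faster processing, while a lower WER means better accuracy.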

Performance data

Model	Dataset	Chip	Transcription ratio	T4 transcription ratio
SenseVoice + VAD	librispeech_asr_dummy	300I Duo (single chip)	23	35

Accuracy data

Model	Dataset	Chip	WER	T4 WER
SenseVoice + VAD	librispeech_asr_dummy	300I Duo (single chip)	0.083	0.083