
Fun-ASR-Nano-2512 Model Inference Guide

- Overview
- Inference Environment Setup
- Quick Start
- Model Inference Performance
- Public URLs

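Overview

Fun-ASR-Nano-2512 is a lightweight end-to-end automatic speech recognition (ASR) large model released by Alibaba's Tongyi Lab. It is the lightweight deployment version in the FunASR series: a multilingual-enhanced edition that supports 31 languages and optimizes shared cross-lingual representations. It was open-sourced in December 2025. The model is designed for real-world applications and combines small size, low latency, and easy deployment with high accuracy and strong robustness. It supports fully local deployment and is suited to edge devices and resource-constrained environments. This document describes how to run inference with the model on the Ascend platform.

Reference paper:
- Tongyi Fun Team, "Fun-ASR Technical Report"

Reference implementation: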

url=https://github.com/FunAudioLLM/Fun-ASR
branch=main
model_name=Fun-ASR-Nano-2512

Implementation adapted for Ascend AI processors:

url=https://gitcode.com/ascend/ModelZoo-PyTorch
branch=master
code_path=ACL_PyTorch/built-in/audio/Fun-ASR-Nano-2512

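To fetch the corresponding code with Git:

git clone {repository_url}        # clone the repository
cd {repository_name}              # enter the model repository directory
git checkout {branch/tag}         # switch to the target branch or tag
git reset --hard {commit_id}      # reset the code to the given commit_id (optional)
cd {code_path}                    # enter the model code path; not needed if the repository contains only this model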

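Input and output data

Input data: audio files in wav, mp3, aac, and other formats, given either as a local file path or as a publicly accessible URL of the file to be recognized.
Output data: the recognized text, together with the start and end positions of the text within the audio file.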

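Inference Environment Setup

The model requires the following plugins and drivers.

Table 1 Version compatibility

| Component | Version | Environment setup guide |
| --- | --- | --- |
| Firmware and driver | Version matching CANN 8.5.0 | PyTorch framework inference environment setup |
| CANN | 8.5.0 | - |
| Python | 3.11 | - |
| PyTorch | 2.9.0 | - |
| torch_npu | 2.9.0 | - |
| Hardware | Ascend 910 | - |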
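
Before running the model it can help to confirm that the Python-level packages match Table 1. The following is a minimal sketch that only reads installed package metadata; the distribution names `torch` and `torch_npu` are assumptions about how the packages are registered in your environment, and the script does not check CANN, firmware, or driver versions.

```python
import sys
from importlib import metadata

# Expected versions copied from Table 1; the distribution names are assumptions.
EXPECTED = {"torch": "2.9.0", "torch_npu": "2.9.0"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Compare the running environment against Table 1 and return a report dict."""
    report = {
        "python": {
            "expected": "3.11",
            "found": f"{sys.version_info.major}.{sys.version_info.minor}",
        }
    }
    for name, expected in EXPECTED.items():
        report[name] = {"expected": expected, "found": installed_version(name)}
    for entry in report.values():
        found = entry["found"]
        # Prefix match, so suffixed builds such as "2.9.0.post1" are accepted.
        entry["ok"] = found is not None and found.startswith(entry["expected"])
    return report


if __name__ == "__main__":
    for name, entry in check_environment().items():
        status = "OK" if entry["ok"] else "MISMATCH"
        print(f"{name}: expected {entry['expected']}, found {entry['found']} -> {status}")
```

A `MISMATCH` line does not necessarily mean inference will fail; it only flags a difference from the tested configuration.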

Quick Start

Get the source code

Get the open-source repository.

git clone https://github.com/FunAudioLLM/Fun-ASR.git
cd Fun-ASR

Get the adapted source.

Upload the decode_npu.py file from this project to the Fun-ASR directory.

Install dependencies.
```
pip install -r requirements.txt
```

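Prepare the dataset

Prepare audio files and reuse the test dataset.

Upload an audio file to the project root directory for inference and performance testing. For accuracy testing, use the test dataset provided in the Fun-ASR source repository.

The directory structure should look like this: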
```
Fun-ASR
├── data
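│   ├── val_wav.scp   # inference input, shipped with the original repository
│   └── val_text.txt  # matching reference transcripts, shipped with the original repository
├── demo.wav          # uploaded audio file
└── decode_npu.py     # uploaded adaptation code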
```
For a description of the data file formats, see:
- https://github.com/FunAudioLLM/Fun-ASR/blob/main/docs/fintune_zh.md

Model inference

Run inference.
```
python decode_npu.py \
  --model_dir=FunAudioLLM/Fun-ASR-Nano-2512 \
  --input_file=demo.wav \
  --loops=1
```
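Parameters:
- model_dir: model name, or path to the model on the local disk
- input_file: input file to run inference on, either an audio file such as .wav or a .scp list file
- loops: number of loops; set to 1 for a single inference run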
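
To make the flag semantics concrete, here is a hypothetical `argparse` front end that accepts the three flags described above. It is an illustration only and is not taken from decode_npu.py; the default of 1 for `--loops` and the `is_scp_list` helper are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the three documented flags (illustrative, not the real script)."""
    parser = argparse.ArgumentParser(description="Hypothetical front end for an ASR inference script")
    parser.add_argument("--model_dir", required=True, help="model name or local model path")
    parser.add_argument("--input_file", required=True, help="audio file (e.g. .wav) or .scp list file")
    # Assumed default: a single inference run when --loops is omitted.
    parser.add_argument("--loops", type=int, default=1, help="number of inference loops")
    return parser


def is_scp_list(path):
    """Return True when the input looks like a .scp list rather than a single audio file."""
    return path.lower().endswith(".scp")


if __name__ == "__main__":
    args = build_parser().parse_args()
    kind = "list file" if is_scp_list(args.input_file) else "audio file"
    print(f"model={args.model_dir} input={args.input_file} ({kind}) loops={args.loops}")
```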

Performance test.

python decode_npu.py \
  --model_dir=FunAudioLLM/Fun-ASR-Nano-2512 \
  --input_file=demo.wav \
  --loops=10

Parameters:

loops: number of loops; after the repeated inference runs, the average inference time is printed
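
The averaging described for `loops` can be sketched as follows. This is a generic timing loop, not the logic in decode_npu.py; in particular, whether the real script discards warm-up iterations is unknown, and `run_once` is a hypothetical stand-in for one inference call.

```python
import time


def average_latency(run_once, loops):
    """Call run_once() `loops` times and return the mean wall-clock seconds per call."""
    if loops < 1:
        raise ValueError("loops must be at least 1")
    start = time.perf_counter()
    for _ in range(loops):
        run_once()
    return (time.perf_counter() - start) / loops


if __name__ == "__main__":
    # Stand-in workload; a real measurement would call the model here.
    avg = average_latency(lambda: sum(range(100_000)), loops=10)
    print(f"average latency: {avg:.6f} s")
```

On accelerators the first iteration often includes one-time setup cost, so a larger `loops` value tends to give a more representative average.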

Accuracy test.

a) Run the inference script to produce the inference output file

python decode_npu.py \
  --model_dir=FunAudioLLM/Fun-ASR-Nano-2512 \
  --input_file=data/val_wav.scp \
  --output_file=output.txt

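Parameters:

output_file: inference output file; in each line, the left side is the unique data ID and the right side is the inference result for that audio file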
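
A file in this "ID on the left, text on the right" layout can be loaded with a few lines of Python. The sketch below assumes the ID and the text are separated by the first run of whitespace; the exact separator written by decode_npu.py is not stated here, and the sample IDs are made up.

```python
def parse_id_text_lines(lines):
    """Map each utterance ID to its text, splitting on the first run of whitespace."""
    results = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        utt_id = parts[0]
        # An ID with no text after it is treated as an empty recognition result.
        results[utt_id] = parts[1] if len(parts) > 1 else ""
    return results


if __name__ == "__main__":
    sample = ["utt_001 hello world", "utt_002\tgood morning", "utt_003"]
    for utt_id, text in parse_id_text_lines(sample).items():
        print(utt_id, "->", repr(text))
```

To read a real file, pass an open file object, for example `parse_id_text_lines(open("output.txt", encoding="utf-8"))`.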

b) Apply inverse text normalization to the reference transcripts and the inference results, then compute the WER

python tools/whisper_mix_normalize.py data/val_text.txt data/val_norm.txt
python tools/whisper_mix_normalize.py output.txt output_norm.txt
compute-wer data/val_norm.txt output_norm.txt cer.txt
tail -n8 cer.txt
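
For readers unfamiliar with the metric, the sketch below shows how a word error rate is commonly computed: the token-level edit distance (substitutions, deletions, insertions) summed over all utterances, divided by the total number of reference tokens. It is not the implementation behind `compute-wer`; for Chinese text the tokens would typically be characters, which gives a character error rate.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def word_error_rate(pairs):
    """Return WER in percent for an iterable of (reference_tokens, hypothesis_tokens)."""
    errors = 0
    total = 0
    for ref, hyp in pairs:
        errors += edit_distance(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("reference must contain at least one token")
    return 100.0 * errors / total


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    print(f"WER: {word_error_rate([(ref, hyp)]):.2f}%")
```

In the example, one substitution and one deletion against six reference words give 2 errors out of 6.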

Model Inference Performance

Performance

| Model | Card | RTF |
| --- | --- | --- |
| Fun-ASR-Nano-2512 | Ascend910B4 | 0.4 |

The following audio file, 3 s long, was used:

https://modelscope.cn/datasets/FunAudioLLM/funasr-demo/resolve/master/audios/BAC009S0764W0128.wav
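
RTF (real-time factor) is conventionally defined as processing time divided by audio duration, so an RTF of 0.4 on a 3 s clip corresponds to roughly 1.2 s of processing per run. The sketch below assumes that conventional definition (how decode_npu.py computes its figure is not shown here) and uses the standard-library `wave` module, which only reads uncompressed WAV files.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; values below 1 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Figures from the text above: about 1.2 s of processing for a 3 s clip.
    print(f"RTF: {real_time_factor(1.2, 3.0):.2f}")
```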

Accuracy

| Model | Dataset | Card | WER% |
| --- | --- | --- | --- |
| Fun-ASR-Nano-2512 | data/val_wav.scp | Ascend910B4 | 2.94 |
| Fun-ASR-Nano-2512 | data/val_wav.scp | Kunpeng920 | 2.94 |

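Public URLs

| Type | Description | Public URL |
| --- | --- | --- |
| Open-source repository | Fun-ASR source code | https://github.com/FunAudioLLM/Fun-ASR |
| Audio file | Audio file used for testing | https://modelscope.cn/datasets/FunAudioLLM/funasr-demo/resolve/master/audios/BAC009S0764W0128.wav |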