
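Index-TTS-vLLM-v1 NPU Inference Adaptation Guide

Overview
Inference Environment Setup
Quick Start
Adaptation Details
Performance

Overview

This project adapts index-tts-vllm (v1 model version) to Ascend NPUs, so that GPT model inference can be accelerated with vllm-ascend in an Ascend NPU environment and TTS synthesis is served through FastAPI.

Version information: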

url=https://github.com/Ksuriuri/index-tts-vllm
commit_id=3f1c7e9
model_name=Index-TTS-v1.0

Inference Environment Setup

This model requires the following components and drivers.

Table 1 Version compatibility

Component	Version
Firmware and driver	25.5.1+
CANN	8.5.1
Python	3.11.14
PyTorch / torch_npu	2.9.0
vllm	0.18.0
vllm-ascend	0.18.0rc1
torchaudio	2.9.0
soundfile	0.13.1

Note: for Atlas 800I A2 inference cards, select the actual firmware and driver version according to the CANN version.
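
Before continuing, it can help to compare the installed Python packages with Table 1. The following is a minimal sketch, not part of the adaptation itself. It assumes the pip distribution names `torch`, `torch-npu`, `torchaudio`, `vllm`, `vllm-ascend` and `soundfile`, and it only covers Python packages (firmware, driver and CANN versions must be checked separately). The helper names `installed_version` and `check_versions` are hypothetical.

```python
from importlib import metadata

# Expected versions copied from Table 1.
# The distribution names are assumptions about how the packages are published on pip.
EXPECTED = {
    "torch": "2.9.0",
    "torch-npu": "2.9.0",
    "torchaudio": "2.9.0",
    "vllm": "0.18.0",
    "vllm-ascend": "0.18.0rc1",
    "soundfile": "0.13.1",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package that deviates."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        # Ignore local build tags such as "+cpu" when comparing.
        if found is None or found.split("+")[0] != want:
            mismatches[name] = (want, found)
    return mismatches
```

Calling `check_versions(EXPECTED)` returns an empty dict when everything matches. A reported mismatch is only a hint to review the environment, since a newer version may also work.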

Quick Start

Get the source code

Get the index-tts-vllm source code

git clone https://github.com/Ksuriuri/index-tts-vllm.git
cd index-tts-vllm

Get the adaptation patches from this repository

git clone https://gitcode.com/Ascend/ModelZoo-PyTorch.git
cp ModelZoo-PyTorch/ACL_PyTorch/built-in/audio/Index-TTS-vLLM-v1/diff_index_tts_vllm.patch .
cp ModelZoo-PyTorch/ACL_PyTorch/built-in/audio/Index-TTS-vLLM-v1/diff_torchaudio_kaldi.patch .

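Apply the adaptation patches

1. Apply the patch to the index-tts-vllm main repository

# Run from the index-tts-vllm root, where the previous step copied the patches
git apply diff_index_tts_vllm.patch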

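2. Apply the torchaudio patch (if needed)

On NPU, .abs() cannot be called directly on the complex tensor returned by torch.fft.rfft(). The complex magnitude has to be computed manually from the real and imaginary parts.

# Locate the torchaudio installation path
TORCHAUDIO_PATH=$(python3 -c "import torchaudio; import os; print(os.path.dirname(torchaudio.__file__))")
# Apply the patch
cd ${TORCHAUDIO_PATH}
patch -p1 < /path/to/diff_torchaudio_kaldi.patch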

If the patch command is not available, you can instead edit line 616 of ${TORCHAUDIO_PATH}/compliance/kaldi.py by hand:

# Original code:
spectrum = torch.fft.rfft(strided_input).abs()

# Change to:
#spectrum = torch.fft.rfft(strided_input).abs()
spectrum = torch.fft.rfft(strided_input)
real_view = torch.view_as_real(spectrum)
spectrum = torch.sqrt(real_view[...,0].pow(2) + real_view[...,1].pow(2))
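
To tell whether the installed torchaudio still needs this change, the file can be scanned for the unpatched expression. This is a minimal sketch under stated assumptions: the helper names `find_kaldi_file` and `needs_patch` are hypothetical, and the check is a plain text match on the expression quoted above, so it will miss the line if a different torchaudio version words it differently.

```python
import importlib.util
from pathlib import Path

# The unpatched expression, as quoted in this guide.
UNPATCHED = "torch.fft.rfft(strided_input).abs()"


def find_kaldi_file():
    """Locate torchaudio/compliance/kaldi.py without importing torchaudio."""
    spec = importlib.util.find_spec("torchaudio")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent / "compliance" / "kaldi.py"


def needs_patch(path):
    """Return True if an uncommented line still calls .abs() on the rfft result."""
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        code = line.split("#", 1)[0]  # drop full-line and trailing comments
        if UNPATCHED in code:
            return True
    return False
```

A typical use is `path = find_kaldi_file()` followed by `needs_patch(path)` when `path` is not `None`. A `False` result after patching means only the commented-out copy of the original line remains.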

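3. Uninstall torchcodec (if installed)

torchaudio 2.9 loads audio through the torchcodec backend by default, and torchcodec depends on a CUDA library (libnvrtc.so), so it cannot run in an NPU environment. This project already replaces it with soundfile, but uninstalling torchcodec is still recommended to avoid conflicts: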

pip uninstall torchcodec -y

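4. Install dependencies

The adaptation changes in this project require no additional Python dependencies; the NPU environment is expected to have vllm-ascend and its dependencies pre-installed. If soundfile still needs to be installed:

# Quote the requirement so the shell does not treat ">" as an output redirection
pip install "soundfile>=0.13.1"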

Download the model weights

From ModelScope (recommended for users in mainland China):

modelscope download --model kusuriuri/Index-TTS-vLLM --local_dir ./checkpoints/Index-TTS-vLLM

Start the API service

python3 api_server.py \
    --model_dir ./checkpoints/Index-TTS-vLLM \
    --host 0.0.0.0 \
    --port 6006 \
    --gpu_memory_utilization 0.25

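Startup parameters

Parameter	Description	Default
`--model_dir`	Path to the model weights	`checkpoints/Index-TTS-vLLM`
`--host`	Address the service listens on	`0.0.0.0`
`--port`	Port the service listens on	`6006`
`--gpu_memory_utilization`	Fraction of device memory used by vllm	`0.25`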

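Call the inference API

Once the service has started successfully, run the following script to verify inference (make sure the port in api_example.py matches the port the service listens on):

python api_example.py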
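
Model loading can take a while, so it may be convenient to wait until the service port accepts connections before running the example script. The sketch below is an assumption-laden helper, not part of the project: `wait_for_port` is a hypothetical name, and a successful TCP connection only shows that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the startup command shown earlier, `wait_for_port("127.0.0.1", 6006)` would return `True` once the server socket is open.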

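Adaptation Details

This project makes the following NPU adaptation changes to the original index-tts-vllm (v1 model version):

1. indextts/infer_vllm.py

① Replace torchaudio.load with a soundfile backend

torchaudio 2.9 loads audio through torchcodec by default, and torchcodec depends on CUDA libraries, so it cannot run on NPU. The loader is replaced with a soundfile implementation through a monkey-patch: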

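import soundfile as sf
import torch
import torchaudio

def _load_audio_soundfile(uri, frame_offset=0, num_frames=-1, normalize=True, channels_first=True, **kwargs):
    # normalize, channels_first and extra kwargs are accepted for signature compatibility but are not used
    wav_np, sr = sf.read(uri, dtype='float32')  # shape: (frames,) or (frames, channels)
    wav = torch.from_numpy(wav_np)
    if wav.dim() == 1:
        wav = wav.unsqueeze(0)  # mono: (frames,) -> (1, frames)
    else:
        wav = wav.T  # multi-channel: (frames, channels) -> (channels, frames)
    if frame_offset > 0:
        wav = wav[:, frame_offset:]
    if num_frames > 0:
        wav = wav[:, :num_frames]
    return wav, sr

# Monkey-patch: later torchaudio.load calls go through soundfile
torchaudio.load = _load_audio_soundfile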

② Add an NPU branch to device detection

An NPU check is added ahead of the CUDA check, and BigVGAN's custom CUDA kernel is disabled on NPU:

elif hasattr(torch, "npu") and torch.npu.is_available():
    self.device = "npu:0"
    self.is_fp16 = is_fp16
    self.use_cuda_kernel = False

③ Make torch.cuda.empty_cache() NPU-compatible

if hasattr(torch, "npu") and torch.npu.is_available():
    torch.npu.empty_cache()
elif torch.cuda.is_available():
    torch.cuda.empty_cache()
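
The two snippets above follow the same pattern: probe for `torch.npu` first, then fall back to CUDA. The sketch below restates that pattern as two standalone helpers so the priority order is explicit. It is an illustration only: `pick_device` and `empty_device_cache` are hypothetical names, the torch module is passed in as an argument so the logic can be exercised without an accelerator, and the CUDA and CPU return values are assumptions rather than the project's actual defaults.

```python
def pick_device(torch_mod):
    """Return (device, cuda_kernel_allowed) with NPU checked before CUDA."""
    if hasattr(torch_mod, "npu") and torch_mod.npu.is_available():
        # Custom CUDA kernels cannot run on NPU, so they are switched off here.
        return "npu:0", False
    if torch_mod.cuda.is_available():
        return "cuda:0", True
    return "cpu", False


def empty_device_cache(torch_mod):
    """Free cached memory on the active backend; return its name or None."""
    if hasattr(torch_mod, "npu") and torch_mod.npu.is_available():
        torch_mod.npu.empty_cache()
        return "npu"
    if torch_mod.cuda.is_available():
        torch_mod.cuda.empty_cache()
        return "cuda"
    return None
```

In a real process both helpers would be called with the imported `torch` module. The `hasattr` guard matters because `torch.npu` only exists after torch_npu has registered itself.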

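2. patch_vllm.py

NPUModelRunner is patched in wrapper style: the wrapper first calls the original method, then applies the position_ids offset logic specific to GPT2TTSModel, and finally syncs the updated positions to the device (the method is still named copy_to_gpu). Because the original method is called rather than copied, the patch is less sensitive to changes in vllm-ascend internals, so a vllm upgrade is less likely to break it.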

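import numpy as np
from vllm_ascend.worker.model_runner_v1 import NPUModelRunner
# GPT2TTSModel (the TTS model class) must also be imported; its import is omitted in this excerpt

_original_prepare_inputs = NPUModelRunner._prepare_inputs

def _prepare_inputs(self, scheduler_output, num_scheduled_tokens):
    result = _original_prepare_inputs(self, scheduler_output, num_scheduled_tokens)

    model = self.get_model()
    if isinstance(model, GPT2TTSModel):
        # Apply the position_ids offset: shift each request's positions by -(prompt length - 1)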
        total_num_scheduled_tokens = scheduler_output.total_num_scheduled_tokens
        num_reqs = self.input_batch.num_reqs
        req_indices = np.repeat(self.arange_np[:num_reqs], num_scheduled_tokens)

        prompt_tokens_offset = []
        for req_id in self.input_batch.req_ids:
            prompt_tokens_offset.append(-(len(self.requests[req_id].prompt_token_ids) - 1))
        positions_np = self.positions.np[:total_num_scheduled_tokens]
        np.add(np.array(prompt_tokens_offset)[req_indices],
                positions_np,
                out=positions_np)
        self.positions.copy_to_gpu(total_num_scheduled_tokens)

    return result

NPUModelRunner._prepare_inputs = _prepare_inputs
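
The wrapper pattern and the offset arithmetic can be tried out without vllm-ascend. The sketch below uses a hypothetical `ToyRunner` class with plain Python lists in place of NumPy arrays. It assumes, for illustration only, that the original method assigns each scheduled token the position `num_computed + i` within its request. The wrapper then shifts every position by `-(prompt_len - 1)` for its request, which is the same arithmetic as the `np.repeat` and `np.add` calls above.

```python
class ToyRunner:
    """Hypothetical stand-in for a model runner; not the vllm-ascend class."""

    def __init__(self, prompt_lens, num_computed):
        self.prompt_lens = prompt_lens    # prompt length per request
        self.num_computed = num_computed  # tokens already processed per request
        self.positions = []

    def _prepare_inputs(self, num_scheduled_tokens):
        # Assumed original behaviour: positions continue from num_computed.
        self.positions = [
            start + i
            for start, n in zip(self.num_computed, num_scheduled_tokens)
            for i in range(n)
        ]
        return "original-result"


_original_prepare_inputs = ToyRunner._prepare_inputs


def _prepare_inputs(self, num_scheduled_tokens):
    # Step 1: run the unmodified method and keep its return value.
    result = _original_prepare_inputs(self, num_scheduled_tokens)
    # Step 2: list equivalent of np.repeat(arange(num_reqs), num_scheduled_tokens).
    req_indices = [r for r, n in enumerate(num_scheduled_tokens) for _ in range(n)]
    # Step 3: per-request offset of -(prompt length - 1), added token by token.
    offsets = [-(plen - 1) for plen in self.prompt_lens]
    self.positions = [pos + offsets[r] for pos, r in zip(self.positions, req_indices)]
    return result


# Install the wrapper in place of the original method.
ToyRunner._prepare_inputs = _prepare_inputs
```

For two requests with prompt lengths 5 and 3 that each decode one token after the prompt, `ToyRunner([5, 3], [5, 3])._prepare_inputs([1, 1])` leaves `positions` at `[1, 1]`, while the return value of the original method is passed through unchanged.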

3. api_server.py

The default model_dir path is changed to checkpoints/Index-TTS-vLLM.

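4. torchaudio/compliance/kaldi.py

Cause: on NPU, .abs() cannot be called directly on the complex tensor returned by torch.fft.rfft().

Change: .abs() is replaced by an explicit complex magnitude computation:

# Original:
spectrum = torch.fft.rfft(strided_input).abs()

# Changed to: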
spectrum = torch.fft.rfft(strided_input)
real_view = torch.view_as_real(spectrum)
spectrum = torch.sqrt(real_view[...,0].pow(2) + real_view[...,1].pow(2))
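
The rewrite relies on the identity |z| = sqrt(re² + im²). The sketch below checks that identity in pure Python on a small frame, using a naive one-sided DFT as a stand-in for `torch.fft.rfft`. The function names and the sample values are made up for illustration, and no torch or NPU is involved.

```python
import cmath
import math


def rfft(samples):
    """Naive one-sided DFT: bins 0..N//2 of a real-valued sequence."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n // 2 + 1)
    ]


def magnitude(z):
    """Modulus from real and imaginary parts, mirroring the patched line."""
    return math.sqrt(z.real ** 2 + z.imag ** 2)


frame = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 0.0]  # made-up samples
spectrum = rfft(frame)
manual = [magnitude(z) for z in spectrum]   # split into real/imag parts
builtin = [abs(z) for z in spectrum]        # what .abs() would compute
```

Both lists agree to floating-point precision, which is why the patched line is expected to preserve accuracy. For values of very large or very small magnitude, squaring can overflow or underflow where a dedicated `abs` would not, which is unlikely to matter for typical audio spectra.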

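Performance

RTF (Real-Time Factor) = inference time / duration of the generated audio; it measures synthesis speed. RTF < 1 means synthesis is faster than real time, and a smaller value means better performance.

Hardware	Generated audio duration	Inference time	RTF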
800I A2	2.90s	1.62s	0.56
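
The RTF in the table can be reproduced from the two measured durations. A minimal sketch, with `real_time_factor` as a hypothetical helper name:

```python
def real_time_factor(inference_seconds, audio_seconds):
    """RTF = inference time / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return inference_seconds / audio_seconds


# Values from the 800I A2 row above: 1.62 s of inference for 2.90 s of audio.
rtf = real_time_factor(1.62, 2.90)
print(f"RTF = {rtf:.2f}")
```

The result rounds to 0.56, matching the table, and is below 1, so synthesis in this measurement ran faster than real time.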