
Index-TTS-vLLM-v1 NPU Inference Adaptation Guide


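Overview

  This project adapts index-tts-vllm (v1 model version) to Ascend NPUs. GPT model inference is accelerated with vllm-ascend in an Ascend NPU environment, and the TTS synthesis service is exposed through FastAPI.

  • Version information: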
    url=https://github.com/Ksuriuri/index-tts-vllm
    commit_id=3f1c7e9
    model_name=Index-TTS-v1.0
    

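Inference Environment Setup

  • The model requires the following components and drivers:

    Table 1 Version compatibility

    Component                Version
    Firmware and driver      25.5.1+
    CANN                     8.5.1
    Python                   3.11.14
    PyTorch / torch_npu      2.9.0
    vllm                     0.18.0
    vllm-ascend              0.18.0rc1
    torchaudio               2.9.0
    soundfile                0.13.1

    Note: For Atlas 800I A2 inference cards, select the firmware and driver version that matches the CANN version.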
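
A quick way to compare an environment against Table 1 is to query the installed package metadata. The following is a minimal sketch, not part of this repository: the pip distribution names (for example `torch-npu`) are assumptions, and the comparison is a plain string match that ignores only a local build suffix such as `+cpu`.

```python
from importlib import metadata

# Expected versions copied from Table 1. The pip distribution names on the
# left are assumptions; adjust them if your environment uses other names.
EXPECTED = {
    "torch": "2.9.0",
    "torch-npu": "2.9.0",
    "vllm": "0.18.0",
    "vllm-ascend": "0.18.0rc1",
    "torchaudio": "2.9.0",
    "soundfile": "0.13.1",
}


def installed_versions(names):
    """Return {name: version or None} for each distribution name."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """List (name, expected, installed) for packages that are missing or differ.

    A local build suffix such as "+cpu" is ignored before comparing.
    """
    mismatches = []
    for name, want in expected.items():
        have = installed.get(name)
        base = have.split("+")[0] if have else None
        if base != want:
            mismatches.append((name, want, have))
    return mismatches


if __name__ == "__main__":
    for name, want, have in find_mismatches(EXPECTED, installed_versions(EXPECTED)):
        print(f"{name}: expected {want}, found {have or 'not installed'}")
```

An empty output means every listed package matches the table. Firmware, driver, and CANN versions are not Python packages and have to be checked separately.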

Quick Start

Get the Source Code

  1. Get the index-tts-vllm source code

    git clone https://github.com/Ksuriuri/index-tts-vllm.git
    
  2. Get the adaptation patches from this repository

    git clone https://gitcode.com/Ascend/ModelZoo-PyTorch.git
    cp ModelZoo-PyTorch/ACL_PyTorch/built-in/audio/Index-TTS-vLLM-v1/diff_index_tts_vllm.patch .
    cp ModelZoo-PyTorch/ACL_PyTorch/built-in/audio/Index-TTS-vLLM-v1/diff_torchaudio_kaldi.patch .
    

Apply the Adaptation Patches

1. Apply the patch for the main index-tts-vllm repository

cd index-tts-vllm
git apply ../diff_index_tts_vllm.patch

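2. Apply the torchaudio patch (if needed)

The NPU does not support calling .abs() directly on the complex tensor returned by torch.fft.rfft(), so the complex magnitude has to be computed manually from the real and imaginary parts.

# Locate the torchaudio installation path
TORCHAUDIO_PATH=$(python3 -c "import torchaudio; import os; print(os.path.dirname(torchaudio.__file__))")
# Apply the patch
cd ${TORCHAUDIO_PATH}
patch -p1 < /path/to/diff_torchaudio_kaldi.patch

If the patch command is unavailable, you can instead edit line 616 of ${TORCHAUDIO_PATH}/compliance/kaldi.py manually: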

# Original code:
spectrum = torch.fft.rfft(strided_input).abs()

# Change it to:
#spectrum = torch.fft.rfft(strided_input).abs()
spectrum = torch.fft.rfft(strided_input)
real_view = torch.view_as_real(spectrum)
spectrum = torch.sqrt(real_view[...,0].pow(2) + real_view[...,1].pow(2))

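3. Uninstall torchcodec (if installed)

torchaudio 2.9 loads audio through the torchcodec backend by default. torchcodec depends on CUDA libraries (libnvrtc.so) and therefore cannot run in an NPU environment. This project already replaces it with soundfile, but uninstalling torchcodec is still recommended to avoid conflicts: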

pip uninstall torchcodec -y

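4. Install dependencies

The adaptation changes require no additional Python dependencies, provided the NPU environment already has vllm-ascend and its dependencies installed. If soundfile is not yet installed, install it with:

pip install "soundfile>=0.13.1"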

Download the Model Weights

From ModelScope (recommended for users in mainland China):

modelscope download --model kusuriuri/Index-TTS-vLLM --local_dir ./checkpoints/Index-TTS-vLLM

Start the API Service

python3 api_server.py \
    --model_dir ./checkpoints/Index-TTS-vLLM \
    --host 0.0.0.0 \
    --port 6006 \
    --gpu_memory_utilization 0.25

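Startup Parameters

Parameter                   Description                              Default
--model_dir                 Path to the model weights                checkpoints/Index-TTS-vLLM
--host                      Address the service listens on           0.0.0.0
--port                      Port the service listens on              6006
--gpu_memory_utilization    Fraction of device memory used by vLLM   0.25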
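
To make the defaults in the table concrete, the sketch below declares the same four flags with argparse. It is a hypothetical illustration only: the real api_server.py may declare its arguments differently, and `build_parser` is a name introduced here.

```python
import argparse


def build_parser():
    """Declare the four flags from the table with the documented defaults."""
    parser = argparse.ArgumentParser(description="Hypothetical api_server.py CLI sketch")
    parser.add_argument("--model_dir", type=str, default="checkpoints/Index-TTS-vLLM")
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int, default=6006)
    parser.add_argument("--gpu_memory_utilization", type=float, default=0.25)
    return parser


# No flags: every value falls back to its default.
defaults = build_parser().parse_args([])

# Override only the port, e.g. when 6006 is already taken.
custom = build_parser().parse_args(["--port", "8000"])

print(defaults.port, custom.port)
```

Any flag that is not passed keeps the default from the table, so a bare `python3 api_server.py` would be expected to behave like the full command shown above, except for the model path prefix.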

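Call the Inference API

After the service starts successfully, run the following script to verify inference (make sure the port in api_example.py matches the port the service listens on):

python api_example.py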
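
Before running the example script, it can help to confirm that the service is actually listening on the expected port. The helper below is a minimal sketch using only the standard library; `is_port_open` is a name introduced here, and 127.0.0.1:6006 assumes the service runs on the local machine with the default port.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1 and 6006 assume the service runs locally with the default port.
    print("service reachable:", is_port_open("127.0.0.1", 6006))
```

A result of False usually means the service is still loading the model, failed to start, or is bound to a different port than the one configured in the client script.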

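Adaptation Changes

This project makes the following NPU adaptation changes to the original index-tts-vllm (v1 model version):

1. indextts/infer_vllm.py

① Replace torchaudio.load with a soundfile backend

torchaudio 2.9 loads audio with torchcodec by default, and torchcodec depends on CUDA libraries, so it cannot run on the NPU. torchaudio.load is therefore monkey-patched with a soundfile-based implementation: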

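import soundfile as sf
import torch
import torchaudio

def _load_audio_soundfile(uri, frame_offset=0, num_frames=-1, normalize=True, channels_first=True, **kwargs):
    # Drop-in replacement for torchaudio.load. normalize, channels_first and
    # any extra kwargs are accepted for signature compatibility but ignored.
    wav_np, sr = sf.read(uri, dtype='float32')
    wav = torch.from_numpy(wav_np)
    if wav.dim() == 1:
        wav = wav.unsqueeze(0)  # mono: (frames,) -> (1, frames)
    else:
        wav = wav.T  # multi-channel: (frames, channels) -> (channels, frames)
    if frame_offset > 0:
        wav = wav[:, frame_offset:]
    if num_frames > 0:
        wav = wav[:, :num_frames]
    return wav, sr

torchaudio.load = _load_audio_soundfile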
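
The layout handling in the replacement loader is easier to follow on small arrays. soundfile returns a 1-D array for mono audio and a (frames, channels) array otherwise, whereas torchaudio.load returns (channels, frames). The sketch below mirrors that logic with NumPy instead of torch so it runs without an audio file; `to_channels_first` is a hypothetical helper name, not a function in the patch.

```python
import numpy as np


def to_channels_first(wav_np, frame_offset=0, num_frames=-1):
    """Mirror the layout handling of the loader with NumPy arrays."""
    if wav_np.ndim == 1:
        wav = wav_np[np.newaxis, :]   # mono: (frames,) -> (1, frames)
    else:
        wav = wav_np.T                # (frames, channels) -> (channels, frames)
    if frame_offset > 0:
        wav = wav[:, frame_offset:]   # skip leading frames
    if num_frames > 0:
        wav = wav[:, :num_frames]     # keep at most num_frames frames
    return wav


mono = np.arange(10, dtype=np.float32)                   # 10 frames, 1 channel
stereo = np.arange(20, dtype=np.float32).reshape(10, 2)  # 10 frames, 2 channels

print(to_channels_first(mono).shape)                                  # (1, 10)
print(to_channels_first(stereo).shape)                                # (2, 10)
print(to_channels_first(stereo, frame_offset=2, num_frames=3).shape)  # (2, 3)
```

Note that this approach reads the whole file and slices afterwards, which keeps the patch simple. For very long files, passing `start` and `frames` to `soundfile.read` would avoid decoding audio that is later discarded.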

② Add an NPU branch to device detection

An NPU check is added before the CUDA check, and BigVGAN's custom CUDA kernel is disabled on the NPU:

elif hasattr(torch, "npu") and torch.npu.is_available():
    self.device = "npu:0"
    self.is_fp16 = is_fp16
    self.use_cuda_kernel = False

③ Make torch.cuda.empty_cache() NPU-compatible

if hasattr(torch, "npu") and torch.npu.is_available():
    torch.npu.empty_cache()
elif torch.cuda.is_available():
    torch.cuda.empty_cache()
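
The precedence used in ② and ③ (NPU first, then CUDA) can be summarized as a small pure function. This is a sketch under stated assumptions: the availability flags are passed in so that it runs without torch, and the "cuda:0" and "cpu" return values are illustrative guesses about the surrounding code rather than lines taken from the patch.

```python
def select_device(npu_available, cuda_available):
    """Return (device, use_cuda_kernel) with the NPU checked before CUDA.

    In real code the flags would come from torch.npu.is_available() and
    torch.cuda.is_available(). The CUDA and CPU branches are assumptions.
    """
    if npu_available:
        return "npu:0", False   # custom CUDA kernel is disabled on the NPU
    if cuda_available:
        return "cuda:0", True
    return "cpu", False


print(select_device(npu_available=True, cuda_available=False))
print(select_device(npu_available=False, cuda_available=True))
```

Checking the NPU first matters because the CUDA kernel flag must be forced off before any CUDA-specific path is considered.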

2. patch_vllm.py

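NPUModelRunner._prepare_inputs is patched with a wrapper. The wrapper first calls the original method, then applies the position_ids offset logic specific to GPT2TTSModel, and finally synchronizes the updated positions to the device through copy_to_gpu. Because the original method is called rather than reimplemented, the patch is less sensitive to internal changes in vllm-ascend and is less likely to break when vLLM is upgraded.

import numpy as np
from vllm_ascend.worker.model_runner_v1 import NPUModelRunner
# GPT2TTSModel is assumed to be imported or defined elsewhere in patch_vllm.py.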

_original_prepare_inputs = NPUModelRunner._prepare_inputs

def _prepare_inputs(self, scheduler_output, num_scheduled_tokens):
    result = _original_prepare_inputs(self, scheduler_output, num_scheduled_tokens)

    model = self.get_model()
    if isinstance(model, GPT2TTSModel):
        # Append the position_ids offset: shift each request's positions by -(prompt length - 1)
        total_num_scheduled_tokens = scheduler_output.total_num_scheduled_tokens
        num_reqs = self.input_batch.num_reqs
        req_indices = np.repeat(self.arange_np[:num_reqs], num_scheduled_tokens)

        prompt_tokens_offset = []
        for req_id in self.input_batch.req_ids:
            prompt_tokens_offset.append(-(len(self.requests[req_id].prompt_token_ids) - 1))
        positions_np = self.positions.np[:total_num_scheduled_tokens]
        np.add(np.array(prompt_tokens_offset)[req_indices],
                positions_np,
                out=positions_np)
        self.positions.copy_to_gpu(total_num_scheduled_tokens)

    return result

NPUModelRunner._prepare_inputs = _prepare_inputs
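
The effect of the wrapper is easier to see on a toy example. The sketch below applies the same wrap-then-offset pattern to a hypothetical `ToyRunner` class (not the vllm-ascend class) whose original method writes positions 0..n-1 for each request. The offset step reuses the `np.repeat` and in-place `np.add` calls from the patch.

```python
import numpy as np


class ToyRunner:
    """Hypothetical stand-in for a model runner; not the vllm-ascend class."""

    def __init__(self, prompt_lens):
        self.prompt_lens = np.asarray(prompt_lens)
        self.positions = np.zeros(32, dtype=np.int64)  # shared position buffer

    def prepare_inputs(self, num_scheduled_tokens):
        """Original method: write positions 0..n-1 for each request."""
        total = int(np.sum(num_scheduled_tokens))
        self.positions[:total] = np.concatenate(
            [np.arange(n) for n in num_scheduled_tokens])
        return total


_original_prepare_inputs = ToyRunner.prepare_inputs


def prepare_inputs_with_offset(self, num_scheduled_tokens):
    # 1) Run the untouched original first and keep its return value.
    total = _original_prepare_inputs(self, num_scheduled_tokens)
    # 2) Map every scheduled token to the index of its request.
    req_indices = np.repeat(np.arange(len(num_scheduled_tokens)), num_scheduled_tokens)
    # 3) Shift each token by -(prompt_len - 1) of its request, in place.
    offsets = -(self.prompt_lens - 1)
    positions = self.positions[:total]  # a view, so the buffer is updated
    np.add(offsets[req_indices], positions, out=positions)
    return total


ToyRunner.prepare_inputs = prepare_inputs_with_offset

runner = ToyRunner(prompt_lens=[3, 5])
total = runner.prepare_inputs(np.array([3, 5]))
print(runner.positions[:total].tolist())
# [-2, -1, 0, -4, -3, -2, -1, 0]
```

With prompt lengths 3 and 5, the last prompt token of each request lands on position 0, so generated tokens would then be numbered from 1 regardless of how long the prompt was.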

3. api_server.py

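The default model_dir path is changed to checkpoints/Index-TTS-vLLM.

4. torchaudio/compliance/kaldi.py

Cause: the NPU does not support calling .abs() directly on the complex tensor returned by torch.fft.rfft().

Change: .abs() is replaced with a manual complex-magnitude computation:

# Original:
spectrum = torch.fft.rfft(strided_input).abs()

# Changed to: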
spectrum = torch.fft.rfft(strided_input)
real_view = torch.view_as_real(spectrum)
spectrum = torch.sqrt(real_view[...,0].pow(2) + real_view[...,1].pow(2))
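
The manual computation relies on the identity |z| = sqrt(re² + im²). The sketch below checks it numerically on random frames, using NumPy as a stand-in for torch so that it runs on any CPU without torch_npu. The frame count and frame length are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 512))   # 4 frames of 512 samples each

spectrum = np.fft.rfft(frames)           # complex array, shape (4, 257)

# Reference result: magnitude taken directly on the complex values.
direct = np.abs(spectrum)

# Manual result: expose (real, imag) as a trailing axis of size 2, as
# torch.view_as_real does, then apply sqrt(re^2 + im^2).
real_view = np.stack([spectrum.real, spectrum.imag], axis=-1)
manual = np.sqrt(real_view[..., 0] ** 2 + real_view[..., 1] ** 2)

print(manual.shape, np.allclose(direct, manual))
# (4, 257) True
```

The two results agree to floating-point tolerance. The squared form can overflow or underflow for extremely large or small values where a dedicated magnitude routine would not, which is unlikely to matter for normalized audio.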

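Performance

RTF (Real-Time Factor) = inference time / generated audio duration. It measures synthesis speed: RTF < 1 means synthesis is faster than real time, and a smaller value means better performance.

Hardware    Generated audio duration    Inference time    RTF
800I A2     2.90s                       1.62s             0.56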
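
As a worked check of the table, the RTF can be recomputed from the two measured durations. `real_time_factor` is a helper name introduced here for illustration.

```python
def real_time_factor(inference_seconds, audio_seconds):
    """RTF = inference time / generated audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


rtf = real_time_factor(1.62, 2.90)
print(f"RTF = {rtf:.2f}")  # RTF = 0.56
```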