
IndexTTS_vLLM-v1.5 Inference Guide


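Overview

IndexTTS (short for Index Text-To-Speech) is a series of industrial-grade, zero-shot, highly controllable text-to-speech (TTS) models developed and open-sourced by the Bilibili team. It focuses on Chinese-language scenarios and targets three problems that traditional TTS handles poorly: voice cloning, emotion control, and precise duration matching.

  • Version information: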
    url=https://github.com/Ksuriuri/index-tts-vllm
    branch=IndexTTS-vLLM-1.0
    model_name=index-tts-vllm-v1.5
    

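Inference Environment Preparation

  • This model requires the following plugins and drivers
    Table 1 Version compatibility

    | Component | Version | Environment setup guide |
    | --- | --- | --- |
    | Firmware and driver | 25.3.rc1 | PyTorch framework inference environment setup |
    | CANN | 8.3.RC2 | - |
    | Python | 3.11 | - |
    | PyTorch | 2.7.1 | - |
    | Ascend Extension for PyTorch | 2.7.1 | - |
    | vLLM | 0.11.0 | vLLM-Ascend installation guide |

    Note: For Atlas 800I A2/Atlas 800T A2 cards, choose the firmware and driver version that matches your CANN version.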
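
The Python-side versions in Table 1 can be checked before going further. The sketch below is only an illustration: the pip distribution names (`torch`, `torch-npu`, `vllm`) are assumptions about how the packages are published, and `installed_versions` and `find_mismatches` are hypothetical helper names, not part of any of these projects. Firmware, driver, and CANN versions are not visible to pip and have to be checked separately.

```python
from importlib import metadata

# Expected versions from Table 1. The distribution names are assumptions;
# adjust them if `pip list` shows different names in your environment.
EXPECTED = {
    "torch": "2.7.1",
    "torch-npu": "2.7.1",
    "vllm": "0.11.0",
}


def installed_versions(names):
    """Return {name: version or None} for each distribution name."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """Return {name: (expected, installed)} where the installed version differs.

    A local build suffix such as "+cpu" is ignored, and a longer version
    such as "2.7.1.post1" still counts as 2.7.1.
    """
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        base = have.split("+")[0] if have else None
        if base is None or not (base == want or base.startswith(want + ".")):
            mismatches[name] = (want, have)
    return mismatches


problems = find_mismatches(EXPECTED, installed_versions(EXPECTED))
for name, (want, have) in problems.items():
    print(f"{name}: expected {want}, found {have or 'not installed'}")
if not problems:
    print("All checked packages match Table 1.")
```

A reported mismatch is a hint to look closer rather than proof of a broken setup, since pre-release or vendor builds may carry version strings this simple comparison does not recognize.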

Quick Start

Get the Source Code

  1. Clone this repository

    git clone https://gitcode.com/ascend/ModelZoo-PyTorch.git
    cd ModelZoo-PyTorch/ACL_PyTorch/built-in/audio/IndexTTS-vLLM/
    

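Install Dependencies

  1. Set up the vLLM environment

    # Reuse the vLLM base image, or set up a vLLM environment by following the vllm-ascend community documentation:
    docker pull quay.io/ascend/vllm-ascend:v0.11.0
    
  2. Get the model repository source code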

    git clone https://github.com/Ksuriuri/index-tts-vllm
    cd index-tts-vllm
    git checkout 48a06a5df5a8e19adc50afc1179fb788ef05ad6a
    git apply ../diff.patch
    pip3 install -r requirements.txt
    
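    # Install pynini: the upstream repository depends on the WeTextProcessing library, which requires OpenFst; on ARM, OpenFst has to be built and installed manually
    # Download openfst-1.8.3.tar.gz (or browse the download page: https://www.openfst.org/twiki/bin/view/FST/FstDownload)
    wget https://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.8.3.tar.gz
    tar -xzvf openfst-1.8.3.tar.gz && cd openfst-1.8.3
    
    # Build and install
    ./configure --enable-far --enable-mpdt --enable-pdt && make -j64 && make install
    # Check that the library was installed
    ls -l /usr/local/lib/libfstmpdtscript.so.26
    # Add the install path to the library search path
    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH && ldconfig
    # Install WeTextProcessing with pip
    pip3 install WeTextProcessing
    
    # Audio preprocessing (resampling) now runs on the NPU, but the NPU does not yet support the absolute value of complex tensors, so torchaudio has to be patched by hand
    pip show torchaudio
    # Go to the torchaudio install path (under the Location reported above) and open the file
    vim functional/functional.py
    
    # Original code, at around line 146
    if power == 1.0:
        return spec_f.abs()
    # Change it to the following
    if power == 1.0:
        device = spec_f.device
        spec_f = spec_f.cpu()  # compute on the CPU, then move the result back to the NPU
        return spec_f.abs().to(device)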
    
  3. Download the model weights

    mkdir -p ./checkpoints/Index-TTS-1.5-vLLM
    
    # IndexTTS-1.5
    modelscope download --model kusuriuri/Index-TTS-1.5-vLLM --local_dir ./checkpoints/Index-TTS-1.5-vLLM
    
  4. After the download completes, the directory tree looks like this

    |-- DISCLAIMER
    |-- Dockerfile
    |-- INDEX_MODEL_LICENSE
    |-- LICENSE
    |-- README.md
    |-- README_EN.md
    |-- api_server.py
    |-- assets
    |-- checkpoints
    |-- convert_hf_format.py
    |-- convert_hf_format.sh
    |-- docker-compose.yaml
    |-- entrypoint.sh
    |-- indextts
    |-- patch_vllm.py
    |-- requirements.txt
    |-- simple_test.py
    |-- tests
    |-- tools
    `-- webui.py
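
For the torchaudio change in step 2, the line to edit is described as being at around line 146, and the exact number can differ between torchaudio builds. The sketch below shows one way to locate it without opening an editor first. It assumes the file lives at `functional/functional.py` inside the installed `torchaudio` package, as the step describes; `torchaudio_functional_path` and `find_target_lines` are hypothetical helper names introduced here for illustration.

```python
import importlib.util
from pathlib import Path

# The statement that step 2 replaces with the CPU round trip.
TARGET = "return spec_f.abs()"


def torchaudio_functional_path():
    """Return the path of torchaudio/functional/functional.py, or None."""
    spec = importlib.util.find_spec("torchaudio")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    candidate = package_dir / "functional" / "functional.py"
    return candidate if candidate.is_file() else None


def find_target_lines(path, target=TARGET):
    """Return the 1-based numbers of lines whose stripped text equals `target`."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [i for i, line in enumerate(lines, start=1) if line.strip() == target]


path = torchaudio_functional_path()
if path is None:
    print("torchaudio was not found in this environment")
else:
    print(path, find_target_lines(path))
```

An empty list for an installed torchaudio suggests the file has already been patched or that this torchaudio version writes the statement differently, in which case the file needs to be inspected by hand.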
    

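Model Inference

  1. Start the service for the sample test

    export VLLM_USE_V1=1
    export ASCEND_RT_VISIBLE_DEVICES=0
    unset ASCEND_LAUNCH_BLOCKING
    export PYTORCH_NPU_ALLOC_CONF='expandable_segments:True'
    
    python api_server.py --model_dir ./checkpoints/Index-TTS-1.5-vLLM --port 20007
    
    • Parameters:
      • model_dir: path to the model weights
      • port: port of the inference service; defaults to 6006

    The inference script takes a single test case as an example, computes its audio output, and prints the inference result to the screen.

  2. Single-sample inference test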

    python3 api_example.py
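
Because the service in step 1 loads the model before it can answer requests, it can help to confirm that the port is open before running the test script. The sketch below polls the TCP port with the standard library only. `wait_for_port` is a hypothetical helper name, and the host `127.0.0.1` in the usage note is an assumption about where the service runs; 20007 is the port passed to `api_server.py` above.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=1.0):
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if the port is still closed after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("127.0.0.1", 20007)` before `python3 api_example.py` avoids connection errors during startup. An open port only shows that the server is listening; it does not prove that the model has finished loading.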
    

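Model Inference Performance & Accuracy

| Model | Hardware | Dataset | Batch size | Inference latency (s) | Competitor (A100) latency (s) |
| --- | --- | --- | --- | --- | --- |
| index-tts | Atlas 800T A2 (x86) | 10 characters | 1 | 0.36 | 0.35 |
| index-tts | Atlas 800T A2 (arm) | 10 characters | 1 | 0.48 | 0.35 |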
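
To put the two latency columns side by side, the relative overhead against the A100 figure can be computed directly from the numbers in the table. This is plain arithmetic on the reported values; `relative_overhead` is a hypothetical helper name used only for this illustration.

```python
# Latencies in seconds, copied from the table above.
A100_LATENCY = 0.35
ASCEND_LATENCY = {
    "Atlas 800T A2 (x86)": 0.36,
    "Atlas 800T A2 (arm)": 0.48,
}


def relative_overhead(latency, baseline):
    """Return how much slower `latency` is than `baseline`, in percent."""
    return (latency / baseline - 1.0) * 100.0


for hardware, latency in ASCEND_LATENCY.items():
    overhead = relative_overhead(latency, A100_LATENCY)
    print(f"{hardware}: {overhead:.1f}% slower than A100")
```

With the reported figures this gives roughly 2.9% for the x86 host and 37.1% for the ARM host. Both numbers come from a single 10-character input at batch size 1, so they describe that case only.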