Accelerating CosyVoice with NVIDIA Triton Inference Server and TensorRT-LLM

Contributed by Yuekai Zhang (NVIDIA).

This repository provides three acceleration solutions for CosyVoice, each targeting a different model version and Token2Wav architecture. All solutions use TensorRT-LLM for LLM acceleration and NVIDIA Triton Inference Server for serving.

Solutions

CosyVoice3

Acceleration solution for Fun-CosyVoice3-0.5B-2512, the latest CosyVoice model. The pipeline includes audio_tokenizer, speaker_embedding, token2wav, and vocoder modules managed by Triton, with the LLM served via trtllm-serve.

CosyVoice2 + UNet Token2Wav

The baseline acceleration solution for CosyVoice2, using the original UNet-based flow-matching Token2Wav module.

CosyVoice2 + DiT Token2Wav

This solution replaces the UNet Token2Wav with a DiT-based Token2Wav module from Step-Audio2. It supports disaggregated deployment, in which the LLM and Token2Wav run on separate GPUs for better resource utilization under high concurrency.

Quick Start

Each solution can be launched with a single Docker Compose command:

# CosyVoice3
docker compose -f docker-compose.cosyvoice3.yml up

# CosyVoice2 + UNet Token2Wav
docker compose -f docker-compose.cosyvoice2.unet.yml up

# CosyVoice2 + DiT Token2Wav
docker compose -f docker-compose.cosyvoice2.dit.yml up
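
If you switch between solutions often, the three commands above can be wrapped in a small shell helper. The following is a minimal sketch, not part of the repository: the function names `compose_file_for` and `launch` and the short solution names (`cosyvoice3`, `cosyvoice2-unet`, `cosyvoice2-dit`) are hypothetical. Only the Compose file names come from the commands above. It assumes you run it from the directory that contains the Compose files.

```shell
# Hypothetical helper: map a short solution name to its Compose file.
# The short names are made up for this sketch; the file names are the
# ones used in the Quick Start commands.
compose_file_for() {
  case "$1" in
    cosyvoice3)      echo "docker-compose.cosyvoice3.yml" ;;
    cosyvoice2-unet) echo "docker-compose.cosyvoice2.unet.yml" ;;
    cosyvoice2-dit)  echo "docker-compose.cosyvoice2.dit.yml" ;;
    *)
      echo "unknown solution: $1" >&2
      return 1
      ;;
  esac
}

# Hypothetical wrapper: start the chosen stack in the background (-d),
# then follow its logs. Fails early if the solution name is unknown.
launch() {
  local file
  file="$(compose_file_for "$1")" || return 1
  docker compose -f "$file" up -d && docker compose -f "$file" logs -f
}
```

For example, `launch cosyvoice2-dit` would start the DiT stack detached and then stream its logs. Pressing Ctrl-C stops the log stream only; the containers keep running until you run `docker compose -f <file> down`.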