Accelerating CosyVoice with NVIDIA Triton Inference Server and TensorRT-LLM
Contributed by Yuekai Zhang (NVIDIA).
This repository provides three acceleration solutions for CosyVoice, each targeting a different model version and Token2Wav architecture. All solutions use TensorRT-LLM for LLM acceleration and NVIDIA Triton Inference Server for serving.
Solutions
CosyVoice3
Acceleration solution for Fun-CosyVoice3-0.5B-2512, the latest CosyVoice model. The pipeline includes audio_tokenizer, speaker_embedding, token2wav, and vocoder modules managed by Triton, with the LLM served via trtllm-serve.
CosyVoice2 + UNet Token2Wav
The baseline acceleration solution for CosyVoice2, using the original UNet-based flow-matching Token2Wav module.
CosyVoice2 + DiT Token2Wav
Replaces the UNet Token2Wav with a DiT-based Token2Wav module from Step-Audio2. This solution supports disaggregated deployment, in which the LLM and Token2Wav run on separate GPUs for better resource utilization under high concurrency.
Quick Start
Each solution can be launched with a single Docker Compose command:
# CosyVoice3
docker compose -f docker-compose.cosyvoice3.yml up
# CosyVoice2 + UNet Token2Wav
docker compose -f docker-compose.cosyvoice2.unet.yml up
# CosyVoice2 + DiT Token2Wav
docker compose -f docker-compose.cosyvoice2.dit.yml up
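If you switch between solutions often, a small wrapper can save retyping the file names. The following is a minimal sketch, not part of this repository: the function names `compose_file_for` and `launch_command_for` and the short solution names (`cosyvoice3`, `cosyvoice2-unet`, `cosyvoice2-dit`) are hypothetical, while the compose file names are the ones listed above. The helper only prints the command (a dry run) and appends Docker Compose's standard `-d` flag to start the services in the background.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a short solution name to its compose file.
# The short names are invented for this sketch; the file names come
# from the Quick Start commands above.
compose_file_for() {
  case "$1" in
    cosyvoice3)      echo "docker-compose.cosyvoice3.yml" ;;
    cosyvoice2-unet) echo "docker-compose.cosyvoice2.unet.yml" ;;
    cosyvoice2-dit)  echo "docker-compose.cosyvoice2.dit.yml" ;;
    *) echo "unknown solution: $1" >&2; return 1 ;;
  esac
}

# Print the launch command instead of running it, so the sketch is
# safe to try on a machine without Docker or GPUs.
launch_command_for() {
  local file
  file="$(compose_file_for "$1")" || return 1
  echo "docker compose -f ${file} up -d"
}
```

For example, `launch_command_for cosyvoice2-dit` prints `docker compose -f docker-compose.cosyvoice2.dit.yml up -d`; wrapping the call as `eval "$(launch_command_for cosyvoice2-dit)"` would actually run it.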