Quick Start
This page uses Wan2.1 as an example to show how to run text-to-video inference with MindIE SD. For more model-specific inference details, see the MindIE/Wan2.1 page on Modelers.
Prerequisites
Before running inference, complete the environment preparation and install MindIE SD by following the Installation Guide.
Run inference
Install the model-specific dependencies and then run inference.
Clone the Wan2.1 model repository to any location, install its requirements, then copy the inference script from the MindIE SD workspace and run it. Set --model_base to your weight path, for example /home/{user}/Wan2.1-T2V-14B. Parameter details are documented in parameter_config.md.
git clone https://modelers.cn/MindIE/Wan2.1.git && cd Wan2.1
pip install -r requirements.txt
# 8-card inference for Wan2.1-T2V-14B
cp MindIE-SD/examples/wan/infer_t2v.sh ./
bash infer_t2v.sh --model_base="/home/{user}/Wan2.1-T2V-14B"
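A mistyped weight path is easy to catch before the run starts. The following is a minimal sketch, not part of MindIE SD: check_model_base and run_t2v are hypothetical helper names, and the only option passed to infer_t2v.sh is the --model_base shown above.

```shell
# Hypothetical helper: fail early if the weight directory does not exist.
check_model_base() {
  if [ ! -d "$1" ]; then
    echo "weight directory not found: $1" >&2
    return 1
  fi
}

# Hypothetical wrapper around the command shown above.
# Only --model_base is taken from this guide; no other options are assumed.
run_t2v() {
  check_model_base "$1" && bash infer_t2v.sh --model_base="$1"
}

# Usage (replace the path with your own weight directory):
# run_t2v "/home/{user}/Wan2.1-T2V-14B"
```

Define the functions in the shell session where infer_t2v.sh was copied, then call run_t2v with the weight path.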
Acceleration results
The following Wan2.1 example shows the effect of different acceleration features on an Atlas 800I A2 inference server (1*64G), including both single-card and multi-card runs.
Where:
- Cache refers to the AttentionCache feature.
- TP refers to the Tensor Parallel feature.
- FA sparse refers to the RainFusion optimization under FA sparsity.
- CFG refers to the CFG Parallel feature.
- Ulysses refers to the Ulysses Sequence Parallel feature.

The generated video resolution is 832*480 and sample_steps is 50.
Single-card acceleration
Cache acceleration
| Baseline | + Cache ratio 1.6 | + Cache ratio 2.0 | + Cache ratio 2.4 |
|---|---|---|---|
| 860.2s | 631.7s 1.36x | 541.8s 1.59x | 516.9s *1.66x |
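The speedup figures in the Cache table are consistent with dividing the baseline time by the accelerated time. A small sketch of that arithmetic, assuming awk is available (the function name speedup is illustrative only):

```shell
# speedup BASELINE_SECONDS ACCELERATED_SECONDS
# Prints baseline / accelerated, rounded to two decimals.
speedup() {
  awk -v b="$1" -v t="$2" 'BEGIN { printf "%.2f\n", b / t }'
}

# Times taken from the Cache acceleration table above.
speedup 860.2 631.7   # Cache ratio 1.6
speedup 860.2 541.8   # Cache ratio 2.0
speedup 860.2 516.9   # Cache ratio 2.4
```

The three calls reproduce the 1.36x, 1.59x, and 1.66x values listed in the table.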
Parallel strategy results
Two-card single-strategy results
| Model | Cards | Parallel strategy | Output resolution | Operator optimization | Cache optimization | FA sparse | 50-step E2E time (s) | Speedup |
|---|---|---|---|---|---|---|---|---|
| Wan2.1 | 2 | VAE | 832*480 | √ | √ | √ | 548.8 | 1.02x |
| Wan2.1 | 2 | TP | 832*480 | √ | √ | √ | 502.8 | 1.12x |
| Wan2.1 | 2 | CFG | 832*480 | √ | √ | √ | 332.6 | 1.69x |
| Wan2.1 | 2 | Ulysses | 832*480 | √ | √ | √ | 327.6 | *1.71x |
Note: * marks the best acceleration result.
Multi-card combined-strategy results
| Model | Cards | Parallel strategy | Output resolution | Operator optimization | Cache optimization | FA sparse | 50-step E2E time (s) | Speedup |
|---|---|---|---|---|---|---|---|---|
| Wan2.1 | 4 | TP=4, VAE | 832*480 | √ | √ | √ | 204.0 | 2.75x |
| Wan2.1 | 4 | CFG=2, TP=2, VAE | 832*480 | √ | √ | √ | 175.8 | 3.19x |
| Wan2.1 | 4 | Ulysses=4, VAE | 832*480 | √ | √ | √ | 151.1 | 3.71x |
| Wan2.1 | 4 | CFG=2, Ulysses=2, VAE | 832*480 | √ | √ | √ | 147.9 | *3.79x |
| Wan2.1 | 8 | TP=8, VAE | 832*480 | √ | √ | √ | 141.5 | 3.96x |
| Wan2.1 | 8 | CFG=2, TP=4, VAE | 832*480 | √ | √ | √ | 102.9 | 5.45x |
| Wan2.1 | 8 | Ulysses=8, VAE | 832*480 | √ | √ | √ | 78.1 | 7.18x |
| Wan2.1 | 8 | CFG=2, Ulysses=4, VAE | 832*480 | √ | √ | √ | 76.4 | *7.34x |
Note: * marks the best acceleration result.
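In every row of the multi-card table, the product of the numeric parallel degrees matches the card count (for example, CFG=2 with Ulysses=4 on 8 cards), while VAE carries no degree. The sketch below checks a strategy string against that observation. It is an assumption drawn from the table rows, not a documented MindIE SD rule, and strategy_cards is a hypothetical helper name.

```shell
# strategy_cards "CFG=2, Ulysses=4, VAE"
# Multiplies every "name=N" degree in the strategy string.
# Entries without "=N" (such as VAE) are ignored.
strategy_cards() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F= '
    BEGIN { p = 1 }
    NF == 2 { p *= $2 }
    END { print p }'
}

# Strategy strings taken from the multi-card table above.
strategy_cards "CFG=2, TP=2, VAE"
strategy_cards "CFG=2, Ulysses=4, VAE"
```

Comparing the printed product with the number of cards you plan to use is a quick sanity check before launching a combined-strategy run.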



