Quick Start
This section uses the Wan2.1 model as an example to demonstrate text-to-video generation with MindIE SD. For more details on running inference with this model, see Modelers - MindIE.
Before running inference, complete the environment setup and install MindIE SD as described in the Installation Guide.
Model Download and Execution
1. Obtain the Inference Script
Clone the Wan2.1 inference script repository from Modelers and install dependencies:
git clone https://modelers.cn/MindIE/Wan2.1.git && cd Wan2.1
pip install -r requirements.txt
2. Obtain Model Weights
The repository above contains only the inference scripts; model weights must be downloaded separately. The following Wan2.1 models are supported:
| Model | Description | Weight Download |
|---|---|---|
| Wan2.1-T2V-14B | Text-to-Video | HuggingFace |
| Wan2.1-I2V-14B-480P | Image-to-Video (480P) | HuggingFace |
| Wan2.1-I2V-14B-720P | Image-to-Video (720P) | HuggingFace |
After downloading, the weight directory structure should be as follows (using Wan2.1-T2V-14B as an example):
Wan2.1-T2V-14B/
├── config.json
├── model_index.json
├── models/
│ ├── dit/
│ ├── vae/
│ └── text_encoder/
└── ...
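Before moving on, it can help to confirm that the download matches the layout above. The following is a minimal sketch: `check_weights_dir` is a hypothetical helper (not part of MindIE SD), and the entry list is copied from the tree in this guide, so adjust it if your download is laid out differently.

```shell
# Hypothetical helper (not part of MindIE SD): report any entry from the
# tree above that is missing under the given weight directory.
# The entry list is an assumption taken from this guide; edit as needed.
check_weights_dir() {
    dir="$1"
    missing=0
    for entry in config.json model_index.json \
                 models/dit models/vae models/text_encoder; do
        if [ ! -e "$dir/$entry" ]; then
            echo "missing: $dir/$entry" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (the path is a placeholder):
# check_weights_dir /path/to/Wan2.1-T2V-14B && echo "layout OK"
```

The function returns a non-zero status when anything is missing, so it can gate later steps with `&&`.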
Note
- In addition to HuggingFace, model weights are also available from ModelScope.
- For weights of other models (FLUX.1-dev, HunyuanVideo, etc.), see the links in Model/Framework Support Matrix.
3. Run Inference
Set model_base to the path of the weight directory, then run the inference script. For a detailed explanation of the parameters, see Parameter Configuration.
# Wan2.1-T2V-14B 8-card inference
cp MindIE-SD/examples/wan/infer_t2v.sh ./
export model_base="/path/to/Wan2.1-T2V-14B"
bash infer_t2v.sh
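Since the commands above pass the weight path through the exported model_base variable, a small guard can catch an unset or mistyped path before a multi-card job starts. This is a minimal sketch; `require_model_base` is a hypothetical helper, not something shipped with MindIE SD.

```shell
# Hypothetical helper (not part of MindIE SD): fail early when model_base
# is unset or does not point to an existing directory.
require_model_base() {
    if [ -z "${model_base:-}" ]; then
        echo "model_base is not set" >&2
        return 1
    fi
    if [ ! -d "$model_base" ]; then
        echo "model_base is not a directory: $model_base" >&2
        return 1
    fi
}

# Usage: launch the script only when the check passes.
# require_model_base && bash infer_t2v.sh
```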
Acceleration Results
The results below use Wan2.1 to show how individual features accelerate inference on an Atlas 800I A2 inference server (1*64G) in single-card and multi-card configurations.
Where:
- Cache: Uses the AttentionCache feature;
- TP: Uses the Tensor Parallel feature;
- FA Sparse: Uses the RainFusion feature in FA Sparse;
- CFG: Uses the CFG Parallel feature;
- Ulysses: Uses the Ulysses Parallel acceleration feature.

The generated video resolution is 832*480, with sample_steps set to 50.
Single-Card Acceleration
Cache Acceleration
| Baseline | + Cache Speedup 1.6 | + Cache Speedup 2.0 | + Cache Speedup 2.4 |
|---|---|---|---|
| 860.2s | 631.7s (1.36x) | 541.8s (1.59x) | 516.9s (*1.66x) |
Note: * indicates the best acceleration result.
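Each speedup in the cache table is the baseline time divided by the accelerated time. As a quick arithmetic check, the sketch below recomputes the three ratios from the reported times; the `speedup` helper is illustrative only.

```shell
# Illustrative helper: speedup = baseline seconds / measured seconds,
# printed with two decimals.
speedup() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

# Times taken from the cache acceleration table above.
speedup 860.2 631.7   # prints 1.36
speedup 860.2 541.8   # prints 1.59
speedup 860.2 516.9   # prints 1.66
```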
Parallel Strategy Results
Dual-Card Single Parallel Strategy
| Model | Cards | Parallel Strategy | Video Output Resolution | Operator Optimization | Cache Optimization | FA Sparse | 50-Step E2E Time (s) | Speedup |
|---|---|---|---|---|---|---|---|---|
| Wan2.1 | 2 | VAE | 832*480 | Yes | Yes | Yes | 548.8 | 1.02x |
| Wan2.1 | 2 | TP | 832*480 | Yes | Yes | Yes | 502.8 | 1.12x |
| Wan2.1 | 2 | CFG | 832*480 | Yes | Yes | Yes | 332.6 | 1.69x |
| Wan2.1 | 2 | Ulysses | 832*480 | Yes | Yes | Yes | 327.6 | *1.71x |
Note: * indicates the best acceleration result.
Multi-Card Combined Parallel Strategies
| Model | Cards | Parallel Strategy | Video Output Resolution | Operator Optimization | Cache Optimization | FA Sparse | 50-Step E2E Time (s) | Speedup |
|---|---|---|---|---|---|---|---|---|
| Wan2.1 | 4 | TP=4, VAE | 832*480 | Yes | Yes | Yes | 204.0 | 2.75x |
| Wan2.1 | 4 | CFG=2, TP=2, VAE | 832*480 | Yes | Yes | Yes | 175.8 | 3.19x |
| Wan2.1 | 4 | Ulysses=4, VAE | 832*480 | Yes | Yes | Yes | 151.1 | 3.71x |
| Wan2.1 | 4 | CFG=2, Ulysses=2, VAE | 832*480 | Yes | Yes | Yes | 147.9 | *3.79x |
| Wan2.1 | 8 | TP=8, VAE | 832*480 | Yes | Yes | Yes | 141.5 | 3.96x |
| Wan2.1 | 8 | CFG=2, TP=4, VAE | 832*480 | Yes | Yes | Yes | 102.9 | 5.45x |
| Wan2.1 | 8 | Ulysses=8, VAE | 832*480 | Yes | Yes | Yes | 78.1 | 7.18x |
| Wan2.1 | 8 | CFG=2, Ulysses=4, VAE | 832*480 | Yes | Yes | Yes | 76.4 | *7.34x |
Note: * indicates the best acceleration result.
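The parallel strategy tables do not state the single-card reference time behind the Speedup column. One way to estimate it is to multiply a row's E2E time by its reported speedup. The sketch below does this for the best dual-card and best 8-card rows; `implied_baseline` is an illustrative helper, and the result is an inference from the published numbers, not a documented figure.

```shell
# Illustrative helper: implied reference time = E2E seconds * reported
# speedup, printed with one decimal.
implied_baseline() {
    awk -v t="$1" -v s="$2" 'BEGIN { printf "%.1f\n", t * s }'
}

# Rows taken from the tables above.
implied_baseline 327.6 1.71   # 2 cards, Ulysses          -> prints 560.2
implied_baseline 76.4 7.34    # 8 cards, CFG=2, Ulysses=4 -> prints 560.8
```

Both rows point to a reference time of roughly 560 s, which suggests that the Speedup column in these tables is measured against a single-card run with the listed optimizations enabled and not against the 860.2s baseline in the cache table.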



