
OpenSoraPlan1.0 for PyTorch

Note: The OpenSoraPlan1.0 model in this repository is no longer maintained. Please use MindSpeed-MM instead.

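Introduction

Model Overview

Open-Sora-Plan is a project launched by a Peking University team that aims to reproduce OpenAI's Sora with an open-source framework. As a foundational open-source framework, it supports training video generation models, including unconditional video generation, class-conditional video generation, and text-to-video generation. This repository ports several Open-Sora-Plan tasks to Ascend NPUs and optimizes their performance.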

Supported Tasks

This repository currently supports the following model tasks:

Model	Task	Supported
VideoGPT	Training	✔
LatteT2V	Training	✔
LatteT2V	Online inference	✔

Code Implementation

Reference implementation:

url=https://github.com/PKU-YuanGroup/Open-Sora-Plan
commit_id=a7375034586fea20b4aa14bc17c58adbaeeef32f

Implementation adapted for Ascend AI processors:

url=https://gitcode.com/ascend/ModelZoo-PyTorch.git
code_path=PyTorch/built-in/mlm/

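Preparing the Training Environment

Installing the Model Environment

Table 1 Supported third-party library versions

Library	Supported version (PT2.1)	Supported version (PT2.4)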
PyTorch	2.1.0	2.4.0
diffusers	0.27.2	0.27.2
accelerate	0.28.0	0.29.3
deepspeed	0.12.6	0.15.3
transformers	4.39.1	4.40.1
decord	0.6.0	0.6.0

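Run the following commands in the model root directory to install the dependencies required by your PyTorch version.

pip install -e .   # install the local OpenSoraPlan code repository
# For PyTorch 2.4, additionally install the dependencies in requirements_2_4.txt
pip install -r requirements_2_4.txt

Note: The decord dependency must be built and installed from source. Follow the instructions in the upstream repository: https://github.com/dmlc/decord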
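
After installation, it can help to confirm that the installed packages match Table 1. The following is a minimal sketch, not part of the repository: the `PINNED` dictionary layout and the helper names are illustrative, and it assumes PyTorch is installed under the distribution name `torch`.

```python
from importlib import metadata

# Versions copied from Table 1; the dictionary layout itself is illustrative.
PINNED = {
    "2.1": {"torch": "2.1.0", "diffusers": "0.27.2", "accelerate": "0.28.0",
            "deepspeed": "0.12.6", "transformers": "4.39.1", "decord": "0.6.0"},
    "2.4": {"torch": "2.4.0", "diffusers": "0.27.2", "accelerate": "0.29.3",
            "deepspeed": "0.15.3", "transformers": "4.40.1", "decord": "0.6.0"},
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Ignore local build suffixes such as "2.1.0+cpu" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED["2.1"]).items():
        print(f"{package}: expected {expected}, found {found}")
```

Switch the key to `"2.4"` when checking a PyTorch 2.4 environment. An empty result means every listed package matches its pinned version.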

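Installing the Ascend Environment

Set up the Ascend environment by following the document "PyTorch Framework Training Environment Preparation" in the Ascend community. This repository supports the software versions listed in Table 2.

Table 2 Supported Ascend software versions

Software	Supported version
FrameworkPTAdapter	Version under development
CANN	Version under development
Ascend NPU firmware	Version under development
Ascend NPU driver	Version under development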
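
Once the Ascend software stack is installed, a quick smoke test can confirm that PyTorch sees an NPU. This is a minimal sketch that assumes the `torch_npu` adapter is installed; the helper name `npu_available` is hypothetical, and it simply returns `False` on machines without the adapter.

```python
import importlib


def npu_available():
    """Return True only if torch and torch_npu import and report a usable NPU."""
    try:
        torch = importlib.import_module("torch")
        # Importing torch_npu registers the torch.npu namespace.
        importlib.import_module("torch_npu")
    except (ImportError, OSError):
        return False
    return bool(torch.npu.is_available())


if __name__ == "__main__":
    print("NPU available:", npu_available())
```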

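Preparing the Training Dataset

Obtain and extract the MSRVTT dataset yourself, then place it in the OpenSoraPlan1.0/dataset directory.

The directory structure is as follows: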

OpenSoraPlan1.0
├── dataset
│   ├── MSRVTT
│   │   ├── annotation
│   │   ├── high-quality
│   │   ├── structured-symlinks
│   │   └── video
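
Before launching training, the layout above can be checked programmatically. The sketch below is illustrative only: the helper name `missing_msrvtt_dirs` is hypothetical, and the sub-directory names are taken from the listing above.

```python
from pathlib import Path

# Sub-directories listed in the layout above.
EXPECTED_SUBDIRS = ("annotation", "high-quality", "structured-symlinks", "video")


def missing_msrvtt_dirs(repo_root):
    """Return the expected MSRVTT sub-directories that do not exist yet."""
    base = Path(repo_root) / "dataset" / "MSRVTT"
    return [name for name in EXPECTED_SUBDIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    # "." assumes the script is run from the OpenSoraPlan1.0 root directory.
    missing = missing_msrvtt_dirs(".")
    print("Missing directories:", missing or "none")
```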

VideoGPT

Preparing the Training Dataset

In the launch script below, set the data_path parameter to the absolute path of your local dataset.

bash scripts/videogpt/train_videogpt.sh

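Quick Start

Training

This task provides a single 8-card training script that uses bf16 mixed precision. DDP distributed training is used by default.

Starting Training

Go to the root directory of the extracted source package.
```
cd /${model_directory_name}
```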

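Run the pretraining script.

The model supports single-node 8-card training.

Single-node 8-card training

bash scripts/videogpt/train_videogpt.sh # 8-card training, bf16 mixed precision

The parameters of the Python training script are described below.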

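bash scripts/videogpt/train_videogpt.sh
--max_steps                         // number of training steps
--data_path                         // path to the dataset
--per_device_train_batch_size       // batch size per device
--save_strategy                     // checkpoint save strategy
--learning_rate                     // learning rate
--lr_scheduler_type                 // learning-rate scheduler type
--max_train_samples                 // maximum number of training samples
--output_dir                        // output directory
--resolution                        // resolution
--gradient_accumulation_steps       // number of gradient accumulation steps
--save_total_limit                  // maximum number of checkpoints to keep
--logging_steps                     // logging interval, in steps
--downsample                        // downsampling ratio
--n_res_layers                      // number of residual layers
--embedding_dim                     // embedding dimension
--n_hiddens                         // number of hidden features (hidden channel width)
--n_codes                           // number of codes in the codebook
--sequence_length                   // number of frames
--report_to                         // logging backend
--bf16                              // bf16 precision mode
--fp16                              // fp16 precision mode
--dataloader_num_workers            // number of dataloader workers

Note: The model does not currently support resuming training from a checkpoint, so there are no related parameters.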
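
The batch-related parameters interact: under data-parallel training, the number of samples consumed per optimizer step is the product of the per-device batch size, the number of devices, and the gradient accumulation steps. The sketch below illustrates that arithmetic; the helper name is hypothetical, and the accumulation value of 4 is an example, not the script's default.

```python
def global_batch_size(per_device_batch_size, num_devices, gradient_accumulation_steps=1):
    """Samples consumed per optimizer step under data-parallel training."""
    for value in (per_device_batch_size, num_devices, gradient_accumulation_steps):
        if value < 1:
            raise ValueError("all arguments must be positive integers")
    return per_device_batch_size * num_devices * gradient_accumulation_steps


print(global_batch_size(1, 8))     # batch size 1 on 8 devices, no accumulation -> 8
print(global_batch_size(1, 8, 4))  # hypothetical: accumulate over 4 steps -> 32
```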

Performance Results

Performance

Chip	Cards	Step time (s/step)	batch_size	Frames	AMP_Type	Torch_Version
Competitor A	8p	0.62	1	240	bf16	2.1
Atlas 800T A2	8p	1.00	1	240	bf16	2.1
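
Because both rows use the same batch size and frame count, the step times can be compared directly. The sketch below converts the two step times from the table into a relative throughput figure; the helper name is hypothetical.

```python
def relative_throughput(reference_step_s, candidate_step_s):
    """Candidate throughput as a fraction of the reference (same batch size)."""
    if reference_step_s <= 0 or candidate_step_s <= 0:
        raise ValueError("step times must be positive")
    return reference_step_s / candidate_step_s


ratio = relative_throughput(0.62, 1.00)
print(f"Atlas 800T A2 reaches {ratio:.0%} of Competitor A's throughput")
```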

LatteT2V

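Preparing the Training Dataset

Preprocess the dataset from the source root directory.

cd OpenSoraPlan1.0/
python dataset/collate_msrvtt_dataset.py -d dataset/MSRVTT -o dataset/msrvtt
python dataset/preprocess_msrvtt.py --data_path dataset/msrvtt/train/annotations.json   # generate the final annotation CSV file

-d: path to the original MSRVTT dataset; -o: output path for the processed MSRVTT dataset; --data_path: path to the annotation file of the processed MSRVTT dataset.

The processed data structure is as follows: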

```
msrvtt
├── train
│   ├── videos
│   │   ├── video0.mp4
│   │   ├── video1.mp4
│   │   └── ...
│   └── annotations.json
├── val
├── test
└── ...

```

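Preparing the Pretrained Models

With network access, the pretrained models are downloaded automatically.

Without network access, download them manually from the Hugging Face website. The repository namespaces are:

DeepFloyd/t5-v1_1-xxl               # T5 model
LanguageBind/Open-Sora-Plan-v1.0.0  # pretrained weights (including the 3D VAE model and the LatteT2V model)

Place the downloaded pretrained models in this project directory, organized as follows: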

$OpenSoraPlan1.0
├── DeepFloyd
│   ├── t5-v1_1-xxl
│   │   ├── config.json
│   │   ├── pytorch_model-00001-of-00002.bin
│   │   └── ...
├── LanguageBind
│   ├── Open-Sora-Plan-v1.0.0
│   │   ├── 17x256x256
│   │   ├── 65x256x256
│   │   ├── 65x512x512
│   │   └── vae
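
One possible way to produce this layout on a machine that does have network access, before copying the files to the offline host, is the `snapshot_download` function from the `huggingface_hub` package. This is a sketch under that assumption, not the repository's own download procedure; the helper names `local_dir_for` and `download_all` are hypothetical.

```python
from pathlib import Path

# Repository ids listed above.
REPO_IDS = ("DeepFloyd/t5-v1_1-xxl", "LanguageBind/Open-Sora-Plan-v1.0.0")


def local_dir_for(repo_id, project_root):
    """Map "namespace/name" to <project_root>/namespace/name."""
    namespace, name = repo_id.split("/", 1)
    return Path(project_root) / namespace / name


def download_all(project_root):
    """Download every repository into the layout shown above (needs network)."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in REPO_IDS:
        target = local_dir_for(repo_id, project_root)
        snapshot_download(repo_id=repo_id, local_dir=str(target))


if __name__ == "__main__":
    for repo_id in REPO_IDS:
        print(repo_id, "->", local_dir_for(repo_id, "."))
```

Calling `download_all(".")` from the project root would fetch both repositories. Both are large, so check the available disk space first.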

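Quick Start

Training

This task provides a single 8-card training script that uses bf16 mixed precision for text-to-video training on 17 frames at 256x256 resolution.

Starting Training

Go to the root directory of the extracted source package.
```
cd /${model_directory_name}
```

Run the pretraining script.

The model supports single-node 8-card training.

Single-node 8-card training

bash scripts/text_condition/train_videoae_17x256x256.sh # 8-card training, bf16 mixed precision

The parameters of the Python training script are described below.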

  --config_file scripts/accelerate_configs/deepspeed_zero2_config.yaml \          // DeepSpeed configuration file
  opensora/train/train_t2v.py \                                                   // training entry script
  --model LatteT2V-XL/122 \                                                       // model to train
  --text_encoder_name DeepFloyd/t5-v1_1-xxl \                                     // text encoder
  --dataset t2v \                                                                 // dataset type
  --ae CausalVAEModel_4x8x8 \                                                     // video/image compression model
  --ae_path LanguageBind/Open-Sora-Plan-v1.0.0/vae/ \                             // path to the pretrained VAE
  --data_path dataset/msrvtt/train/annotations.json \                             // path to the dataset annotation file
  --video_folder dataset/msrvtt/train/videos.json \                               // path to the video folder
  --sample_rate 1 \                                                               // sampling rate
  --num_frames 17 \                                                               // number of training frames
  --max_image_size 256 \                                                          // maximum image/video size
  --gradient_checkpointing \                                                      // enable gradient checkpointing (recomputation)
  --attention_mode math \                                                         // attention implementation
  --train_batch_size=4 \                                                          // training batch size
  --dataloader_num_workers 10 \                                                   // number of data-loading workers
  --gradient_accumulation_steps=1 \                                               // number of gradient accumulation steps
  --max_train_steps=1000000 \                                                     // maximum number of training steps
  --learning_rate=2e-05 \                                                         // learning rate
  --lr_scheduler="constant" \                                                     // learning-rate scheduler
  --lr_warmup_steps=0 \                                                           // number of learning-rate warmup steps
  --mixed_precision="bf16" \                                                      // data type for mixed-precision training
  --report_to="tensorboard" \                                                     // logging backend
  --checkpointing_steps=2000 \                                                    // checkpoint interval, in steps
  --output_dir="t2v-f17-256-img4-videovae488-bf16-ckpt-xformers-bs4-lr2e-5-t5" \  // output directory
  --allow_tf32 \                                                                  // allow TF32 during training
  --use_deepspeed \                                                               // train with DeepSpeed
  --model_max_length 300 \                                                        // maximum text length
  --use_image_num 4 \                                                             // number of images used in training
  --use_img_from_vid                                                              // take the training images from the videos
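
The names `CausalVAEModel_4x8x8` and `LatteT2V-XL/122` suggest a 4x temporal and 8x8 spatial compression followed by a 1x2x2 patch size. The sketch below works through the resulting latent and token sizes under those assumptions; it also assumes the causal VAE encodes the first frame on its own, so that 17 = 1 + 16 frames map to 1 + 4 latent frames. Neither the helper names nor these conventions are confirmed by this document.

```python
def latent_shape(num_frames, height, width, t_stride=4, s_stride=8):
    """Latent (frames, height, width) for a causal video VAE.

    Assumes the first frame is encoded on its own and every following group
    of `t_stride` frames adds one latent frame.
    """
    if (num_frames - 1) % t_stride or height % s_stride or width % s_stride:
        raise ValueError("input size is not aligned with the compression strides")
    return ((num_frames - 1) // t_stride + 1, height // s_stride, width // s_stride)


def token_count(latent, patch=(1, 2, 2)):
    """Total number of patch tokens after non-overlapping patchification."""
    t, h, w = latent
    pt, ph, pw = patch
    return (t // pt) * (h // ph) * (w // pw)


train_latent = latent_shape(17, 256, 256)
print(train_latent, token_count(train_latent))  # (5, 32, 32) 1280
```

Under the same assumptions, a 65x512x512 clip maps to a 17x64x64 latent. The images added through `--use_image_num` are not counted here.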

Performance Results

Performance

Chip	Cards	Step time (s/step)	batch_size	AMP_Type	Torch_Version
GPU	8p	1.84	4	bf16	2.1
Atlas A2	8p	1.95	4	bf16	2.1
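
The step time can be combined with the script's `--max_train_steps=1000000` and `--checkpointing_steps=2000` settings to estimate an upper bound on wall-clock time and checkpoint count. The sketch below is a rough estimate only: it assumes a constant step time and ignores checkpoint saving and data-loading stalls, and the helper name is hypothetical.

```python
SECONDS_PER_DAY = 86400


def training_estimate(max_train_steps, step_time_s, checkpointing_steps):
    """Return (days of compute, number of checkpoints) for a full run."""
    days = max_train_steps * step_time_s / SECONDS_PER_DAY
    checkpoints = max_train_steps // checkpointing_steps
    return days, checkpoints


days, checkpoints = training_estimate(1_000_000, 1.95, 2000)
print(f"about {days:.1f} days, up to {checkpoints} checkpoints")
```

In practice a run is usually stopped well before the maximum step count, so treat the result as a ceiling rather than a plan.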

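Online Inference

Starting Inference

Go to the root directory of the extracted source package.
```
cd /${model_directory_name}
```

Run the inference script.

The model supports single-card text-to-video online inference.

Run single-card inference.

bash scripts/text_condition/sample_video.sh

The parameters of the Python online inference script are described below.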

 python opensora/sample/sample_t2v.py \                 // Python script for online inference
 --model_path LanguageBind/Open-Sora-Plan-v1.0.0 \      // path to the LatteT2V pretrained weights
 --text_encoder_name DeepFloyd/t5-v1_1-xxl \            // path to the text encoder weights
 --text_prompt examples/prompt_list_0.txt \             // path to the text prompt file
 --ae CausalVAEModel_4x8x8 \                            // video compression model
 --version 65x512x512 \                                 // specification of the generated video
 --save_img_path "./sample_videos/prompt_list_0" \      // output path for the generated videos
 --fps 24 \                                             // frame rate of the generated video
 --guidance_scale 7.5 \                                 // guidance scale
 --num_sampling_steps 250 \                             // number of sampling steps
 --enable_tiling                                        // enable tiling
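
The `--guidance_scale` parameter is commonly used for classifier-free guidance, where the sampler blends an unconditional and a text-conditioned noise prediction at every step. The sketch below shows that standard blending rule on plain Python lists; it assumes the inference script follows the usual formulation, which this document does not confirm, and the helper name is hypothetical.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Blend unconditional and text-conditioned predictions element-wise."""
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    # A scale of 1.0 returns the conditioned prediction unchanged;
    # larger values push the result further toward the text condition.
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


print(apply_guidance([0.0, 1.0], [1.0, 1.0], 7.5))  # [7.5, 1.0]
```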

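Public Network Address Statement

For the public network addresses referenced in the code, see public_address_statement.md.

Change Log

Changes

2024.05.22: First release of the VideoGPT bf16 training task.

2024.05.30: First release of the LatteT2V bf16 training and inference tasks.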
