
OpenSoraPlan1.1 for PyTorch

Note: The OpenSoraPlan1.1 model in this repository is no longer maintained. Please use MindSpeed-MM instead.

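Introduction

Model overview

Open-Sora-Plan is a project launched by a Peking University team that aims to reproduce OpenAI's Sora with an open-source framework. As a foundational open-source framework, it supports training video generation models, including unconditional video generation, class-conditional video generation, and text-to-video generation. This repository ports several Open-Sora-Plan tasks to Ascend NPUs and optimizes their performance on that hardware.

Supported tasks

This repository currently supports the following model task types.

| Model | Task | Supported |
| --- | --- | --- |
| LatteT2V | Single-node training | Yes |
| LatteT2V | Multi-node training | Yes |
| LatteT2V | Online inference | Yes |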

Code implementation

  • Reference implementation:

    url=https://github.com/PKU-YuanGroup/Open-Sora-Plan
    commit_id=2a8b2328a5fcc0108fb5444b010f7e1ae0b4cb7b

  • Implementation adapted for Ascend AI processors:

    url=https://gitcode.com/ascend/ModelZoo-PyTorch.git
    code_path=PyTorch/built-in/mlm/
    

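Preparing the training environment

Installing the model environment

Table 1. Supported third-party library versions

| Library | Supported version (PT2.1) | Supported version (PT2.4) |
| --- | --- | --- |
| PyTorch | 2.1.0 | 2.4.0 |
| diffusers | 0.27.2 | 0.27.2 |
| accelerate | 0.28.0 | 0.29.3 |
| deepspeed | 0.12.6 | 0.15.3 |
| transformers | 4.39.1 | 4.40.1 |
| decord | 0.6.0 | 0.6.0 |
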
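  1. Install decord.

    • On x86 and similar architectures, decord is installed automatically in step 3.
    • On Arm and similar architectures, if it cannot be obtained automatically, install it from source by following the README of the source repository.
  2. Install MindSpeed.

    • With network access, clone the source repository directly.
      git clone https://gitcode.com/ascend/MindSpeed.git
      pip install -e MindSpeed

    • Without network access, download the source code first and then install it.
      pip install -e MindSpeed-v1.1.0

  3. In the model root directory, run the following commands to install the remaining dependencies.

    pip install -e .   # install the local OpenSoraPlan repository
    # For PyTorch 2.4, additionally install from requirements_2_4.txt
    pip install -r requirements_2_4.txt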
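
After installation, it can be useful to confirm that the environment matches Table 1. The following is a minimal sketch, not part of the repository: the helper names `installed_versions` and `find_mismatches` are hypothetical, and it assumes the pip distribution names are `torch`, `diffusers`, `accelerate`, `deepspeed`, `transformers`, and `decord`.

```python
from importlib import metadata

# Versions copied from Table 1; keys are assumed pip distribution names.
SUPPORTED = {
    "2.1": {"torch": "2.1.0", "diffusers": "0.27.2", "accelerate": "0.28.0",
            "deepspeed": "0.12.6", "transformers": "4.39.1", "decord": "0.6.0"},
    "2.4": {"torch": "2.4.0", "diffusers": "0.27.2", "accelerate": "0.29.3",
            "deepspeed": "0.15.3", "transformers": "4.40.1", "decord": "0.6.0"},
}


def installed_versions(names):
    """Return {name: version, or None if the distribution is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, expected):
    """List (name, installed, expected) for packages that differ from Table 1."""
    mismatches = []
    for name, want in expected.items():
        have = installed.get(name)
        # Ignore local build tags such as "+cpu" when comparing.
        if have is None or have.split("+")[0] != want:
            mismatches.append((name, have, want))
    return mismatches


if __name__ == "__main__":
    expected = SUPPORTED["2.1"]  # switch to "2.4" for the PT2.4 column
    for name, have, want in find_mismatches(installed_versions(expected), expected):
        print(f"{name}: installed {have}, Table 1 lists {want}")
```

A mismatch does not necessarily mean training will fail. The check only reports where the environment departs from the versions listed in the table.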
    

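Installing the Ascend environment

Set up the Ascend environment by following the document "PyTorch Framework Training Environment Preparation" (《Pytorch框架训练环境准备》) in the Ascend community. This repository supports the software versions listed in Table 2.

Table 2. Supported Ascend software versions

| Software | Supported version |
| --- | --- |
| FrameworkPTAdapter | In-development version |
| CANN | In-development version |
| Ascend NPU firmware | In-development version |
| Ascend NPU driver | In-development version |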

LatteT2V

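Preparing the training dataset

Obtain and extract the mixkit2 dataset yourself, together with the annotation JSON for the corresponding number of frames, and place them under the OpenSoraPlan1.1/dataset directory. The data and annotations are available in the all_mixkit and anno_jsons folders of the LanguageBind/Open-Sora-Plan-v1.1.0 dataset on Hugging Face.

The directory structure is as follows: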

OpenSoraPlan1.1
├── dataset
   ├── mixkit2
       ├── Airplane
       ├── Baby
       ├── ...
       └── video_mixkit_65f_54735.json
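
Before launching training, the layout above can be checked with a few lines of Python. This is only a sketch under stated assumptions: `check_dataset` is a hypothetical helper, and the annotation file name defaults to the one shown in the tree, which may differ for other frame counts.

```python
from pathlib import Path


def check_dataset(root, annotation="video_mixkit_65f_54735.json"):
    """Return (category_dirs, problems) for <root>/dataset/mixkit2."""
    mixkit = Path(root) / "dataset" / "mixkit2"
    if not mixkit.is_dir():
        return [], [f"missing directory: {mixkit}"]

    problems = []
    if not (mixkit / annotation).is_file():
        problems.append(f"missing annotation file: {mixkit / annotation}")

    # Category folders such as Airplane and Baby hold the extracted videos.
    categories = sorted(p.name for p in mixkit.iterdir() if p.is_dir())
    if not categories:
        problems.append("no category folders found under mixkit2")
    return categories, problems


if __name__ == "__main__":
    categories, problems = check_dataset(".")  # run from OpenSoraPlan1.1
    print(f"{len(categories)} category folders found")
    for problem in problems:
        print("problem:", problem)
```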

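Preparing the pretrained models

  1. Download the main pretrained models.

    • With network access, the pretrained models are downloaded automatically.

    • Without network access, download the main weights manually from the Hugging Face website. The repository namespaces are:

      DeepFloyd/t5-v1_1-xxl               # T5 model
      LanguageBind/Open-Sora-Plan-v1.1.0  # pretrained weights (including the 3D VAE model and the LatteT2V model)

  2. Place the downloaded pretrained models in this project directory, organized as follows: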

    OpenSoraPlan1.1
    ├── DeepFloyd
    │   ├── t5-v1_1-xxl
    │   │   ├── config.json
    │   │   ├── pytorch_model-00001-of-00002.bin
    │   │   ├── ...
    │   LanguageBind
    │   ├── Open-Sora-Plan-v1.1.0
    │   │   ├── 221x512x512
    │   │   ├── 65x512x512
    │   │   └── vae
    
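  3. The model also uses the default pretrained VGG16 weights provided by torchvision.

    • With network access, the pretrained weights are downloaded automatically.
    • Without network access, download the weights manually and place them in the default ~/.cache/torch/hub/checkpoints directory. For download details, see torchvision.models.vgg.VGG16_Weights.IMAGENET1K_V1.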
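
For the offline case, the sketch below shows one way to locate the checkpoint directory and test whether a weight file is already in place. It is a stdlib-only approximation, assuming PyTorch's documented convention that the hub directory is `$TORCH_HOME/hub` and that `TORCH_HOME` defaults to `$XDG_CACHE_HOME/torch`, which in turn defaults to `~/.cache/torch`. The function names are hypothetical, and a directory set through `torch.hub.set_dir` is not taken into account.

```python
import os
from pathlib import Path


def torch_hub_checkpoint_dir(env=None):
    """Resolve the hub checkpoint directory from TORCH_HOME / XDG_CACHE_HOME."""
    env = os.environ if env is None else env
    cache_home = env.get("XDG_CACHE_HOME") or os.path.join("~", ".cache")
    torch_home = env.get("TORCH_HOME") or os.path.join(cache_home, "torch")
    return Path(torch_home).expanduser() / "hub" / "checkpoints"


def weight_is_cached(filename, env=None):
    """True if a weight file with this name already sits in the checkpoint dir."""
    return (torch_hub_checkpoint_dir(env) / filename).is_file()


if __name__ == "__main__":
    # Copy the manually downloaded VGG16 weight file into this directory.
    print(torch_hub_checkpoint_dir())
```

The file name to pass to `weight_is_cached` is the last path component of the download URL recorded in `VGG16_Weights.IMAGENET1K_V1`. It is left as a parameter here rather than hard-coded.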

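Quick start

Training task

This task provides two training scripts, both using bf16 mixed precision: single-node 8-card and two-node 32-card. They are used for text-to-video training on 65-frame clips at 512x512 resolution.

Single-node training

  1. Go to the root directory of the extracted source package.

    cd /${model_folder_name}

  2. Run the pre-training script.

    The model supports single-node training on 8 cards. Run the following command to start training.

    bash scripts/text_condition/train_videoae_65x512x512_16.sh # 8-card training, bf16 mixed precision


    The training script parameters are described below.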

        --config_file scripts/accelerate_configs/deepspeed_zero2_config.yaml \ // DeepSpeed configuration file
        opensora/train/train_t2v.py \                                          // training entry script
        --model LatteT2V-XL/122 \                                              // model to train
        --text_encoder_name DeepFloyd/t5-v1_1-xxl \                            // text encoder
        --cache_dir "./cache_dir" \                                            // download cache directory
        --dataset t2v \                                                        // dataset type
        --ae CausalVAEModel_4x8x8 \                                            // pretrained image/video autoencoder
        --ae_path "LanguageBind/Open-Sora-Plan-v1.1.0/vae" \                   // path to the pretrained VAE
        --video_data "scripts/train_data/video_data.txt" \                     // file listing the video data paths
        --use_img_from_vid \                                                   // take training images from the videos
        --sample_rate 1 \                                                      // sampling rate
        --num_frames 65 \                                                      // number of training frames
        --max_image_size 512 \                                                 // maximum image/video size
        --gradient_checkpointing \                                             // enable recomputation (gradient checkpointing)
        --attention_mode math \                                                // attention type
        --train_batch_size=2 \                                                 // training batch size
        --dataloader_num_workers 4 \                                           // number of data-loading workers
        --gradient_accumulation_steps=1 \                                      // gradient accumulation steps
        --max_train_steps=1000000 \                                            // maximum number of training steps
        --learning_rate=2e-05 \                                                // learning rate
        --lr_scheduler="constant" \                                            // learning-rate schedule
        --lr_warmup_steps=0 \                                                  // learning-rate warmup steps
        --mixed_precision="bf16" \                                             // mixed-precision data type
        --report_to="tensorboard" \                                            // logging backend
        --checkpointing_steps=500 \                                            // steps between checkpoints
        --output_dir="65x512x512_10node_bs2_lr2e-5_16img" \                    // output directory
        --allow_tf32 \                                                         // train with TF32
        --use_deepspeed \                                                      // train with DeepSpeed
        --model_max_length 300 \                                               // maximum text length
        --use_image_num 16 \                                                   // number of images used in training
        --enable_tiling \                                                      // enable tiling
        --pretrained LanguageBind/Open-Sora-Plan-v1.1.0/65x512x512/diffusion_pytorch_model.safetensors // pretrained model
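
The batch-related parameters above combine into an effective global batch size. The short calculation below is a sketch that assumes `--train_batch_size` is a per-device value and that every process is a data-parallel replica. Neither assumption is stated in this README, and `global_batch_size` is a hypothetical helper.

```python
def global_batch_size(per_device_batch, grad_accum_steps, num_processes):
    """Samples contributing to one optimizer step across all devices."""
    return per_device_batch * grad_accum_steps * num_processes


# Single node: --train_batch_size=2, --gradient_accumulation_steps=1, 8 cards.
single_node = global_batch_size(2, 1, 8)

# Two nodes with 32 processes, if the multi-node script keeps the same values.
two_node = global_batch_size(2, 1, 32)

if __name__ == "__main__":
    print(single_node, two_node)
```

Under these assumptions the single-node run uses 16 samples per optimizer step and the 32-process run uses 64, which matters when comparing learning-rate settings between the two setups.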
    

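Multi-node training

  1. Go to the root directory of the extracted source package.

    cd /${model_folder_name}

  2. Edit the configuration file.

    Edit scripts/accelerate_configs/multi_node.yaml. The settings are described below: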

    compute_environment: LOCAL_MACHINE                            # compute environment
    distributed_type: DEEPSPEED                                   # distributed backend type
    deepspeed_config:                                             # DeepSpeed settings
     deepspeed_multinode_launcher: standard                       # DeepSpeed multi-node launcher
     deepspeed_config_file: scripts/accelerate_configs/zero2.json # DeepSpeed configuration file
    main_process_ip: 10.0.10.100                                  # IP address of the main process
    main_process_port: 29502                                      # port of the main process
    main_training_function: main                                  # main training function
    num_machines: 2                                               # number of machines
    num_processes: 32                                             # number of processes
    
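  3. Run the pre-training script.

    Run the following script on each machine of the two-node cluster:

    bash scripts/text_condition/multi_node.sh # 32-card training, bf16 mixed precision


    Compared with the single-node training script, the multi-node script takes the following additional parameter:

    --machine_rank 0 \ # machine index (rank)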
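
The relationship between `num_machines`, `num_processes`, and `--machine_rank` can be sanity-checked before launching. The helpers below are hypothetical and only illustrate the arithmetic. They assume `num_processes` is the total across all machines and that ranks run from 0 to `num_machines - 1`, with rank 0 on the machine at `main_process_ip`, as in the Accelerate launcher convention.

```python
def processes_per_machine(num_machines, num_processes):
    """Processes each machine must host so the total equals num_processes."""
    if num_machines < 1 or num_processes % num_machines != 0:
        raise ValueError(
            f"{num_processes} processes cannot be split evenly over "
            f"{num_machines} machines"
        )
    return num_processes // num_machines


def machine_ranks(num_machines):
    """Valid --machine_rank values; rank 0 is the main node."""
    return list(range(num_machines))


if __name__ == "__main__":
    # Values from the multi_node.yaml example: 2 machines, 32 processes.
    print(processes_per_machine(2, 32), machine_ranks(2))
```

For the example configuration this gives 16 processes per machine, so the script would be started with `--machine_rank 0` on one machine and `--machine_rank 1` on the other.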
    

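Training results

Performance

| Chip | Cards | Step time (s/step) | batch_size | AMP_Type | Torch_Version |
| --- | --- | --- | --- | --- | --- |
| Competitor A | 8p | 9.19 | 2 | bf16 | 2.1 |
| Atlas 900 A2 Box16 | 8p | 9.40 | 2 | bf16 | 2.1 |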
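
To put the two step times in relation, the ratio can be computed directly from the table. This is plain arithmetic on the reported numbers, and `relative_step_time` is a hypothetical helper name.

```python
def relative_step_time(measured_s, reference_s):
    """Ratio of two per-step times; values above 1 mean the first is slower."""
    return measured_s / reference_s


# Step times from the performance table (seconds per step, 8 cards, bf16).
ratio = relative_step_time(9.40, 9.19)

if __name__ == "__main__":
    print(f"Atlas 900 A2 Box16 step time is {(ratio - 1) * 100:.1f}% longer")
```

With the reported values the Atlas 900 A2 Box16 step is roughly 2.3% longer than the Competitor A step at the same card count and batch size.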

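Online inference task

Running inference

  1. Go to the root directory of the extracted source package.

    cd /${model_folder_name}

  2. Run the inference script.

    The model supports single-card online text-to-video inference.

    • Run single-card inference.
    bash scripts/text_condition/sample_video_65x512x512.sh

    • The parameters of the online inference Python script are described below.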
     python opensora/sample/sample_t2v.py \               # inference entry script
     --model_path LanguageBind/Open-Sora-Plan-v1.1.0 \    # parent path of the LatteT2V pretrained weights
     --version 65x512x512 \                               # subpath of the LatteT2V pretrained weights
     --num_frames 65 \                                    # number of video frames
     --height 512 \                                       # video height in pixels
     --width 512 \                                        # video width in pixels
     --cache_dir "./cache_dir" \                          # cache directory
     --text_encoder_name DeepFloyd/t5-v1_1-xxl \          # text encoder model namespace
     --text_prompt examples/prompt_list_0.txt \           # path to the text prompt file
     --ae CausalVAEModel_4x8x8 \                          # video compression model
     --ae_path "LanguageBind/Open-Sora-Plan-v1.1.0/vae" \ # path to the video compression model's pretrained weights
     --save_img_path "./sample_video_65x512x512" \        # output path for the generated videos
     --fps 24 \                                           # frame rate of the generated videos
     --guidance_scale 7.5 \                               # guidance scale
     --num_sampling_steps 150 \                           # number of sampling steps
     --enable_tiling                                      # enable tiling
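
The `--num_frames`, `--height`, and `--width` values interact with the compression factors of the video autoencoder. The sketch below estimates the latent grid size under an assumption that this README does not state: that `4x8x8` in `CausalVAEModel_4x8x8` means 4x temporal and 8x8 spatial compression, and that the causal VAE keeps the first frame on its own and compresses the remaining frames in groups of four. `latent_shape` is a hypothetical helper, not a function of the repository.

```python
def latent_shape(num_frames, height, width, t_stride=4, s_stride=8):
    """Estimated (frames, height, width) of the latent grid under the assumed scheme."""
    if (num_frames - 1) % t_stride or height % s_stride or width % s_stride:
        raise ValueError("input size does not align with the compression strides")
    return (
        (num_frames - 1) // t_stride + 1,  # first frame kept, rest grouped by t_stride
        height // s_stride,
        width // s_stride,
    )


if __name__ == "__main__":
    print(latent_shape(65, 512, 512))
```

Under this assumption a 65x512x512 clip maps to a 17x64x64 latent grid, which would explain why frame counts of the form 4k + 1 (such as 65 and 221) appear in the released weight names.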
    

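Public network addresses

For the public network addresses referenced in the code, see public_address_statement.md.

Change notes

Changes

2024.06.20: Initial release of the LatteT2V bf16 training task.

2024.07.24: Initial release of the LatteT2V fp16 inference task and the multi-node training task.

FAQ

None yet.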