
This repository is no longer maintained; please move to its replacement.

VeRL for PyTorch

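Overview

Summary

verl is a flexible post-training framework for large models that combines SFT (supervised fine-tuning) and RL (reinforcement learning). It targets the post-training stage of large language models (LLMs), in which the parameters of a pretrained model are adjusted to fit new tasks or datasets.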

  • Reference implementation:

    url=https://github.com/volcengine/verl
    commit_id=5b542d273cc792971eb66aca07494523be61c58c
    
  • Implementation adapted for Ascend AI processors:

    url=https://gitcode.com/ascend/ModelZoo-PyTorch.git
    code_path=PyTorch/built-in/rl/
    

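Prepare the training environment

Prepare the environment

  • We recommend using the latest matching versions listed in the companion resource documentation.

    Table 1 Version compatibility

    | Software | Version | Installation guide |
    | --- | --- | --- |
    | Driver | AscendHDK 25.1.RC1 | Driver and Firmware Installation Guide |
    | Firmware | AscendHDK 25.1.RC1 | |
    | CANN | CANN 8.1.RC1 | CANN Software Installation Guide |
    | PyTorch | 2.5.1 | Ascend Extension for PyTorch Configuration and Installation |
    | torch_npu | 2.5.1 | |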
  • Install vLLM and vLLM Ascend

    # Do not clone into the model root directory
    git clone -b v0.7.3 --depth 1 https://github.com/vllm-project/vllm.git
    cd vllm
    pip install -r requirements-build.txt
    VLLM_TARGET_DEVICE=empty pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu/
    cd ..
    
    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    source /usr/local/Ascend/nnal/atb/set_env.sh
    
    # For VL models: build and install vllm-ascend v0.7.3
    # (/path/to/VeRL_for_PyTorch is a placeholder for the VeRL_for_PyTorch root directory)
    git clone -b v0.7.3 --depth 1 https://github.com/vllm-project/vllm-ascend.git
    cp -f /path/to/VeRL_for_PyTorch/vllm_ascend_need/qwen2_5_vl.py vllm-ascend/vllm_ascend/models/
    cp -f /path/to/VeRL_for_PyTorch/vllm_ascend_need/rotary_embedding.py vllm-ascend/vllm_ascend/ops/
    cd vllm-ascend
    export COMPILE_CUSTOM_KERNELS=1
    python setup.py install
    cd ..
    
    # For LLM models: build and install vllm-ascend at a specific commit
    git clone https://github.com/vllm-project/vllm-ascend.git
    cd vllm-ascend
    git checkout edeadde387451ca982fe3717555c1841ee195718
    sed -i '1s/^/set(CMAKE_CXX_STANDARD 17)\n/' CMakeLists.txt
    export COMPILE_CUSTOM_KERNELS=1
    python setup.py install
    cd ..
    
  • Clone the transformers repository and check out the required commit

    git clone --depth 1 https://github.com/huggingface/transformers.git
    cd transformers
    git fetch --depth 1 origin aa17cfb4d532239336d2f89e06f01d48387292a3
    git checkout aa17cfb4d532239336d2f89e06f01d48387292a3
    # /path/to/VeRL_for_PyTorch is a placeholder for the VeRL_for_PyTorch root directory
    cp -f /path/to/VeRL_for_PyTorch/transformers_need/npu_flash_attention.py src/transformers/integrations/
    pip install -e .
    cd ..
    
  • For VL models, install torchvision

    pip install torchvision==0.20.1+cpu --index-url https://download.pytorch.org/whl/cpu
    
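  • Environment setup guide.

    Set up the torch environment by following "PyTorch Framework Training Environment Preparation".

  • Install dependencies.

    From the model root directory, run the following commands to install the dependencies required for the model's PyTorch version.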

    pip install -r requirements-npu.txt 
    pip install -e .
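
Once the dependencies are installed, it can help to confirm that the PyTorch packages match Table 1. The following is a minimal sketch, not part of verl or this repository: `check_pins` is a hypothetical helper that reads `pip freeze`-style lines on standard input and checks that torch and torch_npu start with 2.5.1. It assumes both packages are reported as `name==version`, and it accepts either `torch_npu` or `torch-npu` as the package name because the exact spelling pip reports is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify torch / torch_npu versions against Table 1.
# Reads `pip freeze`-style lines ("name==version") from stdin.
check_pins() {
    awk -F'==' '
        # Match torch and both possible spellings of the Ascend extension.
        $1 == "torch" || $1 == "torch_npu" || $1 == "torch-npu" {
            seen[$1] = 1
            # Prefix match, so local suffixes such as "+cpu" or ".post1" pass.
            if (index($2, "2.5.1") != 1) {
                printf "MISMATCH %s %s\n", $1, $2
                bad = 1
            }
        }
        END {
            if (!("torch" in seen)) { print "MISSING torch"; bad = 1 }
            if (!("torch_npu" in seen) && !("torch-npu" in seen)) {
                print "MISSING torch_npu"; bad = 1
            }
            exit bad   # 0 when everything matched, 1 otherwise
        }
    '
}
```

A possible invocation is `pip freeze | check_pins && echo "versions OK"`. The prefix match is deliberately loose; tighten it if an exact version is required.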
    

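Prepare the datasets

VL models use the geo3k dataset. From the model root directory, run the following command to download and preprocess it. --local_dir is optional; if it is omitted, the dataset is saved to ~/data/geo3k.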

python examples/data_preprocess/geo3k.py --local_dir=xxx

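LLM models use the gsm8k dataset. From the model root directory, run the following command to download and preprocess it. --local_dir is optional; if it is omitted, the dataset is saved to ~/data/gsm8k.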

python examples/data_preprocess/gsm8k.py --local_dir=xxx
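
Before starting a long training run, a quick check that preprocessing produced its output can save time. The sketch below assumes the preprocessing scripts write `train.parquet` and `test.parquet` into the target directory; that file layout is an assumption about the upstream scripts and is not stated in this README. `check_dataset` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the expected parquet splits exist.
# Assumption: preprocessing writes train.parquet and test.parquet into DIR.
check_dataset() {
    local dir="${1:?usage: check_dataset DIR}"
    local rc=0
    local split
    for split in train test; do
        # -s is true only for files that exist and are non-empty.
        if [ -s "$dir/$split.parquet" ]; then
            echo "ok      $dir/$split.parquet"
        else
            echo "missing $dir/$split.parquet"
            rc=1
        fi
    done
    return "$rc"
}
```

For the default locations this would be called as `check_dataset ~/data/geo3k` or `check_dataset ~/data/gsm8k`.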

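Obtain the pretrained models

Download the Qwen2.5-VL-7B-Instruct, Qwen2.5-VL-3B-Instruct, Qwen2.5-VL-32B-Instruct, Qwen2.5-7B-Instruct, and Qwen2.5-32B-Instruct models yourself.

Start training

Train the model

  1. Go to the root directory of the extracted source package.

    cd /${model_folder_name}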
    
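  2. Configure the two-node environment (skip this step for single-node training).

    1. Make sure the model and dataset paths are identical on the head node and the worker node.

    2. On the head node, start the Ray cluster:

      ray start --head
      
    3. On the worker node, join the Ray cluster:

      ray start --address='<head_node_ip>:6379'
      
    4. On the worker node, confirm that the two nodes are connected: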

      ray status
      
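  3. Run the training script.

    • Train with the GRPO algorithm.

      Qwen2.5-VL-3B-Instruct supports single-node 8-card training.

      • Single-node 8-card training

        bash test/train_qwen2_5_vl_3b_GRPO_full_8p.sh --data_path=xxx --model_path=xxx  # 8-card training
        
      • Single-node 8-card performance

        bash test/train_qwen2_5_vl_3b_GRPO_performance_8p.sh --data_path=xxx --model_path=xxx   # 8-card performance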
        

      Qwen2.5-VL-7B-Instruct supports single-node 16-card training.

      • Single-node 16-card training

        bash test/train_qwen2_5_vl_7b_GRPO_full_16p.sh --data_path=xxx --model_path=xxx   # 16-card training
        
      • Single-node 16-card performance

        bash test/train_qwen2_5_vl_7b_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx   # 16-card performance
        

      Qwen2.5-VL-32B-Instruct supports two-node 32-card training.

      • Two-node 32-card training

        # Run on the head node
        bash test/train_qwen2_5_vl_32b_GRPO_full_32p.sh --data_path=xxx --model_path=xxx   # 32-card training
        
      • Two-node 32-card performance

        # Run on the head node
        bash test/train_qwen2_5_vl_32b_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx   # 32-card performance
        

      Qwen2.5-7B-Instruct supports single-node 16-card training.

      • Single-node 16-card training

        bash test/train_qwen2_5_7b_instruct_GRPO_full_16p.sh --data_path=xxx --model_path=xxx   # 16-card training
        
      • Single-node 16-card performance

        bash test/train_qwen2_5_7b_instruct_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx   # 16-card performance
        

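      Qwen2.5-32B-Instruct supports two-node 32-card training.

      • Two-node 32-card training

        # Run on the head node
        bash test/train_qwen2_5_32b_instruct_GRPO_full_32p.sh --data_path=xxx --model_path=xxx   # 32-card training
        
      • Two-node 32-card performance

        # Run on the head node
        bash test/train_qwen2_5_32b_instruct_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx   # 32-card performance
        

      After training completes, the logs are saved under test/output and include the model's training accuracy and performance information.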

    • Train with the DAPO algorithm.

      Qwen2.5-7B-Instruct supports single-node 16-card training.

      • Single-node 16-card training

        bash test/train_qwen2_5_7b_instruct_DAPO_full_16p.sh --data_path=xxx --model_path=xxx   # 16-card training
        
      • Single-node 16-card performance

        bash test/train_qwen2_5_7b_instruct_DAPO_performance_16p.sh --data_path=xxx --model_path=xxx   # 16-card performance
        

      Qwen2.5-32B-Instruct supports two-node 32-card training.

      • Two-node 32-card training

        # Run on the head node
        bash test/train_qwen2_5_32b_instruct_DAPO_full_32p.sh --data_path=xxx --model_path=xxx   # 32-card training
        
      • Two-node 32-card performance

        # Run on the head node
        bash test/train_qwen2_5_32b_instruct_DAPO_performance_32p.sh --data_path=xxx --model_path=xxx   # 32-card performance
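
The training scripts listed in step 3 share one naming pattern: `test/train_<model>_<algorithm>_<mode>_<cards>p.sh`. The sketch below builds the path from those four parts and runs it with the two arguments every script in this section takes. `train_script` and `run_training` are hypothetical helper names, not part of the repository, and only the model, algorithm, and card combinations listed above are known to exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a script path from the naming pattern used above.
# Args: model (e.g. qwen2_5_vl_3b), algorithm (GRPO|DAPO),
#       mode (full|performance), cards (8|16|32)
train_script() {
    local model="$1" algo="$2" mode="$3" cards="$4"
    echo "test/train_${model}_${algo}_${mode}_${cards}p.sh"
}

# Hypothetical wrapper: refuse to run if the combination has no script.
# Args: model algorithm mode cards data_path model_path
run_training() {
    local script
    script="$(train_script "$1" "$2" "$3" "$4")"
    if [ ! -f "$script" ]; then
        echo "no such script: $script" >&2
        return 1
    fi
    bash "$script" --data_path="$5" --model_path="$6"
}
```

Run from the model root directory, `run_training qwen2_5_vl_3b GRPO full 8 "$DATA" "$MODEL"` would be equivalent to the first command in the GRPO list, where `DATA` and `MODEL` are placeholders for your own paths. For the 32-card configurations it should still be started on the head node only.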
        

Training results

Table 2 Training results

| Model | Algorithm | Hardware | Throughput | Max Training TimeSteps |
| --- | --- | --- | --- | --- |
| Qwen2.5-VL-3B-Instruct | GRPO | 8p-Competitor A | 739.453 | 60 |
| Qwen2.5-VL-3B-Instruct | GRPO | 8P Atlas 200T A2 Box16 | 523.599 | 60 |
| Qwen2.5-VL-7B-Instruct | GRPO | 8p-Competitor A | 568.452 | 60 |
| Qwen2.5-VL-7B-Instruct | GRPO | 16P Atlas 200T A2 Box16 | 326.37 | 60 |
| Qwen2.5-VL-32B-Instruct | GRPO | 16p-Competitor A | 109.497 | 60 |
| Qwen2.5-VL-32B-Instruct | GRPO | 32P Atlas 200T A2 Box16 | 70.0735 | 60 |
| Qwen2.5-7B-Instruct | GRPO | 8p-Competitor A | 323.872 | 35 |
| Qwen2.5-7B-Instruct | GRPO | 16P Atlas 200T A2 Box16 | 190.617 | 35 |
| Qwen2.5-7B-Instruct | DAPO | 16P Atlas 200T A2 Box16 | 198.678 | 84 |
| Qwen2.5-32B-Instruct | GRPO | 16p-Competitor A | 79.022 | 105 |
| Qwen2.5-32B-Instruct | GRPO | 32P Atlas 200T A2 Box16 | 54.162 | 105 |
| Qwen2.5-32B-Instruct | DAPO | 32P Atlas 200T A2 Box16 | 47.725 | 32 |
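
To compare a pair of rows in Table 2, one option is to divide the Atlas throughput by the competitor throughput for the same model and algorithm. The sketch below does only that arithmetic; `ratio` is a hypothetical helper. Two caveats apply: the table does not state the throughput unit, and most pairs use different card counts (for example 16P against 8p), so the raw ratio should not be read as a per-card comparison unless the unit is known to be per card.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a / b to three decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a / b }'
}

# Atlas throughput divided by competitor throughput, values from Table 2 (GRPO rows).
ratio 523.599 739.453   # Qwen2.5-VL-3B-Instruct
ratio 326.37  568.452   # Qwen2.5-VL-7B-Instruct
ratio 70.0735 109.497   # Qwen2.5-VL-32B-Instruct
ratio 190.617 323.872   # Qwen2.5-7B-Instruct
ratio 54.162  79.022    # Qwen2.5-32B-Instruct
```

The DAPO rows have no competitor entry in the table, so they are left out.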

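Public network address statement

For the public network addresses referenced in the code, see public_address_statement.md.

Release notes

Changes

2025.05.12: Initial release.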

FAQ

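  • If you hit RuntimeError: Gloo connectFullMesh failed during training, do the following:

    • On both the head node and the worker node, run the following command to find the network interface name that corresponds to the node's IP:
    ifconfig
    
    • On both the head node and the worker node, set the following environment variables:
    export GLOO_SOCKET_IFNAME=<interface_name>
    export HCCL_SOCKET_IFNAME=<interface_name>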
    
  • If you hit ImportError: cannot import name "fused_topk" from "fused_moe_module" (unknown location), uninstall triton with the following command:

    pip uninstall triton
    
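  • If training inside a Docker container is far slower than the baseline, the cause may be that Ray schedules less efficiently inside the container than on the host. In that case, start Ray on the host with the following command, then train inside the container: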

    ray start --head
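
For the Gloo connectFullMesh item above, the interface lookup can also be scripted instead of reading `ifconfig` output by eye. The sketch below swaps in `ip -o -4 addr show` (from iproute2) because it prints one line per address, which is easier to parse. `iface_for_ip` is a hypothetical helper and `NODE_IP` is a placeholder for the node's own address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given `ip -o -4 addr show` output on stdin, print the
# interface whose IPv4 address equals the first argument.
# Each input line looks like: "2: eth0    inet 192.168.1.5/24 brd ... scope global eth0"
iface_for_ip() {
    awk -v ip="$1" '{
        split($4, a, "/")          # $4 is "address/prefix"; keep the address part
        if (a[1] == ip) print $2   # $2 is the interface name
    }'
}

# Possible use on each node (NODE_IP is a placeholder):
#   ifname="$(ip -o -4 addr show | iface_for_ip "$NODE_IP")"
#   export GLOO_SOCKET_IFNAME="$ifname"
#   export HCCL_SOCKET_IFNAME="$ifname"
```

If the helper prints nothing, the address is not bound on that node, and the exports should not be set from an empty value.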