This repository is no longer maintained; please go to

VeRL for PyTorch

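Overview
Preparing the Training Environment
Starting Training
Training Results
Release Notes

Overview

Introduction

verl is a flexible post-training framework for large models that combines SFT (supervised fine-tuning) and RL (reinforcement learning). It is designed for the post-training stage of large language models (LLMs), where the parameters of a pretrained model are adjusted to fit new tasks or datasets.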

Reference implementation:

url=https://github.com/volcengine/verl
commit_id=5b542d273cc792971eb66aca07494523be61c58c

Implementation adapted for Ascend AI processors:

url=https://gitcode.com/ascend/ModelZoo-PyTorch.git
code_path=PyTorch/built-in/rl/

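Preparing the Training Environment

Prepare the Environment

We recommend consulting the companion resource documentation and using the latest matching versions.

Table 1 Version compatibility

Software	Version	Installation guide
Driver	AscendHDK 25.1.RC1	"Driver and Firmware Installation Guide"
Firmware	AscendHDK 25.1.RC1	"Driver and Firmware Installation Guide"
CANN	CANN 8.1.RC1	"CANN Software Installation Guide"
PyTorch	2.5.1	"Ascend Extension for PyTorch Configuration and Installation"
torch_npu	2.5.1	"Ascend Extension for PyTorch Configuration and Installation"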
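
The Python package rows of Table 1 can be checked against the active environment. The following is a minimal sketch, not part of the repository: it assumes the distributions are registered under the names `torch` and `torch_npu`, and it treats local or post-release suffixes (for example `2.5.1+cpu`) as matching. Driver, firmware, and CANN versions cannot be checked this way.

```python
from importlib import metadata

# Versions copied from Table 1; only the Python packages can be checked this way.
REQUIRED = {"torch": "2.5.1", "torch_npu": "2.5.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(required, lookup=installed_version):
    """Map each package to a tuple (expected, found, ok).

    A found version matches when it equals the expected version or extends it
    with a local or post-release suffix such as "2.5.1+cpu" or "2.5.1.post1".
    """
    report = {}
    for package, expected in required.items():
        found = lookup(package)
        ok = found is not None and (
            found == expected
            or found.startswith((expected + "+", expected + "."))
        )
        report[package] = (expected, found, ok)
    return report


for name, (expected, found, ok) in check_versions(REQUIRED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: expected {expected}, found {found}, {status}")
```

The `lookup` parameter is there only so the comparison logic can be exercised without the packages installed.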

Install vLLM and vLLM Ascend

# Do not place the installation directory under the model root directory
git clone -b v0.7.3 --depth 1 https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements-build.txt
VLLM_TARGET_DEVICE=empty pip install -e . --extra-index https://download.pytorch.org/whl/cpu/
cd ..

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh

# For VL models: build and install vllm-ascend v0.7.3
git clone -b v0.7.3 --depth 1 https://github.com/vllm-project/vllm-ascend.git
cp -f /path/to/VeRL_for_PyTorch/vllm_ascend_need/qwen2_5_vl.py vllm-ascend/vllm_ascend/models/
cp -f /path/to/VeRL_for_PyTorch/vllm_ascend_need/rotary_embedding.py vllm-ascend/vllm_ascend/ops/
cd vllm-ascend
export COMPILE_CUSTOM_KERNELS=1
python setup.py install
cd ..

# For LLM models: build and install vllm-ascend at a specific commit
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
git checkout edeadde387451ca982fe3717555c1841ee195718
sed -i '1s/^/set(CMAKE_CXX_STANDARD 17)\n/' CMakeLists.txt
export COMPILE_CUSTOM_KERNELS=1
python setup.py install
cd ..
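
The first comment in the commands above requires that the clone directories stay outside the model root. A small sketch of that check follows; the paths are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def is_inside(candidate, root):
    """Return True if `candidate` is `root` itself or lies anywhere below it."""
    candidate = Path(candidate).expanduser().resolve()
    root = Path(root).expanduser().resolve()
    return candidate == root or root in candidate.parents


# Hypothetical locations, not taken from the repository.
model_root = "/workspace/VeRL_for_PyTorch"
for clone_dir in ("/workspace/third_party/vllm", "/workspace/VeRL_for_PyTorch/vllm"):
    if is_inside(clone_dir, model_root):
        print(f"{clone_dir}: inside the model root, choose another directory")
    else:
        print(f"{clone_dir}: ok")
```

Comparing resolved paths rather than raw strings avoids false matches such as `/workspace/VeRL_for_PyTorch_old`.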

Clone the transformers repository and check out the required commit

git clone --depth 1 https://github.com/huggingface/transformers.git
cd transformers
git fetch --depth 1 origin aa17cfb4d532239336d2f89e06f01d48387292a3
git checkout aa17cfb4d532239336d2f89e06f01d48387292a3
cp -f /path/to/VeRL_for_PyTorch/transformers_need/npu_flash_attention.py src/transformers/integrations/
pip install -e .
cd ..

For VL models, torchvision must also be installed

pip install torchvision==0.20.1+cpu --index-url https://download.pytorch.org/whl/cpu

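Environment preparation guide.

Refer to "PyTorch Framework Training Environment Preparation" to set up the torch environment.
Install dependencies.

From the model root directory, run the following commands to install the dependencies required by the corresponding PyTorch version.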
```
pip install -r requirements-npu.txt 
pip install -e .
```

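Prepare the Datasets

VL models use the geo3k dataset. From the model root directory, run the following command to download and preprocess it. --local_dir is optional; if it is not set, the data is saved to ~/data/geo3k.

python examples/data_preprocess/geo3k.py --local_dir=xxx

LLM models use the gsm8k dataset. From the model root directory, run the following command to download and preprocess it. --local_dir is optional; if it is not set, the data is saved to ~/data/gsm8k.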

python examples/data_preprocess/gsm8k.py --local_dir=xxx
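
The optional `--local_dir` behaviour described above can be sketched as follows. This is an illustration of the documented default, not the code of the preprocessing scripts; the `parse_args` helper and its `dataset` parameter are hypothetical.

```python
import argparse
import os


def parse_args(argv, dataset):
    """Parse --local_dir, falling back to ~/data/<dataset> when it is omitted."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_dir", default=f"~/data/{dataset}")
    args = parser.parse_args(argv)
    # Expand "~" so the result is an absolute path under the user's home.
    args.local_dir = os.path.expanduser(args.local_dir)
    return args


print(parse_args([], "geo3k").local_dir)
print(parse_args(["--local_dir=/tmp/gsm8k"], "gsm8k").local_dir)
```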

Obtain the Pretrained Models

Download the Qwen2.5-VL-7B-Instruct, Qwen2.5-VL-3B-Instruct, Qwen2.5-VL-32B-Instruct, Qwen2.5-7B-Instruct, and Qwen2.5-32B-Instruct models yourself.

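Starting Training

Train the Model

Change to the root directory of the extracted source package.
```
cd /${model_folder_name}
```
Configure the dual-node runtime environment (skip this step on a single node).
1. Make sure the model and dataset paths are identical on the head node and the worker node.
2. On the head node, start the Ray cluster:
```
ray start --head
```
3. On the worker node, join the Ray cluster:
```
ray start --address='<head_node_ip>:6379'
```
4. On the worker node, confirm that the two nodes are connected:
```
ray status
```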
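
The connectivity check in step 4 can also be done programmatically. The sketch below is an assumption-laden alternative to reading `ray status` by eye: it relies on `ray.nodes()` returning one record per node with an `Alive` field, and `query_cluster` only works on a machine where Ray is installed and a cluster is already running. The sample records are hand-written.

```python
def alive_node_count(nodes):
    """Count the node records that Ray reports as alive."""
    return sum(1 for node in nodes if node.get("Alive"))


def cluster_ready(nodes, expected_nodes=2):
    """True once at least `expected_nodes` nodes are alive (2 for the dual-node setup)."""
    return alive_node_count(nodes) >= expected_nodes


def query_cluster():
    """Fetch the node list from a running cluster; requires `ray` to be installed."""
    import ray  # imported lazily so the helpers above work without Ray

    ray.init(address="auto")  # attach to the cluster started with `ray start`
    try:
        return ray.nodes()
    finally:
        ray.shutdown()


# Hand-written records shaped like the output of ray.nodes().
sample = [
    {"NodeManagerAddress": "192.0.2.1", "Alive": True},
    {"NodeManagerAddress": "192.0.2.2", "Alive": False},
]
print(cluster_ready(sample))
```

On a live cluster, `cluster_ready(query_cluster())` would perform the same check against real node records.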

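Run the training scripts.

Train with the GRPO algorithm.

The Qwen2.5-VL-3B-Instruct model supports single-node 8-card training.

Single-node 8-card training

bash test/train_qwen2_5_vl_3b_GRPO_full_8p.sh --data_path=xxx --model_path=xxx  # 8-card training

Single-node 8-card performance

bash test/train_qwen2_5_vl_3b_GRPO_performance_8p.sh --data_path=xxx --model_path=xxx   # 8-card performance

The Qwen2.5-VL-7B-Instruct model supports single-node 16-card training.

Single-node 16-card training

bash test/train_qwen2_5_vl_7b_GRPO_full_16p.sh --data_path=xxx --model_path=xxx   # 16-card training

Single-node 16-card performance

bash test/train_qwen2_5_vl_7b_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx   # 16-card performance

The Qwen2.5-VL-32B-Instruct model supports dual-node 32-card training.

Dual-node 32-card training

# Run on the head node
bash test/train_qwen2_5_vl_32b_GRPO_full_32p.sh --data_path=xxx --model_path=xxx   # 32-card training

Dual-node 32-card performance

# Run on the head node
bash test/train_qwen2_5_vl_32b_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx   # 32-card performance

The Qwen2.5-7B-Instruct model supports single-node 16-card training.

Single-node 16-card training

bash test/train_qwen2_5_7b_instruct_GRPO_full_16p.sh --data_path=xxx --model_path=xxx   # 16-card training

Single-node 16-card performance

bash test/train_qwen2_5_7b_instruct_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx   # 16-card performance

The Qwen2.5-32B-Instruct model supports dual-node 32-card training.

Dual-node 32-card training

# Run on the head node
bash test/train_qwen2_5_32b_instruct_GRPO_full_32p.sh --data_path=xxx --model_path=xxx   # 32-card training

Dual-node 32-card performance

# Run on the head node
bash test/train_qwen2_5_32b_instruct_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx   # 32-card performance

After training completes, the training logs are saved under test/output, and the model's training accuracy and performance figures are printed.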

Train with the DAPO algorithm.

The Qwen2.5-7B-Instruct model supports single-node 16-card training.

Single-node 16-card training

bash test/train_qwen2_5_7b_instruct_DAPO_full_16p.sh --data_path=xxx --model_path=xxx   # 16-card training

Single-node 16-card performance

bash test/train_qwen2_5_7b_instruct_DAPO_performance_16p.sh --data_path=xxx --model_path=xxx   # 16-card performance

The Qwen2.5-32B-Instruct model supports dual-node 32-card training.

Dual-node 32-card training

# Run on the head node
bash test/train_qwen2_5_32b_instruct_DAPO_full_32p.sh --data_path=xxx --model_path=xxx   # 32-card training

Dual-node 32-card performance

# Run on the head node
bash test/train_qwen2_5_32b_instruct_DAPO_performance_32p.sh --data_path=xxx --model_path=xxx   # 32-card performance
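
The script names above follow a single pattern: `test/train_<model>_<algorithm>_<mode>_<cards>p.sh`. The sketch below builds the command line from that pattern. The `CARDS` table is transcribed from the commands in this section, while the helper name `training_command` and the short model keys are illustrative assumptions.

```python
# (model key, algorithm) -> card count, transcribed from the commands in this section.
CARDS = {
    ("qwen2_5_vl_3b", "GRPO"): 8,
    ("qwen2_5_vl_7b", "GRPO"): 16,
    ("qwen2_5_vl_32b", "GRPO"): 32,
    ("qwen2_5_7b_instruct", "GRPO"): 16,
    ("qwen2_5_32b_instruct", "GRPO"): 32,
    ("qwen2_5_7b_instruct", "DAPO"): 16,
    ("qwen2_5_32b_instruct", "DAPO"): 32,
}


def training_command(model, algorithm, mode, data_path, model_path):
    """Return the argument list for one of the documented training scripts."""
    if mode not in ("full", "performance"):
        raise ValueError(f"unknown mode: {mode!r}")
    if (model, algorithm) not in CARDS:
        raise ValueError(f"no documented script for {model} with {algorithm}")
    cards = CARDS[(model, algorithm)]
    script = f"test/train_{model}_{algorithm}_{mode}_{cards}p.sh"
    return ["bash", script, f"--data_path={data_path}", f"--model_path={model_path}"]


print(" ".join(training_command("qwen2_5_vl_3b", "GRPO", "full", "xxx", "xxx")))
```

The returned list can be passed to `subprocess.run` from the model root directory; for the 32-card scripts this would be done on the head node only.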

Training Results

Table 2 Training results

Model	Algorithm	Hardware	Throughput	Max Training TimeSteps
Qwen2.5-VL-3B-Instruct	GRPO	8p-Competitor A	739.453	60
Qwen2.5-VL-3B-Instruct	GRPO	8P Atlas 200T A2 Box16	523.599	60
Qwen2.5-VL-7B-Instruct	GRPO	8p-Competitor A	568.452	60
Qwen2.5-VL-7B-Instruct	GRPO	16P Atlas 200T A2 Box16	326.37	60
Qwen2.5-VL-32B-Instruct	GRPO	16p-Competitor A	109.497	60
Qwen2.5-VL-32B-Instruct	GRPO	32P Atlas 200T A2 Box16	70.0735	60
Qwen2.5-7B-Instruct	GRPO	8p-Competitor A	323.872	35
Qwen2.5-7B-Instruct	GRPO	16P Atlas 200T A2 Box16	190.617	35
Qwen2.5-7B-Instruct	DAPO	16P Atlas 200T A2 Box16	198.678	84
Qwen2.5-32B-Instruct	GRPO	16p-Competitor A	79.022	105
Qwen2.5-32B-Instruct	GRPO	32P Atlas 200T A2 Box16	54.162	105
Qwen2.5-32B-Instruct	DAPO	32P Atlas 200T A2 Box16	47.725	32
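
For the GRPO rows that have both a competitor entry and an Atlas entry, the relative throughput can be computed directly from Table 2. The sketch below only divides the reported numbers. The table does not state the throughput unit or whether the value is per device, and several pairs use different card counts (for example 8p versus 16P), so the ratios should be read with that caveat.

```python
# Throughput values transcribed from Table 2 (GRPO rows only, units as reported there).
GRPO_THROUGHPUT = {
    "Qwen2.5-VL-3B-Instruct": {"competitor": 739.453, "atlas": 523.599},
    "Qwen2.5-VL-7B-Instruct": {"competitor": 568.452, "atlas": 326.37},
    "Qwen2.5-VL-32B-Instruct": {"competitor": 109.497, "atlas": 70.0735},
    "Qwen2.5-7B-Instruct": {"competitor": 323.872, "atlas": 190.617},
    "Qwen2.5-32B-Instruct": {"competitor": 79.022, "atlas": 54.162},
}


def throughput_ratios(table):
    """Return the Atlas throughput divided by the competitor throughput per model."""
    return {model: row["atlas"] / row["competitor"] for model, row in table.items()}


for model, ratio in throughput_ratios(GRPO_THROUGHPUT).items():
    print(f"{model}: {ratio:.3f}")
```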

Public Network Addresses

For the public network addresses referenced in the code, see public_address_statement.md

Release Notes

Changes

2025.05.12: Initial release.

FAQ

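If RuntimeError: Gloo connectFullMesh failed occurs during training, follow these steps:
- On both the head node and the worker node, run the following command to find the network interface name that corresponds to the node IP: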
```
ifconfig
```
- On both the head node and the worker node, set the following environment variables:
```
export GLOO_SOCKET_IFNAME=<interface_name>
export HCCL_SOCKET_IFNAME=<interface_name>
```
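
Finding the interface name for a given node IP can be automated. The sketch below is an assumption: instead of `ifconfig`, it parses the one-line-per-address output of `ip -o -4 addr show`, and the sample text and addresses are made up for illustration.

```python
def interface_for_ip(ip_addr_output, node_ip):
    """Return the interface whose IPv4 address equals `node_ip`, or None."""
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # Expected layout: "<index>: <iface> inet <address>/<prefix> ..."
        if len(fields) >= 4 and fields[2] == "inet":
            if fields[3].split("/")[0] == node_ip:
                return fields[1]
    return None


def export_lines(interface):
    """Build the two export statements from the FAQ for one interface name."""
    return [
        f"export GLOO_SOCKET_IFNAME={interface}",
        f"export HCCL_SOCKET_IFNAME={interface}",
    ]


# Made-up sample in the format printed by `ip -o -4 addr show`.
SAMPLE = (
    "1: lo    inet 127.0.0.1/8 scope host lo\\       valid_lft forever preferred_lft forever\n"
    "2: eth0    inet 192.0.2.10/24 brd 192.0.2.255 scope global eth0\\       valid_lft forever preferred_lft forever\n"
)

name = interface_for_ip(SAMPLE, "192.0.2.10")
if name is not None:
    print("\n".join(export_lines(name)))
```

On a real node, the text would come from running `ip -o -4 addr show` and the IP would be the one Ray uses for that node.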
If ImportError: cannot import name "fused_topk" from "fused_moe_module" (unknown location) occurs, uninstall triton with the following command:
```
pip uninstall triton
```
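
Whether triton is present in the current environment can be checked before or after uninstalling. This is a minimal sketch using only the standard library; the helper name `is_installed` is illustrative.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("triton"):
    print("triton is installed")
else:
    print("triton is not installed")
```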
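If training inside a Docker container is much slower than the baseline, the cause may be that Ray schedules less efficiently inside the container than on the physical host. In that case, start Ray on the physical host with the following command and then train inside the container: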
```
ray start --head
```