文件最后提交记录最后更新时间
[Docs]splitting install_guide and rename quickstart Co-authored-by: z60119525<zhaotao68@h-partners.com> # message auto-generated for no-merge-commit merge: !2383 merge 26.0.0 into 26.0.0 [Docs]splitting install_guide and rename quickstart Created-by: kali00 Commit-by: z60119525 Merged-by: ascend-robot Description: ## What this PR does / why we need it? Please describe the background and detailed changes of the PR. If it is a bugfix, please attach the related issue. ## Does this PR introduce any user-facing change? Please describe whether the PR will result in any user-facing usage changes. If there is related documentation, please specify its path. ## How was this patch tested? Please explain how to verify the correctness and effectiveness of this feature, as well as its usage constraints and limitations. See merge request: Ascend/MindSpeed-MM!23831 个月前
[Docs] Document corrections Co-authored-by: js1234567<jiangshuo9@h-partners.com> # message auto-generated for no-merge-commit merge: !2108 merge master into master [Docs] Document corrections Created-by: js1234567 Commit-by: js1234567 Merged-by: ascend-robot Description: ## Motivation Document corrections: 1. 添加2.3.0配套信息 2. 中英文标点问题 3. 链接版本更新 4. CANN8.5.0版本配置环境变量刷新, 涉及环境变量设置需全面排查修改 ## Modification Readme.md ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist **Before PR**: - [x] The new code needs to comply with the Clean Code specification. - [x] The PR content is self-checked, and the expression can be clear and the writing standardized **After PR**: - [ ] CLA has been signed and all committers have signed the CLA in this PR. - [ ] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/MindSpeed-MM!21083 个月前
[Modify] Move VeRL plugin and example Bash scripts to new location Co-authored-by: shaojiemike<943648187@qq.com> # message auto-generated for no-merge-commit merge: !1794 merge verlFoldChange into master [Modify] Move VeRL plugin and example Bash scripts to new location Created-by: shaojiemike Commit-by: shaojiemike Merged-by: ascend-robot Description: ## Motivation Please describe the motivation of this PR and the goal you want to achieve through this PR. ## Modification Move VeRL plugin and example Bash scripts to new location ## Checklist **Before PR**: - [x] The new code needs to comply with the Clean Code specification. - [x] The PR content is self-checked, and the expression can be clear and the writing standardized **After PR**: - [x] CLA has been signed and all committers have signed the CLA in this PR. - [x] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/MindSpeed-MM!17946 个月前
[Docs] Update qwen2.5vl performance script Co-authored-by: js1234567<jiangshuo9@h-partners.com> # message auto-generated for no-merge-commit merge: !2020 merge master into master [Docs] Update qwen2.5vl performance script Created-by: js1234567 Commit-by: js1234567 Merged-by: ascend-robot Description: ## Motivation Update qwen2.5vl performance script ## Modification Modify the readme Add shell for performance ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist **Before PR**: - [x] The new code needs to comply with the Clean Code specification. - [x] The PR content is self-checked, and the expression can be clear and the writing standardized **After PR**: - [x] CLA has been signed and all committers have signed the CLA in this PR. - [x] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/MindSpeed-MM!20204 个月前
[Modify] Move VeRL plugin and example Bash scripts to new location Co-authored-by: shaojiemike<943648187@qq.com> # message auto-generated for no-merge-commit merge: !1794 merge verlFoldChange into master [Modify] Move VeRL plugin and example Bash scripts to new location Created-by: shaojiemike Commit-by: shaojiemike Merged-by: ascend-robot Description: ## Motivation Please describe the motivation of this PR and the goal you want to achieve through this PR. ## Modification Move VeRL plugin and example Bash scripts to new location ## Checklist **Before PR**: - [x] The new code needs to comply with the Clean Code specification. - [x] The PR content is self-checked, and the expression can be clear and the writing standardized **After PR**: - [x] CLA has been signed and all committers have signed the CLA in this PR. - [x] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/MindSpeed-MM!17946 个月前
README.md

Qwen2_5_VL GRPO 使用指南

目录

简介

以 MindSpeed MM 仓库复现 Group Relative Policy Optimization (GRPO) 后训练方法为例来帮助用户快速入门,前期需要完成代码仓、环境、数据集以及权重等准备工作,再按照说明中的启动方式启动训练,以下为具体的操作说明。

参考实现

url=https://github.com/volcengine/verl
commit_id=97b65c63c729c61ca607315cf7084012aabc6bba

变更记录

2025.09.03: 首次支持Qwen2.5-VL 7B/32B GRPO训练

环境安装

【模型开发时推荐使用配套的环境版本】

请参考安装指南

GRPO场景下,环境依赖如下:

1. 环境依赖

PyTorch版本 torch_npu版本 Python版本
2.5.1 2.5.1 Python3.11

2. 环境搭建

# python3.11
conda create -n test python=3.11
conda activate test

# for torch-npu dev version or x86 machine [Optional]
# pip config set global.extra-index-url "https://download.pytorch.org/whl/cpu/ https://mirrors.huaweicloud.com/ascend/repos/pypi"
# pip install 后缀需加上 --trusted-host download.pytorch.org --trusted-host mirrors.huaweicloud.com
# 安装torch和torch_npu
pip install torch-2.5.1-cp311-cp311-*.whl
pip install torch_npu-2.5.1*.manylinux2014_*.whl


source /usr/local/Ascend/cann/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh

# vllm
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout v0.9.1
VLLM_TARGET_DEVICE=empty pip install -v -e .
cd ..

# vllm-ascend
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
git checkout 2961f2f
pip install -v -e .
cd ..

# verl
git clone https://github.com/volcengine/verl.git
cd verl
git checkout 97b65c63c729c61ca607315cf7084012aabc6bba
pip install -r requirements-npu.txt
pip install -v -e .
cd ..

# 安装三方库
pip install transformers==4.52.4 mathruler==0.1.0 decorator qwen-vl-utils==0.0.11 viztracer cloudpickle==2.1.0 setuptools==80.9.0

# 因安装环境可能导致覆盖,需重新安装torch_npu
pip install torch_npu-2.5.1*.manylinux2014_*.whl

3. 安装插件

# 请确保 vllm 已正确安装并且之后不会做覆盖
git clone --branch 26.0.0 https://gitcode.com/Ascend/MindSpeed-MM.git
cd MindSpeed-MM/verl_plugin
export MODEL_SELECT="Qwen2_5vl"
# path_to_verl替换为verl源码路径 例如:/home/code/verl
export VERL_PATH=path_to_verl
pip install -v -e .
cp -r ../examples/verl_examples/qwen2.5vl/* ../../verl/examples/grpo_trainer/
cd ../../verl/

权重下载

从Hugging Face库下载对应的模型权重:

数据集准备及处理

多模态模型使用geo3k数据集,在模型根目录下执行命令,下载并处理数据集,--local_dir为可选参数,不设置默认下载位置为~/data/geo3k

下载数据集的过程中,保证网络正常。若下载原始数据比较慢,可以手动下载原始数据到本地,通过修改代码进行数据处理:

vim ./examples/data_preprocess/geo3k.py
## 34行附近修改data_source为原始数据路径
dataset = dataset.load_dataset(data_source) # 将data_source修改为数据集下载路径
# 在线下载原始数据并预处理
python ./examples/data_preprocess/geo3k.py --local_dir=./data/geo3k

训练

1. 准备工作

配置脚本前需要完成前置准备工作,包括:环境安装权重下载数据集准备及处理,详情可查看对应章节。

2. 启动训练

以 Qwen2.5VL 7B 模型为例,在启动训练之前,需要修改启动脚本的配置:

  1. 根据使用机器的情况,修改NNODESNPUS_PER_NODE配置, 例如单机 A2 可设置NNODES为 1 、NPUS_PER_NODE为8;

  2. 如果是单机,需要保证MASTER_ADDRCURRENT_IP一致,如果为多机,需要保证各个机器的MASTER_ADDR一致,CURRENT_IP为各个节点的 IP (需要注意的是MASTER_ADDRCURRENT_IP不能设置为localhost);

    vim examples/grpo_trainer/ray_start.sh
    

    修改以下配置:

    source /usr/local/Ascend/cann/set_env.sh # 修改cann路径
    source /usr/local/Ascend/nnal/atb/set_env.sh # 修改nnal路径
    
    NNODES=1 # 节点数
    NPUS_PER_NODE=8 # NPU卡数
    MASTER_ADDR="localhost" # 节点IP
    SOCKET_IFNAME="Your SOCKET IFNAME" # 节点网卡名称
    
  3. 若多机运行,需修改verl中的main_ppo.py文件:

    vim verl/trainer/main_ppo.py
    

    在64行附近修改以下代码,添加自动获取节点信息:

    def run_ppo(config) -> None:
        ...
        print(f"ray init kwargs: {ray_init_kwargs}") # 原代码
        # ray.init(**OmegaConf.to_container(ray_init_kwargs)) # 原代码
        ray.init(**OmegaConf.to_container(ray_init_kwargs), address="auto") # 新代码
    
  4. 实际运行场景与默认配置脚本不一致时,需要根据实际场景调整micro_batch_sizedispatch_size等相关配置参数。

  5. 以 Qwen2.5VL 32B 模型为例,若增加机器规模的情况下,可通过修改启动脚本的配置:

    vim examples/grpo_trainer/train_qwen2_5_vl_32b_grpo_performance.sh
    

    修改以下配置,设置多机规模:

    nnodes=2 # 节点数
    n_npus_per_node=16 # NPU卡数
    

    规模增加,可修改以下配置,将推理部分减少FSDP切分参数,减少冗余通讯:

    train_batch_size=256 # 增大GBS,如修改为512
    log_prob_micro_batch_size_per_gpu=4 # 增加推理阶段对通讯的掩盖,如修改为16
    parameters=0  # 减少wrap参数,如修改为1e7
    
  6. 以 Qwen2.5VL 32B 模型为例,若想要增大步数,可修改以下配置参数:

    vim examples/grpo_trainer/train_qwen2_5_vl_32b_grpo_performance.sh
    

    增大total_epochs配置参数为想要训练的总步数(以300个epochs为例),并将total_training_steps设置为null:

    trainer.total_epochs=300 \
    trainer.total_training_steps='null' \
    
  7. 如需使用确定性计算,在安装插件步骤中需添加export DETERMINISTIC=True

    # 添加:
    export DETERMINISTIC=True 
    # 添加后在进行如下操作:
    export MODEL_SELECT="Qwen2_5vl"
    # path_to_verl替换为verl源码路径 例如:/home/code/verl
    export VERL_PATH=path_to_verl
    pip install -v -e .
    

    后续可通过在verl/workers/fsdp_workers.py文件中114行附近注释掉seed_all()来关闭确定性计算

  8. 启动ray, 若多机运行,需主节点到副节点依次运行此脚本:

    bash examples/grpo_trainer/ray_start.sh
    
  9. 启动训练脚本:

    • model_path为模型权重路径,data_path为数据集路径
    • 若多机运行,仅需主节点需运行此脚本
    bash examples/grpo_trainer/train_qwen2_5_vl_7b_grpo_full.sh --data_path=xxx --model_path=xxx
    

注意:所有节点的代码、权重、数据等路径的层级要保持一致,且启动ray的时候都位于verl目录下

3. 日志打点指标说明

时间相关指标说明

指标 说明
timing_s/gen 一次迭代中generation耗时
timing_s/reward 一次迭代中reward耗时
timing_s/old_log_prob 一次迭代中actor model计算log prob耗时
timing_s/ref 一次迭代中reference model计算耗时
timing_s/adv 计算advantages耗时
timing_s/reshard 一次迭代中reshard耗时
timing_s/update_actor 一次迭代中actor model进行update耗时
timing_s/step 一次迭代总时间
timing_s/generate_sequence 一次迭代中generate_sequence耗时

其他指标

指标 说明
actor/entropy 策略熵,表示策略的随机性或探索能力
actor/kl_loss kl散度,衡量当前策略与参考策略(如旧策略或参考模型)之间的偏离程度
actor/pg_loss pg_loss,基于优势函数的策略梯度目标函数值,表示当前策略对提升奖励的学习能力。
actor/pg_clipfrac GRPO中裁剪机制生效的比例,反映了策略更新幅度的稳定性
actor/ppo_kl PPO算法的实际 KL 散度
actor/grad_norm 梯度范数,表示当前反向传播中参数梯度的整体幅度
actor/lr 学习率,优化器当前使用的学习率
critic/score/mean 开启奖励模型时的reward均值
critic/score/max 奖励模型及规则奖励对同一个样本的reward最大值
critic/score/min 奖励模型及规则奖励对同一个样本的reward最小值
critic/rewards/mean 规则奖励的reward均值;奖励模型对样本的reward经过归一化后的均值
critic/rewards/max 规则奖励的reward最大值;奖励模型对样本的reward经过归一化后的最大值
critic/rewards/min 规则奖励的reward最小值;奖励模型对样本的reward经过归一化后的最小值
response_length/mean 平均生成长度,模型生成回复(response)的平均 token 数
response_length/min 最短生成长度,当前 batch 中生成最短的 response 长度
response_length/max 最长生成长度,当前 batch 中生成最长的 response 长度
prompt_length/mean 平均输入长度,输入 prompt 的平均长度
prompt_length/max 最长输入长度,当前 batch 中最长的 prompt长度
prompt_length/min 最短输入长度,当前 batch 中最长的 prompt长度
perf/total_num_tokens 总tokens数
perf/time_per_step 每步耗时
perf/throughput 吞吐指标

注意事项

  1. 容器内启动时可能会遇到不存在ip命令的错误,可使用如下命令进行安装:

    sudo apt-get install iproute2
    
  2. 如果安装vllm ascend失败,提示fatal error: 'cstdint' file not found,可能是gcc版本问题,可参考此处解决。更多vllm ascend问题可以向社区求助。

  3. 可以使用Ray Debugger对代码进行调试,在安装完插件后,需要在环境中安装依赖:

    pip install "ray[default]" debugpy