
ROLL x Ascend

Last updated: 2026-04-03.

We have added support for Huawei Ascend devices to ROLL.

Hardware Support

Atlas 900 A3 PODc

Installation

Base Environment Setup

Software    Version
Python      3.11
CANN        8.5.1

Create a Conda Environment

In Miniconda, create a new conda environment with the following commands:

conda create --name roll python=3.11
conda activate roll

Install torch & torch_npu

To use torch and torch_npu in ROLL, install them with the following commands:

# Use the CPU-only build of torch
pip install torch==2.9.0 torchvision==0.24.0 torchaudio==2.9.0 --index-url https://download.pytorch.org/whl/cpu

# Install torch_npu 2.9.0
pip install torch_npu==2.9.0

Install vllm & vllm-ascend

To use vllm in ROLL, build and install vllm and vllm-ascend from source as follows:

# vllm
git clone -b v0.13.0 --depth 1 https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements/build.txt

VLLM_TARGET_DEVICE=empty pip install -v -e .
cd ..

# vllm-ascend
git clone -b v0.13.0 --depth 1 https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend

pip install -e .
cd ..

Alternatively, you can install vllm and vllm-ascend from prebuilt wheels:

# Install vllm-project/vllm. The latest supported version is v0.13.0
pip install vllm==0.13.0

# Install vllm-project/vllm-ascend from PyPI
pip install vllm-ascend==0.13.0

Install ROLL

git clone https://github.com/alibaba/ROLL.git
cd ROLL
pip install -r requirements_common.txt
pip install -e .
pip install deepspeed==0.16.4
pip install tensorboard
cd ..

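Other Third-Party Libraries

Software                       Notes
transformers                   >= v4.57.6
flash_attn                     Not supported
transformer-engine[pytorch]    Not supported
  1. transformers v4.57.6 supports enabling --flash_attention_2.
  2. flash_attn acceleration is not currently supported.
  3. transformer-engine[pytorch] is not currently supported.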
pip install transformers==4.57.6
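
After the install steps, it can help to confirm that the environment holds the versions this README pins. The following is a minimal sketch using only the Python standard library; the `EXPECTED` table is copied from the install commands above, while the function name `check_versions` and the report layout are illustrative assumptions, not part of ROLL. The comparison is an exact match after dropping local suffixes such as `+cpu`, so a newer transformers release is reported as a mismatch even though the table allows >= v4.57.6.

```python
from importlib import metadata

# Pins copied from the install commands in this README.
EXPECTED = {
    "torch": "2.9.0",
    "torch_npu": "2.9.0",
    "vllm": "0.13.0",
    "vllm-ascend": "0.13.0",
    "transformers": "4.57.6",
    "deepspeed": "0.16.4",
}


def check_versions(expected, lookup=metadata.version):
    """Return {package: (status, installed_version_or_None)}.

    Hypothetical helper: `lookup` defaults to importlib.metadata.version
    and can be swapped out for testing.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = ("missing", None)
            continue
        # Ignore local version suffixes such as "+cpu" when comparing.
        base = found.split("+", 1)[0]
        status = "ok" if base == wanted else "mismatch"
        report[name] = (status, found)
    return report


if __name__ == "__main__":
    for pkg, (status, found) in check_versions(EXPECTED).items():
        print(f"{pkg:<14} expected {EXPECTED[pkg]:<8} found {found} [{status}]")
```

Run it inside the `roll` conda environment; any line marked `missing` or `mismatch` points at an install step worth repeating.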

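Quick Start: Single-Node Deployment

Before regular use, we recommend testing the single-node pipeline first to verify that your environment and installation are correct. Because Megatron-LM training is not yet supported, first change strategy_args in the relevant files to use the deepspeed option.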
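
To find the strategy_args blocks that still need changing, one option is to scan the parsed configuration. The sketch below is only an illustration: it assumes the YAML has already been loaded into nested dicts and lists, and it assumes each strategy_args block carries a `strategy_name` key whose value contains "megatron" when Megatron-LM is selected. The role names and strategy values in `sample` are hypothetical placeholders.

```python
def find_megatron_strategies(config, path=""):
    """Yield dotted paths of strategy_args blocks that still select Megatron.

    Assumption: each block has a `strategy_name` entry; adjust the key
    if your config files use a different layout.
    """
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}" if path else key
            if key == "strategy_args" and isinstance(value, dict):
                name = str(value.get("strategy_name", ""))
                if "megatron" in name.lower():
                    yield child
            else:
                yield from find_megatron_strategies(value, child)
    elif isinstance(config, list):
        for i, item in enumerate(config):
            yield from find_megatron_strategies(item, f"{path}[{i}]")


# Hypothetical, already-parsed config; role and strategy names are illustrative.
sample = {
    "actor_train": {"strategy_args": {"strategy_name": "megatron_train"}},
    "actor_infer": {"strategy_args": {"strategy_name": "vllm"}},
    "reference": {"strategy_args": {"strategy_name": "deepspeed_infer"}},
}
print(list(find_megatron_strategies(sample)))
```

For the sample above, only `actor_train.strategy_args` is reported, which is the block to switch to a deepspeed strategy.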

Note: Colocated mode is not currently supported on NPU. You need to modify device_mapping so that training and inference run on different devices.
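
The non-colocated requirement amounts to the training and inference device lists having no index in common. A minimal sketch of that check follows; the function name `overlapping_devices` and the 8-card split are assumptions for illustration, and the lists stand in for whatever your device_mapping entries evaluate to.

```python
def overlapping_devices(train_devices, infer_devices):
    """Return the sorted device indices that appear in both mappings."""
    return sorted(set(train_devices) & set(infer_devices))


# Hypothetical 8-card split: first half trains, second half serves inference.
train = list(range(0, 4))
infer = list(range(4, 8))

shared = overlapping_devices(train, infer)
if shared:
    raise ValueError(f"devices used by both training and inference: {shared}")
print("device mappings are disjoint")
```

An empty result means the two roles never share a card, which is the condition the note above asks for.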

To run the rlvr pipeline with a configuration file, copy qwen3_8b_rlvr_deepspeed.yaml into the xx folder, then run:

# Make sure you are in the root directory of the ROLL project

python examples/start_rlvr_pipeline.py \
        --config_path qwen2.5-7B-rlvr-offpolicy \
        --config_name qwen3_8b_rlvr_deepspeed
  • --config_path – Directory containing the YAML configuration file.
  • --config_name – File name (without the .yaml extension).
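
The two flags together select one YAML file. The sketch below shows how they could combine into a path so you can check the file is in place before launching. It assumes `--config_path` is resolved relative to the `examples` directory; that base directory and the helper name `resolve_config_file` are assumptions, not something this README states, so adjust `base_dir` to match your checkout.

```python
from pathlib import Path


def resolve_config_file(config_path, config_name, base_dir="examples"):
    """Build the path of the YAML file selected by the two flags.

    Assumption: config_path is relative to `base_dir`.
    """
    return Path(base_dir) / config_path / f"{config_name}.yaml"


cfg = resolve_config_file("qwen2.5-7B-rlvr-offpolicy", "qwen3_8b_rlvr_deepspeed")
print(cfg.as_posix())
print("found" if cfg.is_file() else "not found: copy the YAML file there first")
```

Running it from the ROLL project root before the launch command gives an early signal when the configuration file is missing from the expected directory.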