
PPO for Pytorch

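Overview

Introduction

Proximal Policy Optimization (PPO) is a policy gradient algorithm. Choosing a suitable step size is difficult in policy gradient methods; PPO addresses this with a new objective function that allows multiple epochs of minibatch updates on each batch of sampled data. It is currently one of the most widely applicable algorithms in reinforcement learning.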
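The clipped surrogate objective commonly associated with PPO can be sketched in a few lines of plain Python. This is an illustrative sketch, not the training code in this repository: the function name `clipped_surrogate` is hypothetical, and the default `eps_clip=0.2` is an assumed, commonly used value rather than one taken from this README. Each ratio stands for the probability of an action under the new policy divided by its probability under the old policy.

```python
def clipped_surrogate(ratios, advantages, eps_clip=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    ratios:     per-sample probability ratios pi_new(a|s) / pi_old(a|s)
    advantages: per-sample advantage estimates
    eps_clip:   clipping range; 0.2 is an assumed default, not from the README
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Keep the ratio inside [1 - eps_clip, 1 + eps_clip].
        clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

Taking the minimum of the two terms removes any incentive to move the ratio far from 1 in a single update, which is how the objective limits the effective step size.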

  • Reference implementation:

    url=https://github.com/nikhilbarhate99/PPO-PyTorch
    commit_id=6d05b5e3da80fcb9d3f4b10f6f9bc84a111d81e3
    
  • Implementation adapted for Ascend AI processors:

    url=https://gitcode.com/ascend/ModelZoo-PyTorch.git
    code_path=PyTorch/built-in/rl/
    

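Preparing the Training Environment

Preparing the Environment

  • We recommend preparing the training environment with the latest versions.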

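    Table 1 Version compatibility

    | Software  | Version                      | Installation guide                                          |
    |-----------|------------------------------|-------------------------------------------------------------|
    | Driver    | AscendHDK 25.0.RC1.1         | Driver and Firmware Installation Guide                      |
    | Firmware  | AscendHDK 25.0.RC1.1         | Driver and Firmware Installation Guide                      |
    | CANN      | CANN 8.1.RC1                 | CANN Software Installation Guide                            |
    | PyTorch   | 2.1.0                        | Ascend Extension for PyTorch Configuration and Installation |
    | torch_npu | release v7.0.0-pytorch2.1.0  | Ascend Extension for PyTorch Configuration and Installation |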
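  • The third-party library dependencies are listed in the table below.

    Table 2 Third-party library dependencies

    | Torch_Version | Third-party dependency versions                    |
    |---------------|----------------------------------------------------|
    | PyTorch 2.1   | Box2D==2.3.2, Box2D-kengz==2.3.3, gym==0.15.4      |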
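  • Install the dependencies.

    Run the following commands in the model root directory to install the dependencies required for the corresponding PyTorch version.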

    pip install -r requirements.txt
    pip install gym[box2d]==0.15.4
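
After installation, the pinned versions in Table 2 can be compared against what is actually installed. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `find_mismatches` is hypothetical, and the distribution names are assumed to match the spellings in Table 2.

```python
from importlib import metadata

# Versions copied from Table 2; distribution names are assumed to match the table.
EXPECTED = {"gym": "0.15.4", "Box2D": "2.3.2", "Box2D-kengz": "2.3.3"}


def find_mismatches(expected, get_version=metadata.version):
    """Return {name: (wanted, found)} for missing or differently versioned packages.

    `found` is None when the distribution is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means every listed package is installed at the expected version.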
    

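Preparing the Dataset

None.

Obtaining the Pretrained Model

None.

Starting Training

Training the Model

This document uses the BipedalWalker-v2 environment as an example to show the training procedure. For other environments, adjust the hyperparameters and other settings in the launch script to suit the environment.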

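  1. Go to the root directory of the extracted source package.

    cd /${model_folder_name}

  2. Run the training script.

    The model supports single-node single-card training and single-node 8-card training.

    • Single-node single-card training

      bash test/train_full_1p.sh  # single-card training

    • Single-node single-card performance

      bash test/train_performance_1p.sh  # single-card performance


    After training finishes, the weight files are saved under test/output, and the model's training accuracy and performance information is printed.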
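
To check what a run produced, the files under test/output can be listed newest first. This is a small sketch under stated assumptions: the README does not specify file names or the directory layout inside test/output, so the hypothetical helper `newest_files` simply walks the directory and sorts by modification time.

```python
from pathlib import Path


def newest_files(output_dir="test/output", limit=5):
    """Return up to `limit` files under `output_dir`, most recently modified first.

    Returns an empty list when the directory does not exist.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return []
    # Walk subdirectories too, since the layout inside test/output is not specified.
    files = [path for path in root.rglob("*") if path.is_file()]
    files.sort(key=lambda path: path.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in newest_files():
        print(path)
```

Run it from the model root directory so that the relative path test/output resolves correctly.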

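Training Results

Table 3 Training results

| NAME                 | FPS    | MAX Training TimeSteps | Average Reward |
|----------------------|--------|------------------------|----------------|
| 1p-competitor V      | 585.37 | 3000000                | 197.75         |
| 1p-NPU-Atlas 800T A2 | 284.02 | 3000000                | 240            |

Note: The table above contains historical data and is for reference only. The performance data updated on May 10, 2025 is as follows:

| NAME                 | Precision | FPS    |
|----------------------|-----------|--------|
| 1p-competitor        | FP16      | 585.37 |
| 1p-Atlas 900 A2 PoDc | FP16      | 413.79 |
| 1p-Atlas 800T A2     | FP16      | 336.84 |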
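
The updated FPS figures are easier to compare as ratios against the competitor baseline. The sketch below only does that arithmetic on the values from the table above; the function name `relative_throughput` and the dictionary keys are illustrative choices.

```python
# FPS values copied from the performance table updated on May 10, 2025.
FPS = {
    "1p-competitor": 585.37,
    "1p-Atlas 900 A2 PoDc": 413.79,
    "1p-Atlas 800T A2": 336.84,
}


def relative_throughput(fps, baseline):
    """Return each entry's FPS divided by the FPS of the `baseline` entry."""
    base = fps[baseline]
    return {name: value / base for name, value in fps.items()}


if __name__ == "__main__":
    for name, ratio in relative_throughput(FPS, "1p-competitor").items():
        print(f"{name}: {ratio:.1%} of baseline")
```

With these numbers, the Atlas 900 A2 PoDc entry reaches roughly 71% of the baseline FPS and the Atlas 800T A2 entry roughly 58%.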

Public Network Addresses

None.

Release Notes

Changes

2023.08.20: Initial release.

FAQ

None.