
The scripts for this model are no longer updated with new releases. To use this model, go to this address.

Aquila2 for PyTorch

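Overview

Model introduction

Aquila2 is a large language model released by BAAI (智源). It has broad application prospects in areas such as natural language processing, machine translation, and intelligent question answering, and it performs well on multiple public datasets.

Code implementation

  • Reference implementation: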

    url=https://github.com/FlagOpen/FlagScale.git
    commit_id=d7dc60ec3ef6341526fd187281dc289418c17899
    
  • Implementation adapted for Ascend AI processors:

    url=https://gitcode.com/ascend/ModelZoo-PyTorch.git
    code_path=PyTorch/built-in/foundation
    

Aquila2

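Prepare the training environment

Install the model environment

Table 1 Supported third-party library versions

Third-party library  Supported version
PyTorch              2.1.0
transformers         4.32.0
torchvision          0.16.0

Run the following command in the model root directory to install the dependencies required for the corresponding PyTorch version.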

pip install -r ascend/requirements.txt

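Install the Ascend environment

To set up the Ascend environment, refer to the document "PyTorch Framework Training Environment Preparation" (《Pytorch框架训练环境准备》) in the Ascend community. This repository supports the software versions listed in Table 2.

Table 2 Supported Ascend software versions

Software type        Supported version
FrameworkPTAdapter   In development
CANN                 In development
Ascend NPU firmware  In development
Ascend NPU driver    In development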

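Prepare the dataset

Prepare the pretraining dataset

  1. Obtain the raw dataset yourself. Taking the wudao dataset as an example, the dataset directory structure is as follows: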

    wudao_pretrain
    ├── wudao_pretrain_text_document.idx
    └── wudao_pretrain_text_document.bin
    
  2. After obtaining the dataset, set the data_path parameter in the following launch configuration files to the absolute path of the local dataset.

    ascend/scripts/pretrain_aquila_34B_distributed.json
    ascend/scripts/pretrain_aquila_70B_distributed.json
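
The two dataset steps above can be sketched in Python as follows. This is a minimal sketch, not part of the repository: the helper names `dataset_prefix`, `set_data_path`, and `update_config` are hypothetical. It assumes that `data_path` appears as a key somewhere in the JSON file, and that its value is the shared prefix of the `.bin`/`.idx` pair (the usual Megatron convention). Check both assumptions against the actual configuration file before use.

```python
import json
from pathlib import Path


def dataset_prefix(dataset_dir, name="wudao_pretrain_text_document"):
    """Return the absolute prefix shared by <name>.bin and <name>.idx.

    Raises FileNotFoundError if either file of the pair is missing.
    """
    base = Path(dataset_dir).resolve()
    for suffix in (".bin", ".idx"):
        candidate = base / (name + suffix)
        if not candidate.is_file():
            raise FileNotFoundError(f"missing dataset file: {candidate}")
    return str(base / name)


def set_data_path(node, value):
    """Recursively set every "data_path" key in a parsed JSON structure.

    Returns the number of keys updated. The nesting of the real
    configuration file is not known here (assumption), hence the
    recursive search instead of a fixed key path.
    """
    updated = 0
    if isinstance(node, dict):
        for key in node:
            if key == "data_path":
                node[key] = value
                updated += 1
            else:
                updated += set_data_path(node[key], value)
    elif isinstance(node, list):
        for item in node:
            updated += set_data_path(item, value)
    return updated


def update_config(config_file, dataset_dir):
    """Rewrite config_file in place so data_path points at dataset_dir."""
    config_file = Path(config_file)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    if set_data_path(config, dataset_prefix(dataset_dir)) == 0:
        raise KeyError(f"no data_path key found in {config_file}")
    config_file.write_text(
        json.dumps(config, indent=4, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
```

A call such as `update_config("ascend/scripts/pretrain_aquila_34B_distributed.json", "/data/wudao_pretrain")` would then update the 34B configuration, where `/data/wudao_pretrain` is a placeholder location. If the configuration expects the dataset directory rather than the file prefix, pass that value to `set_data_path` directly.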
    

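Quick start

Pretraining task

This task mainly provides bf16 training scripts and uses Megatron distributed training by default.

Start training

  1. Go to the root directory of the extracted source package.

    cd /${model_folder_name}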
    
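  2. Modify the corresponding parameters in the following configuration files to match your runtime environment.

    • shell_cmds: commands that set up the required environment variables;
    • ssh port: port of the SSH service;
    • GLOO_SOCKET_IFNAME: set to the name of the server's network interface;
    • HCCL_SOCKET_IFNAME: set to the name of the server's network interface;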
    ascend/scripts/pretrain_aquila_34B_distributed.json
    ascend/scripts/pretrain_aquila_70B_distributed.json
    
  3. Run the pretraining script.

    # 34B, 16-card training
    bash ./test/pretrain_aquila2_34b.sh --extra-config=./ascend/scripts/pretrain_aquila_34B_distributed_extra_16p.json --hostfile='./hostfile'
    # 34B, 16-card training performance
    bash ./test/pretrain_aquila2_34b.sh --extra-config=./ascend/scripts/pretrain_aquila_34B_distributed_extra_16p.json --hostfile='./hostfile' --mode=performance
    
    # 70B, 32-card training
    bash ./test/pretrain_aquila2_70b.sh --extra-config=./ascend/scripts/pretrain_aquila_70B_distributed_extra_32p.json --hostfile='./hostfile'
    # 70B, 32-card training performance
    bash ./test/pretrain_aquila2_70b.sh --extra-config=./ascend/scripts/pretrain_aquila_70B_distributed_extra_32p.json --hostfile='./hostfile' --mode=performance
    
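    • For multi-node training, the hostfile parameter is required. The file lists the IP addresses of the servers involved, one per line.
    • Model training logs are saved by default under test/aquila_{parameter scale}_{number of cards}.
  4. Obtain the model performance.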

    bash ./test/parse_throughout.sh --log=xxx
    
    • log takes the path to the model training log file (in a multi-node scenario, pass the log file path from the last machine listed in hostfile).
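
The hostfile described in step 3 can be checked before launching with a short Python sketch. This is an illustration only: `read_hostfile` is a hypothetical helper, the sample addresses are documentation placeholders, and the sketch assumes strictly one bare IP address per line, as described above. The launch scripts themselves may accept other forms.

```python
import ipaddress


def read_hostfile(text):
    """Parse hostfile content: one server IP per line, blank lines ignored."""
    hosts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue
        try:
            # Accepts IPv4 and IPv6 literals; raises ValueError otherwise.
            ipaddress.ip_address(entry)
        except ValueError:
            raise ValueError(f"line {lineno}: not an IP address: {entry!r}") from None
        hosts.append(entry)
    if len(set(hosts)) != len(hosts):
        raise ValueError("hostfile lists the same server more than once")
    return hosts


# Placeholder addresses from the documentation range 192.0.2.0/24.
sample = "192.0.2.11\n192.0.2.12\n"
hosts = read_hostfile(sample)
# The last entry is the machine whose log is passed to --log in step 4.
print(hosts[-1])
```

With the sample content, the last entry is `192.0.2.12`, so the log from that machine would be the one passed to `parse_throughout.sh`.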

Training results

Table 3 Training results

Chip      Cards  Parameters  seq_length  micro_batch_size  global_batch_size  Step time (s/step)  Token throughput (tokens/s/p)
GPU       16p    34B         4096        1                 32                 10.8                756
Atlas A2  16p    34B         4096        2                 64                 -                   -
GPU       32p    70B         4096        1                 44                 -                   -
Atlas A2  32p    70B         4096        1                 44                 -                   -
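
The throughput column can be related to the other columns with a small calculation. The source does not state the formula, so the sketch below assumes that tokens/s/p equals global_batch_size × seq_length divided by (step time × number of cards). The function name is hypothetical.

```python
def tokens_per_second_per_card(global_batch_size, seq_length, step_time_s, cards):
    """Per-card token throughput under the assumed formula."""
    tokens_per_step = global_batch_size * seq_length
    return tokens_per_step / (step_time_s * cards)


# GPU 16p / 34B row of Table 3: 32 sequences of 4096 tokens in 10.8 s on 16 cards.
value = tokens_per_second_per_card(32, 4096, 10.8, 16)
print(round(value, 1))  # 758.5
```

With the rounded step time of 10.8 s this gives about 758.5 tokens/s/p, close to the reported 756. A difference of this size would be consistent with the step time having been rounded in the table.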

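Public network address statement

For the public network addresses referenced in the code, see public_address_statement.md

Change log

Changes

2024.04.15: Initial release.

FAQ

None yet.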