
The scripts for this model are no longer updated with new releases. To use this model, go to the linked address.

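Aquila2 for PyTorch

Overview

Model description

Aquila2 is a large language model released by BAAI (智源). It has broad application prospects in areas such as natural language processing, machine translation, and intelligent question answering, and it performs well on multiple public datasets.

Code implementation

Reference implementation: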

url=https://github.com/FlagOpen/FlagScale.git
commit_id=d7dc60ec3ef6341526fd187281dc289418c17899

Implementation adapted for Ascend AI processors:

url=https://gitcode.com/ascend/ModelZoo-PyTorch.git
code_path=PyTorch/built-in/foundation/Aquila2

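Prepare the training environment

Install the model environment

Table 1 Supported third-party library versions

Library	Supported version
PyTorch	2.1.0
transformers	4.32.0
torchvision	0.16.0

Run the following command in the model root directory to install the dependencies required for the corresponding PyTorch version.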

pip install -r ascend/requirements.txt
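After installation, it can be useful to confirm that the environment matches Table 1. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes the PyPI distribution names `torch`, `transformers`, and `torchvision`.

```python
from importlib import metadata

# Versions copied from Table 1; the distribution names are an assumption
# ("torch" is the PyPI name for PyTorch).
EXPECTED = {"torch": "2.1.0", "transformers": "4.32.0", "torchvision": "0.16.0"}


def _installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, installed_version=None):
    """Return {package: (expected, found)} for every package that differs.

    `found` is None when the package is not installed.
    """
    lookup = installed_version or _installed_version
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Ignore local build tags such as "+cpu" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means the three libraries match the table; any printed line points at a package to reinstall.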

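Install the Ascend environment

Set up the Ascend environment by following the "PyTorch Framework Training Environment Preparation" document in the Ascend community. This repository supports the software versions listed in Table 2.

Table 2 Supported Ascend software versions

Software	Supported version
FrameworkPTAdapter	In-development version
CANN	In-development version
Ascend NPU firmware	In-development version
Ascend NPU driver	In-development version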

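Prepare the dataset

Pretraining dataset preparation

Obtain the raw dataset yourself. Taking the wudao dataset as an example, the dataset directory structure is as follows: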

wudao_pretrain
├── wudao_pretrain_text_document.idx
└── wudao_pretrain_text_document.bin
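Before launching a run, a quick check that both files of the pair are present can save a failed start. This is a minimal sketch with a hypothetical helper name; it assumes the dataset is addressed by the common prefix `wudao_pretrain/wudao_pretrain_text_document`, which the source does not state explicitly.

```python
from pathlib import Path


def missing_dataset_files(prefix, suffixes=(".idx", ".bin")):
    """Return the expected dataset files that do not exist for a prefix."""
    return [prefix + s for s in suffixes if not Path(prefix + s).is_file()]


if __name__ == "__main__":
    # Hypothetical location; replace it with the local dataset prefix.
    prefix = "/data/wudao_pretrain/wudao_pretrain_text_document"
    for path in missing_dataset_files(prefix):
        print(f"missing: {path}")
```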

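After obtaining the dataset, set the data_path parameter to the absolute path of the local dataset in the following launch configuration files (JSON).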
```
ascend/scripts/pretrain_aquila_34B_distributed.json
ascend/scripts/pretrain_aquila_70B_distributed.json
```
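The edit can also be scripted. The sketch below is an illustration under assumptions: the layout of these JSON files is not described here, so the helper (hypothetical names) searches the whole document for keys named `data_path` instead of assuming where the key lives, and it rewrites the file with standard `json` formatting.

```python
import json


def set_key_everywhere(node, key, value):
    """Recursively set `key` to `value` wherever it appears; return the hit count."""
    hits = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                hits += 1
            else:
                hits += set_key_everywhere(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            hits += set_key_everywhere(item, key, value)
    return hits


def update_data_path(config_file, data_path):
    """Load a JSON config, replace every data_path entry, and write it back."""
    with open(config_file, encoding="utf-8") as f:
        config = json.load(f)
    if set_key_everywhere(config, "data_path", data_path) == 0:
        raise KeyError(f"no data_path key found in {config_file}")
    with open(config_file, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
        f.write("\n")
```

Rewriting the file this way may change its indentation or key spacing, so review the diff before committing the result.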

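Quick start

Pretraining task

This task mainly provides bf16 training scripts and uses Megatron distributed training by default.

Start training

Go to the root directory of the extracted source package.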
```
cd /${model_folder_name}
```
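Modify the parameters below, according to your runtime environment, in the JSON launch configuration files listed after them.
- shell_cmds: environment variable setup;
- ssh port: SSH service port;
- GLOO_SOCKET_IFNAME: set to the server's network interface name;
- HCCL_SOCKET_IFNAME: set to the server's network interface name;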
```
ascend/scripts/pretrain_aquila_34B_distributed.json
ascend/scripts/pretrain_aquila_70B_distributed.json
```
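To find a candidate value for GLOO_SOCKET_IFNAME and HCCL_SOCKET_IFNAME, the interface names known to the host can be listed with the Python standard library. This is only a convenience sketch with a hypothetical helper name; which interface carries the training traffic still has to be decided per cluster.

```python
import socket


def list_interface_names():
    """Return the names of the network interfaces visible to this host."""
    return [name for _index, name in socket.if_nameindex()]


if __name__ == "__main__":
    for name in list_interface_names():
        print(name)
```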

Run the pretraining script.

# 34B training on 16 devices
bash ./test/pretrain_aquila2_34b.sh --extra-config=./ascend/scripts/pretrain_aquila_34B_distributed_extra_16p.json --hostfile='./hostfile'
# 34B training performance on 16 devices
bash ./test/pretrain_aquila2_34b.sh --extra-config=./ascend/scripts/pretrain_aquila_34B_distributed_extra_16p.json --hostfile='./hostfile' --mode=performance

# 70B training on 32 devices
bash ./test/pretrain_aquila2_70b.sh --extra-config=./ascend/scripts/pretrain_aquila_70B_distributed_extra_32p.json --hostfile='./hostfile'
# 70B training performance on 32 devices
bash ./test/pretrain_aquila2_70b.sh --extra-config=./ascend/scripts/pretrain_aquila_70B_distributed_extra_32p.json --hostfile='./hostfile' --mode=performance

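In multi-node training, the hostfile argument is required. The file lists the IP addresses of the servers involved, one per line.
Training logs are saved by default under test/aquila_{parameter scale}_{device count}.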
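A malformed hostfile is easy to catch before launch. The sketch below uses a hypothetical helper name and assumes the format is exactly as described (one IP address per line, nothing else on the line); it tolerates blank lines and reports the first invalid entry.

```python
import ipaddress


def parse_hostfile(text):
    """Return the IP addresses listed one per line, rejecting invalid entries."""
    hosts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # tolerate blank lines
        try:
            ipaddress.ip_address(entry)
        except ValueError:
            raise ValueError(
                f"line {lineno}: {entry!r} is not a valid IP address"
            ) from None
        hosts.append(entry)
    return hosts


if __name__ == "__main__":
    with open("./hostfile", encoding="utf-8") as f:
        print(parse_hostfile(f.read()))
```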

Get the model performance.
```
bash ./test/parse_throughout.sh --log=xxx
```
- log: path to the training log file (in multi-node scenarios, pass the log file path from the last machine in the hostfile)

Training results

Table 3 Training results

Chip	Devices	Parameters	seq_length	micro_batch_size	global_batch_size	Step time (s/step)	Token throughput (tokens/s/p)
GPU	16p	34B	4096	1	32	10.8	756
Atlas A2	16p	34B	4096	2	64	-	-
GPU	32p	70B	4096	1	44	-	-
Atlas A2	32p	70B	4096	1	44	-	-
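The last two columns of Table 3 are related. As a sketch, assuming throughput is defined as tokens processed per step divided by the step time and the number of devices (the exact formula used by the repository's parsing script is not shown here), the GPU 34B row can be cross-checked as follows.

```python
def tokens_per_second_per_device(global_batch_size, seq_length, step_time_s, num_devices):
    """Tokens processed per second per device for one training step."""
    tokens_per_step = global_batch_size * seq_length
    return tokens_per_step / step_time_s / num_devices


# Values from the GPU 34B row of Table 3.
value = tokens_per_second_per_device(32, 4096, 10.8, 16)
print(round(value, 1))  # prints 758.5
```

The table reports 756 tokens/s/p for this row, within about 0.3% of the computed figure, which is consistent with the step time having been rounded to 10.8 s.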

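Public network address statement

For the public network addresses referenced in the code, see public_address_statement.md

Change log

Changes

2024.04.15: Initial release.

FAQ

None yet.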