
Llama3-8B

Training

Hardware configuration for Llama3-8B training:

Hardware	Configuration
NPU	8 x Ascend NPUs

Scripts

Install MindSpeed and Megatron-LM by following the README:

git clone https://gitcode.com/Ascend/MindSpeed.git
pip install -e MindSpeed
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
# Check out the Megatron-LM branch you are using
git checkout core_r0.8.0
mindspeed -P
mkdir model_from_hf
mkdir dataset
mkdir ckpt
mv ../MindSpeed/tools/preprocess_data.py .
mv ../MindSpeed/tools/data_handler.py .
mv ../MindSpeed/tests_extend/system_tests/llama3/pretrain_llama3_8b_ptd.sh ./examples/
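The commands above move files out of the MindSpeed checkout and into Megatron-LM. A quick layout check can catch a missed step before it causes a failed launch later. The sketch below is a minimal example only. The function name check_layout is hypothetical, and the file list mirrors the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the Megatron-LM working directory layout
# produced by the setup commands above. Prints each missing item to stderr
# and returns non-zero if anything is absent.
check_layout() {
  local root="${1:-.}"   # Megatron-LM directory (defaults to the current one)
  local missing=0
  local dir file
  for dir in model_from_hf dataset ckpt; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing directory: $root/$dir" >&2
      missing=1
    fi
  done
  for file in preprocess_data.py data_handler.py examples/pretrain_llama3_8b_ptd.sh; do
    if [ ! -f "$root/$file" ]; then
      echo "missing file: $root/$file" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_layout .` from inside the Megatron-LM directory. It prints nothing and returns 0 when the layout is complete.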

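Set up the environment

Create and activate the conda environment below before running pip install -e MindSpeed above, so that MindSpeed is installed into it.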

# Python 3.8
conda create -n test python=3.8
conda activate test

# Install torch, torch_npu and apex (use the wheel files you downloaded; exact names vary by version)
pip install torch-2.1.0-cp38-cp38-manylinux2014_aarch64.whl
pip install torch_npu-2.1.0*-cp38-cp38-linux_aarch64.whl
pip install apex-0.1_ascend*-cp38-cp38-linux_aarch64.whl

# Adjust to your ascend-toolkit install path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
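The set_env.sh file exists only if the CANN toolkit was installed in the default location. If the file is missing, sourcing it fails with a terse error. Below is a minimal guard. The ASCEND_TOOLKIT_DIR override variable and the load_ascend_env function are hypothetical names chosen for illustration. Neither is part of CANN.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source the CANN environment script, failing with a
# clear message if it is not where we expect it.
load_ascend_env() {
  # ASCEND_TOOLKIT_DIR is a hypothetical override; the default matches the
  # path used in this README.
  local toolkit="${ASCEND_TOOLKIT_DIR:-/usr/local/Ascend/ascend-toolkit}"
  if [ ! -f "$toolkit/set_env.sh" ]; then
    echo "set_env.sh not found under $toolkit; set ASCEND_TOOLKIT_DIR to your install path" >&2
    return 1
  fi
  # shellcheck disable=SC1091
  source "$toolkit/set_env.sh"
}
```

For a non-default install, call it as `ASCEND_TOOLKIT_DIR=/opt/Ascend/ascend-toolkit load_ascend_env`. The path in that example is illustrative.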

Download the Llama3-8B pretrained weights and tokenizer

  #!/bin/bash
  mkdir -p ./model_from_hf/llama-3-8b-hf/
  cd ./model_from_hf/llama-3-8b-hf/
  wget https://hf-mirror.com/unsloth/llama-3-8b/resolve/main/config.json
  wget https://hf-mirror.com/unsloth/llama-3-8b/resolve/main/generation_config.json
  wget https://hf-mirror.com/unsloth/llama-3-8b/resolve/main/model-00001-of-00004.safetensors
  wget https://hf-mirror.com/unsloth/llama-3-8b/resolve/main/model-00002-of-00004.safetensors
  wget https://hf-mirror.com/unsloth/llama-3-8b/resolve/main/model-00003-of-00004.safetensors
  wget https://hf-mirror.com/unsloth/llama-3-8b/resolve/main/model-00004-of-00004.safetensors
  wget https://hf-mirror.com/unsloth/llama-3-8b/resolve/main/model.safetensors.index.json
  wget https://hf-mirror.com/unsloth/llama-3-8b/resolve/main/special_tokens_map.json
  wget https://hf-mirror.com/unsloth/llama-3-8b/resolve/main/tokenizer.json
  wget https://hf-mirror.com/unsloth/llama-3-8b/resolve/main/tokenizer_config.json
  cd ../../
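Before you preprocess data or launch training, confirm that every file downloaded completely. A common failure is receiving a Git LFS pointer instead of the actual weights. The pointer is a small text file whose first line is `version https://git-lfs.github.com/spec/v1`. This can happen when an LFS-tracked file is fetched through a `raw` URL. The sketch below checks the files listed above. check_hf_download is a hypothetical helper, and the pointer detection assumes the standard Git LFS pointer format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the Llama3-8B files listed above exist,
# are non-empty, and that the safetensors shards are real weights rather
# than Git LFS pointer files.
check_hf_download() {
  local dir="$1"
  local status=0
  local f
  for f in config.json generation_config.json \
           model-00001-of-00004.safetensors model-00002-of-00004.safetensors \
           model-00003-of-00004.safetensors model-00004-of-00004.safetensors \
           model.safetensors.index.json special_tokens_map.json \
           tokenizer.json tokenizer_config.json; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    elif [[ "$f" == *.safetensors ]] &&
         head -c 64 "$dir/$f" | grep -q '^version https://git-lfs'; then
      # An LFS pointer is a small text file; real weights are binary.
      echo "LFS pointer instead of weights: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_hf_download ./model_from_hf/llama-3-8b-hf` returns 0 only when all ten files look complete.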

Pretraining

4.1 Prepare the dataset

Download the dataset used for LLaMA3-8B pretraining (tatsu-lab/alpaca)

  # Download the data
  cd ./dataset
  wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
  cd ..
  # Preprocess the data
  mkdir ./dataset/llama-3-8b-hf/
  # Adjust to your ascend-toolkit install path
  source /usr/local/Ascend/ascend-toolkit/set_env.sh
  python ./preprocess_data.py \
    --input ./dataset/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
    --tokenizer-name-or-path ./model_from_hf/llama-3-8b-hf/ \
    --output-prefix ./dataset/llama-3-8b-hf/alpaca \
    --workers 4 \
    --log-interval 1000 \
    --tokenizer-type PretrainedFromHF
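The parquet download is easy to get wrong: a Hugging Face `blob` URL returns an HTML viewer page instead of the file itself. Parquet files begin and end with the 4-byte magic `PAR1`, so you can run a cheap check before preprocessing. is_parquet is a hypothetical helper name used in this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the file starts and ends with the Parquet
# magic bytes "PAR1", non-zero otherwise (e.g. an HTML page saved by mistake).
is_parquet() {
  local file="$1"
  [ -f "$file" ] &&
    [ "$(head -c 4 "$file")" = "PAR1" ] &&
    [ "$(tail -c 4 "$file")" = "PAR1" ]
}
```

For example: `is_parquet ./dataset/train-00000-of-00001-a09b74b3ef9c3b56.parquet || echo "re-download the dataset"`.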

4.2 Pretraining configuration

LLaMA3-8B pretraining script: examples/pretrain_llama3_8b_ptd.sh

 # Set the ascend-toolkit path
 source /usr/local/Ascend/ascend-toolkit/set_env.sh

 # Set the tokenizer, dataset, environment-script and checkpoint-save paths to match your setup
 source "../MindSpeed/tests_extend/system_tests/env_npu.sh"
 CKPT_SAVE_DIR="./ckpt/"
 DATA_PATH="./dataset/llama-3-8b-hf/alpaca_text_document"  # dataset path
 TOKENIZER_MODEL="./model_from_hf/llama-3-8b-hf/"  # tokenizer path
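In this setup, DATA_PATH appears to be a prefix rather than a single file. Under Megatron's indexed-dataset convention, the preprocessing step above should produce alpaca_text_document.bin and alpaca_text_document.idx in that location. The following is a hedged preflight check of the three variables. preflight is a hypothetical function, and the .bin/.idx naming is an assumption based on that convention.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check for the variables set in
# examples/pretrain_llama3_8b_ptd.sh. Assumes DATA_PATH is a prefix for a
# Megatron indexed dataset (<prefix>.bin + <prefix>.idx).
preflight() {
  local ckpt_dir="$1" data_prefix="$2" tokenizer_dir="$3"
  local status=0
  local ext
  for ext in bin idx; do
    if [ ! -s "$data_prefix.$ext" ]; then
      echo "dataset file missing or empty: $data_prefix.$ext" >&2
      status=1
    fi
  done
  if [ ! -f "$tokenizer_dir/tokenizer.json" ]; then
    echo "tokenizer.json not found in $tokenizer_dir" >&2
    status=1
  fi
  # The checkpoint directory must exist and be writable.
  if [ ! -d "$ckpt_dir" ] || [ ! -w "$ckpt_dir" ]; then
    echo "checkpoint dir missing or not writable: $ckpt_dir" >&2
    status=1
  fi
  return "$status"
}
```

For example: `preflight "$CKPT_SAVE_DIR" "$DATA_PATH" "$TOKENIZER_MODEL" && bash examples/pretrain_llama3_8b_ptd.sh`.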

Launch the LLaMA3-8B pretraining script: examples/pretrain_llama3_8b_ptd.sh

 bash examples/pretrain_llama3_8b_ptd.sh

Performance

Throughput

Performance comparison of LLaMA3-8B on Ascend NPUs and the reference chip:

Device	Model	Iterations	Token throughput (tokens/s/p)
NPUs	LLaMA3-8B	1000	2474
Reference	LLaMA3-8B	1000	2665
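Reading the table: the NPU throughput is roughly 7% lower than the reference. The sketch below computes that ratio. It also shows a common way to derive per-device token throughput from a training log: tokens per iteration, divided by iteration time, divided by device count. It is an assumption that the table's numbers were measured this way. tokens_per_sec_per_device and relative_perf are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for throughput arithmetic (awk does the floating point).

# tokens/s/device = global_batch_size * seq_length / iteration_time_s / num_devices
tokens_per_sec_per_device() {
  local gbs="$1" seq_len="$2" iter_ms="$3" ndev="$4"
  awk -v g="$gbs" -v s="$seq_len" -v t="$iter_ms" -v n="$ndev" \
    'BEGIN { printf "%.1f\n", g * s / (t / 1000) / n }'
}

# Throughput of one device as a percentage of a reference device.
relative_perf() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}
```

With the table's figures, `relative_perf 2474 2665` prints 92.8. The other helper uses illustrative inputs not taken from the table. `tokens_per_sec_per_device 8 4096 2000 8` prints 2048.0, which corresponds to a global batch of 8 sequences of 4096 tokens, 2000 ms per iteration, and 8 devices.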