文件最后提交记录最后更新时间
!5865 更新新版本LLaMA-13B(fsdp训练) Merge pull request !5865 from zhangbin/master 2 年前
!5865 更新新版本LLaMA-13B(fsdp训练) Merge pull request !5865 from zhangbin/master 2 年前
!5865 更新新版本LLaMA-13B(fsdp训练) Merge pull request !5865 from zhangbin/master 2 年前
!5865 更新新版本LLaMA-13B(fsdp训练) Merge pull request !5865 from zhangbin/master 2 年前
fix link validity Co-authored-by: frozenleaves<914814442@qq.com> # message auto-generated for no-merge-commit merge: !7517 merge master into master fix link validity Created-by: frozenn Commit-by: frozenleaves Merged-by: ascend-robot Description: ## Motivation Please describe the motivation of this PR and the goal you want to achieve through this PR. ## Modification Please briefly describe what modification is made in this PR. ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist **Before PR**: - [ ] The new code needs to comply with the Clean Code specification. - [ ] The PR content is self-checked, and the expression can be clear and the writing standardized **After PR**: - [ ] CLA has been signed and all committers have signed the CLA in this PR. - [ ] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/ModelZoo-PyTorch!75171 个月前
!6092 流水线支持模型需要支持jit_compile环境变量 * add jit_compile support 2 年前
!5865 更新新版本LLaMA-13B(fsdp训练) Merge pull request !5865 from zhangbin/master 2 年前
!5865 更新新版本LLaMA-13B(fsdp训练) Merge pull request !5865 from zhangbin/master 2 年前
!5900 补充推理和评估部分 * 补充推理和评估部分 2 年前
!5865 更新新版本LLaMA-13B(fsdp训练) Merge pull request !5865 from zhangbin/master 2 年前
!6002 修复断点续训功能,刷新readme * 修复断点续训功能,刷新readme 2 年前
!5865 更新新版本LLaMA-13B(fsdp训练) Merge pull request !5865 from zhangbin/master 2 年前
upload GPU version code as the prototype. 2 年前
upload GPU version code as the prototype. 2 年前
fix link validity Co-authored-by: frozenleaves<914814442@qq.com> # message auto-generated for no-merge-commit merge: !7517 merge master into master fix link validity Created-by: frozenn Commit-by: frozenleaves Merged-by: ascend-robot Description: ## Motivation Please describe the motivation of this PR and the goal you want to achieve through this PR. ## Modification Please briefly describe what modification is made in this PR. ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist **Before PR**: - [ ] The new code needs to comply with the Clean Code specification. - [ ] The PR content is self-checked, and the expression can be clear and the writing standardized **After PR**: - [ ] CLA has been signed and all committers have signed the CLA in this PR. - [ ] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/ModelZoo-PyTorch!75171 个月前
!5865 更新新版本LLaMA-13B(fsdp训练) Merge pull request !5865 from zhangbin/master 2 年前
fix link validity Co-authored-by: frozenleaves<914814442@qq.com> # message auto-generated for no-merge-commit merge: !7517 merge master into master fix link validity Created-by: frozenn Commit-by: frozenleaves Merged-by: ascend-robot Description: ## Motivation Please describe the motivation of this PR and the goal you want to achieve through this PR. ## Modification Please briefly describe what modification is made in this PR. ## Self-test (Optional) If modifications to this PR may cause/fix function/accuracy/performance DTSs/issues, a self-inspection record needs to be attached. ## BC-breaking (Optional) If there are compatibility issues, such as dependencies on cann/torch_npu versions, they need to be explained in the PR. ## Checklist **Before PR**: - [ ] The new code needs to comply with the Clean Code specification. - [ ] The PR content is self-checked, and the expression can be clear and the writing standardized **After PR**: - [ ] CLA has been signed and all committers have signed the CLA in this PR. - [ ] The ci-pipeline is passed, Code Check is passed. See merge request: Ascend/ModelZoo-PyTorch!75171 个月前
!5865 更新新版本LLaMA-13B(fsdp训练) Merge pull request !5865 from zhangbin/master 2 年前
README.md

当前模型脚本已不随版本演进,如使用此模型可跳转至该地址

LLaMA-7B/13B for PyTorch

概述

简述

LLaMA是由Meta AI发布的大语言系列模型,完整的名字是Large Language Model Meta AI。LLaMA按照参数量的大小具有不同的型号。LLaMA模型的效果极好,无需使用专门的数据集,只使用公开可用的数据集即可至训练至最优。本工程基于FastChat仓,主要聚焦于LLaMA-7B/13B/33B模型。

  • 参考实现:

    url=https://github.com/lm-sys/FastChat/tree/v0.2.31
    commit_id=9db21434b30a5355eb4723acc6562709f5ccc2c1
    
  • 适配昇腾 AI 处理器的实现:

    url=https://gitcode.com/ascend/ModelZoo-PyTorch.git
    code_path=PyTorch/built-in/foundation
    

准备训练环境

  • 环境准备指导

  • 请参考《Pytorch框架训练环境准备》。

    表 1 环境配置表

    Software Version
    Pytorch 2.1.0
  • 这里要替换transformers(transformers=4.33.0)库中的部分文件,使用下面命令时注意修改安装环境的路径。

     conda create -n test python==3.8
     conda activate test
    
     pip install torch==2.1.0
     pip install torch_npu-2.1.0xxxxx
    
     pip3 install --upgrade pip  # enable PEP 660 support
     pip3 install -e ".[model_worker,webui]"
    
     cp transformers_modify/modeling_llama.py /home/miniconda3/envs/test/lib/python3.8/site-packages/transformers/models/llama
     cp transformers_modify/training_args.py /home/miniconda3/envs/test/lib/python3.8/site-packages/transformers/
     cp transformers_modify/trainer.py /home/miniconda3/envs/test/lib/python3.8/site-packages/transformers/
     cp accelerate_modify/accelerator.py /home/miniconda3/envs/test/lib/python3.8/site-packages/accelerate
     cp accelerate_modify/dataclasses.py /home/miniconda3/envs/test/lib/python3.8/site-packages/accelerate/utils/
    

准备数据集

该任务以基于问答形式的数据集进行finetuning训练。 以alpaca数据集为例,数据集结构参考如下所示。注意保存数据集的路径,训练时修改脚本中的数据集路径。

[
   {
     "id": "1",
     "conversations": [
       {
         "from": "human",
         "value": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nGive three tips for staying healthy.\n\n### Response:"
       },
       {
         "from": "gpt",
         "value": "1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule."
       }
     ]
   },
   {
     "id": "2",
     "conversations": [
       {
         "from": "human",
         "value": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat are the three primary colors?\n\n### Response:"
       },
       {
         "from": "gpt",
         "value": "The three primary colors are red, blue, and yellow."
       }
     ]
   },
   ...

说明: 该数据集的训练过程脚本只作为一种参考示例。

获取预训练模型

这里可以参考原始仓库上的readme.md通过权重转换获取预训练模型,也可以从huggingface上获取预训练模型。注意保存预训练模型的位置,训练时修改脚本中的预训练模型路径。

参考预训练模型(huggingface上获取):

llama2-7b:llama2-7b-hf; llama2-13b:llama2-13b-hf

vicuna-7b:lmsys/vicuna-7b-v1.5;vicuna-13b:lmsys/vicuna-13b-v1.5。

开始训练

  1. 进入解压后的源码包根目录。

    cd /${模型文件夹名称} 
    
  2. 运行训练脚本。

    模型训练脚本参数说明如下,训练前注意修改预训练参数路径、数据集路径。

    --model_name_or_path                       // 预训练参数路径 
     --data_path                                // 数据集路径 
     --bf16                                     // 参数使用bf16保存
     --num_train_epochs                         // 训练epoch数
     --per_device_train_batch_size              // 每张卡上的训练batch size
     --per_device_eval_batch_size               // 每张卡上的评估batch size
     --gradient_accumulation_steps              // 梯度累积的步数
     --evaluation_strategy                      // 评估策略
     --save_strategy                            // ckpt保存策略
     --save_steps                               // ckpt保存间隔步数
     --save_total_limit                         // ckpt最大保存数量
     --learning_rate                            // 学习率
     --weight_decay                             // weight decay策略 
     --warmup_ratio                             // warmup步数的比例
     --lr_scheduler_type                        // 学习率衰减方式
     --logging_steps                            // 训练日志打印间隔步数
     --tf32 True                                // 使用tf32训练
     --model_max_length                         // 模型训练的sequence length
     --gradient_checkpointing                   // 是否开启重计算  
    

    模型支持单机8卡训练和双机16卡训练。

    注意事项: 模型训练时会自动加载模型保存路径中的checkpoint进行断点续训,从头开始训练要删除或转移之前保存的checkpoint。

    • 训练前注意修改环境变量路径,source环境信息

      source set_env.sh
      
    • LLaMA2-7B/Vicuna-7B训练(单机8卡)

      bash ./scripts/train_vicuna_7b.sh    
      
    • LLaMA2-13B/Vicuna-13B训练(单机8卡)

      bash ./scripts/train_vicuna_13b.sh
      
    • LLaMA2-13B/Vicuna-13B双机训练要修改scripts/train_vicuna_13b.sh脚本

      torchrun --nproc_per_node=8 --master_port=20001 --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr=90.90.3.79 fastchat/train/train_mem.py
      

      --nnodes:节点数; --node_rank:节点顺序(如0,1); --master_addr:主节点ip。

  3. LLaMA-33b 多机启动脚本配置(使用vicuna权重)

    多机微调启动脚本:scripts/train_vicuna_33b_nnodes.sh

    其中,部分配置参数需要根据实际情况进行配置:

    • --nnodes:要使用的机器数量;
    • --nproc_per_node:每台机器使用的NPU设备数量;
    • --master_addr:需要替换为主节点机器的IP地址;
    • --node_rank:主节点需要设置为0,其他节点--node_rank按顺序设置不重复即可;
    • --master_port:主节点的服务监听端口,可根据需要自行设置。

    启动时,各个节点都要执行train_vicuna_33b_nnodes.sh脚本来拉起微调任务。

训练结果展示

FPS计算方法: per_bs * grad_acc * seq_len / time

- `per_bs`:每张卡上的训练batch size;
- `grad_acc`:梯度累积的步数;
- `seq_len`:模型训练的sequence length;
- `time`:一个global_step训练的时间。

表 2 训练结果展示表(llama2 7B/13B)

NAME FPS(tokens/s/p) Epochs
7B-NPU 3120 3
7B-竞品A 3120 3
13B-NPU(单机20层) 1730 3
13B-竞品A(单机20层) 1896 3

注:13B单机20层训练计算性能时要除以2

表 3 训练结果展示表(vicuna 7B/13B)

NAME FPS(tokens/s/p) Epochs
7B-NPU 2818 3
7B-竞品A 3070 3
13B-NPU(单机20层) 1619 3
13B-竞品A(单机20层) 1740 3

注:这里vicuna 7B/13B在NPU上使用Atlas 800T A2训练,使用竞品A训练

推理

推理环境搭建

这里要替换transformers库中的部分文件,用于推理(评估)场景,后续如果要进行训练再更改为transformers_modify中的文件。

cp tasks/modeling_llama.py /home/miniconda3/envs/test/lib/python3.8/site-packages/transformers/models/llama

推理命令

执行以下命令即可进入推理模式,注意修改模型路径。

python3 -m fastchat.serve.cli --model-path 7B-vicuna/ --device npu

推理结果展示

进入推理模式后,输入文本信息就可以进行对话推理,输入内容为空则退出推理模式。

USER: how are you
ASSISTANT: Hello! As an AI languagething, I don't have feelings or emotions like humans do. I am just a computer conne
USER: What to do if I can't sleep at night
ASSISTANT: If you can'tamarstop at night, there are a few things you can try to help yourself fall asleep:

1. Create a bedtime routine:osed to do the same thing every night before bed, such as reading a book or taking a warm bath, can help signal to your body that it's time to sleep.
2. Make your bedimages:A comfortable and inviting sleeping environment can help you fall asavigation asleep faster and stay asleep longer.
3. Avoid screens before bed:The blue lightовых devices like phones and computers can interfere with your body's production of melatonin, a hormone that helps know when it's time to sleep.
4. Try relaxation techniques:Gentle exercises like deep breathing, progressive muscle relaxation, or meditation can help calm your mind and body, making it easier to fall asleep.
5. Avoid c utations that can keep you up:Avoid caffeine, nicotine, and alcohol for several hours before bedtime, as they can interfere with your métabolisme and make it harder to fall asleep.

If these tips don't help, you can also speak to your doctor about other treatment options, such as sleep medications or therapy.
USER:
exit...

评估

评估数据集准备

这里使用的是Mmlu数据集,可以从huggingface获取。

运行评估任务

进行评估前也要进行环境准备(和推理前操作相同),然后执行以下命令。

bash tasks/vicuna_eval.sh

注意修改脚本中tasks目录所在路径,评估数据集路径,模型路径。

source /home/gpt_neox/cann1115/ascend-toolkit/set_env.sh
   
python /home/FastChat-master/tasks/task_eval.py   \
       --model_path /home/FastChat-master/7B-vicuna/  \
       --test_dir /home/FastChat-master/tasks/Mmlu/  \
       --task Mmlu

评估结果展示

subject                                     question_n  acc
0                 high_school_geography         198  0.601010
1   high_school_government_and_politics         193  0.715026
2          high_school_computer_science         100  0.550000
3                       college_biology         144  0.500000
4                      public_relations         110  0.600000
5               professional_accounting         282  0.354610
6                             sociology         201  0.681592
7                high_school_psychology         545  0.655046
8                 high_school_chemistry         203  0.334975
9                     college_chemistry         100  0.350000
10                  high_school_physics         151  0.337748
11                     college_medicine         173  0.473988
12                     security_studies         245  0.514286
13         high_school_european_history         165  0.442424
14                        jurisprudence         108  0.592593
15                      moral_scenarios         895  0.252514
16                         formal_logic         126  0.293651
17           high_school_microeconomics         238  0.462185
18                        miscellaneous         783  0.675607
19               high_school_us_history         204  0.455882
20                         econometrics         114  0.271930
21               elementary_mathematics         378  0.285714
22                    computer_security         100  0.560000
23                             virology         166  0.439759
24                          human_aging         223  0.632287
25                            astronomy         152  0.500000
26                      college_physics         102  0.235294
27                    international_law         121  0.719008
28              professional_psychology         612  0.464052
29                   conceptual_physics         235  0.408511
30                professional_medicine         272  0.459559
31               electrical_engineering         145  0.475862
32                      human_sexuality         131  0.618321
33               high_school_statistics         216  0.277778
34                     medical_genetics         100  0.600000
35                    logical_fallacies         163  0.644172
36                              anatomy         135  0.444444
37           high_school_macroeconomics         390  0.492308
38             college_computer_science         100  0.330000
39                            marketing         234  0.794872
40                           philosophy         311  0.540193
41              high_school_mathematics         270  0.274074
42                       moral_disputes         346  0.497110
43                           management         103  0.669903
44                      business_ethics         100  0.510000
45                      world_religions         171  0.707602
46                  college_mathematics         100  0.210000
47                     professional_law        1534  0.272490
48                     machine_learning         112  0.321429
49                         global_facts         100  0.370000
50                  high_school_biology         310  0.551613
51                   clinical_knowledge         265  0.524528
52            high_school_world_history         237  0.409283
53                     abstract_algebra         100  0.280000
54                    us_foreign_policy         100  0.750000
55                            nutrition         306  0.473856
56                           prehistory         324  0.580247
57                                total       14042  0.462541

版本说明

变更

2023.11.23 首次发布。

FAQ

无。