文件最后提交记录最后更新时间
!4671 【fix】批量修改模型python版本,兼容环境上的python3.8版本 * fix python version 3 年前
init 4 年前
init 4 年前
init 4 年前
!5798 Network address of models to be rectified: 20 Merge pull request !5798 from Yss/network_declaration_20 2 年前
init 4 年前
init 4 年前
!88 [中山大学计算机学院][高校贡献][PyTorch训练][TinyBERT]-初次提交 * update PyTorch/contrib/nlp/tinybert/test/train_performance_8p_1.sh. * update PyTorch/contrib/nlp/tinybert/main.py. 4 年前
init 4 年前
!7376 optimize public_address_statement.md Merge pull request !7376 from 王凯宇/master 8 个月前
init 4 年前
README.md

TinyBERT

Welcome to the TinyBERT project! Please read the following instructions carefully so as to reproduce the project better.

Introduction

This implement training of TinyBERT on the SST-2 dataset is mainly modified from the following link.

For more details about the techniques of TinyBERT, refer to the paper: TinyBERT: Distilling BERT for Natural Language Understanding

Specifically, this implement is modified to adapt the NPU chips.

Requirements

  • working dir

First of all, you need to use the command cd to change the current working dir to where the test folder locates.

  • virtual environment

Run command below to install the environment(using python3)

pip3.7 install -r requirements.txt
# or
conda install --yes --file requirements.txt
  • dataset

TinyBERT is trained on the dataset SST-2, and we also apply TinyBERT to transfer learning on MNLI dataset. You can get the dataset by running the command:

wget https://ascend-pytorch-one-datasets.obs.cn-north-4.myhuaweicloud.com/train/zip/SST-2-TinyBert.zip
  • model

Three models are required in the project.

The first one is the teacher model, which is the BERT-base-uncased model finetuned on SST-2. The second one is the other teacher model, which is the BERT-base-uncased model finetuned on MNLI(only for transfer learning). And the third one is the student model. We adopt the general-distilled model(4layer-312dim) provided by Huawei Noah's Ark Lab.

You can download the pretrained-model files by running the commands:

# download the student model
wget https://ascend-pytorch-model-file.obs.cn-north-4.myhuaweicloud.com/%E9%AA%8C%E6%94%B6-%E8%AE%AD%E7%BB%83/nlp/TinyBERT/%E6%A8%A1%E5%9E%8B%E6%96%87%E4%BB%B6/%E3%80%90%E8%AE%AD%E7%BB%83%E3%80%91%E5%AD%A6%E7%94%9F%E6%A8%A1%E5%9E%8B.zip

# download the teacher model(finetuned on MNLI dataset)
wget https://ascend-pytorch-model-file.obs.cn-north-4.myhuaweicloud.com/%E9%AA%8C%E6%94%B6-%E8%AE%AD%E7%BB%83/nlp/TinyBERT/%E6%A8%A1%E5%9E%8B%E6%96%87%E4%BB%B6/%E3%80%90%E8%AE%AD%E7%BB%83%E3%80%91%EF%BC%88%E8%BF%81%E7%A7%BB%E5%AD%A6%E4%B9%A0%EF%BC%89MNLI%E6%95%99%E5%B8%88%E6%A8%A1%E5%9E%8B.zip

# download the teacher model(finetuned on SST-2 dataset)
wget https://ascend-pytorch-model-file.obs.cn-north-4.myhuaweicloud.com/%E9%AA%8C%E6%94%B6-%E8%AE%AD%E7%BB%83/nlp/TinyBERT/%E6%A8%A1%E5%9E%8B%E6%96%87%E4%BB%B6/%E3%80%90%E8%AE%AD%E7%BB%83%E3%80%91%EF%BC%88%E6%AD%A3%E5%BC%8F%E8%AE%AD%E7%BB%83%EF%BC%89SST-2%E6%95%99%E5%B8%88%E6%A8%A1%E5%9E%8B.zip

Training

To train a model, run main.py with the desired model architecture. Unlike other one-step model, there are two training processes in task-distillation of TinyBERT model.

Please pay attention: all of the performance scripts are set to stop running when having run 1000 steps, for they are just designed to test whether the code works and the files can be exported. There will be a tip like:"End performance testing. Ready to exit". It's a normal phenomenon instead of a bug. Please ignore it and just go on following the instructions and reproducing the project.

Establish empty directory

# make directory
mkdir tmp_tinybert_performance
mkdir tmp_tinybert_dir
mkdir TinyBERT_dir
mkdir TinyBERT_dir_performance
mkdir output
# set the authority(use sudo if necessary)
chmod 777 tmp_tinybert_performance
chmod 777 tmp_tinybert_dir
chmod 777 TinyBERT_dir
chmod 777 TinyBERT_dir_performance
chmod 777 output

1p mode

# Step 1: run the intermediate layer distillation.
bash ./test/train_performance_1p_1.sh
bash ./test/train_full_1p_1.sh

# Step 2: run the prediction layer distillation. 
bash ./test/train_full_1p_2.sh

# Step 3: run the evaluation on the SST-2 dataset
bash ./test/train_eval_1p.sh      

8p mode

# Step 1: run the intermediate layer distillation.
bash ./test/train_performance_8p_1.sh
bash ./test/train_full_8p_1.sh

# Step 2: run the prediction layer distillation. 
bash ./test/train_full_8p_2.sh

# Step 3: run the evaluation on the SST-2 dataset
bash ./test/train_eval_8p.sh    

Other setting

# Transfer learning
bash ./test/train_finetune_1p.sh
# demo(automatically repeat 20 times)
bash ./test/demo.sh

After finishing the whole training process, you can see all output files in the directory ./output

Result

device acc of 1p acc of 8p fps of 1p fps of 8p
GPU 91.63 90.94 337.50 2308.89
NPU 90.85 90.04 94.54 554.09
baseline(TinyBERT4) 92.6 None None None
requirement 87.6 None None None

Statement

For details about the public address of the code in this repository, you can get from the file public_address_statement.md