Transformer-xl
This implements training of transformer-xl on the enwik8 dataset, mainly modified from pytorch/examples.
Transformer-xl Detail
As of the current date, Ascend-Pytorch is still inefficient for contiguous operations.Therefore, Transformer-xl is re-implemented using semantics such as custom OP.
Requirements
- Install PyTorch (pytorch.org)
pip install -r requirements.txt
Data Prepration
bash getdata.sh
Training and Evaluation
To train a model, run bash test/train_full_8p.sh with the desired model architecture and the path to the enwik8 dataset:
#env
cd transformer-xl
dos2unix ./test/*.sh
# 1p train perf
bash test/train_performance_1p.sh
# 8p train perf
bash test/train_performance_8p.sh
# 8p train full
bash test/train_full_8p.sh
# 1p eval
bash test/eval_1p.sh
- 参数说明:
#--data //数据集路径,可自行修改为对应路径的数据集
#--restart_dir //加载模型checkpoint路径,可自行修改为对应路径的模型文件
#--addr //主机地址
#--max_step //最大训练步数
#--batch-size //训练批次大小
#--lr //初始学习率,默认:0.00025
#--device-list //多卡训练指定训练用卡 ,8卡:'0,1,2,3,4,5,6,7'
#--amp //是否使用混合精度
#--loss-scale //lossscale大小
#--opt-level //混合精度类型
Transformer-xl training result
| bpc | FPS | Npu_nums | Epochs | AMP_Type |
|---|---|---|---|---|
| - | 8300 | 1 | 1 | O2 |
| 1.09 | 44500 | 8 | 50 | O2 |
Statement
For details about the public address of the code in this repository, you can get from the file public_address_statement.md