BERT-NER-CRF for PyTorch

Overview
Preparing the Training Environment
Preparing the Dataset
Starting Training
Training Results
Release Notes

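Overview

Introduction

BERT-CRF is a model for named entity recognition (NER) tasks in natural language processing.

Reference implementation: https://github.com/lonePatient/BERT-NER-Pytorch
This repository is the implementation adapted for NPU.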

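Preparing the Training Environment

Preparing the Environment

We recommend preparing the training environment with the latest matched versions. To reproduce the historical results in this document or to use an older environment, refer to the version compatibility table below.

Table 1 Version compatibility

Software	Version	Installation guide
Driver	AscendHDK 25.0.RC1.1	"Driver and Firmware Installation Guide"
Firmware	AscendHDK 25.0.RC1.1	"Driver and Firmware Installation Guide"
CANN	CANN 8.1.RC1	"CANN Software Installation Guide"
PyTorch	2.1.0	"Ascend Extension for PyTorch Configuration and Installation"
torch_npu	release v7.0.0-pytorch2.1.0	"Ascend Extension for PyTorch Configuration and Installation"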

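Third-party library dependencies are listed in the following table.

Table 2 Third-party library dependencies

Torch_Version	Third-party library version
PyTorch 2.1	transformers 4.29.2

Installing Dependencies

Run the following command in the model root directory to install the dependencies the model requires.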
```
pip install -r requirements.txt
```

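Preparing the Dataset

This model is trained and validated on the Cluener dataset. Download the Cluener dataset from https://storage.googleapis.com/cluebenchmark/tasks/cluener_public.zip, extract it, and place it under the datasets directory so that the directory structure looks like this: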

BERT-NER-Pytorch
└── datasets
    ├── cner
    └── cluener 
        ├── cluener_predict.json
        ├── dev.json
        ├── __init__.py 
        ├── README.md
        ├── test.json
        └── train.json
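
The download-and-extract step above can also be scripted. The following is a minimal sketch, assuming the archive unpacks to JSON files at its top level (the archive's internal layout is not stated here, so this is an assumption). `extract_cluener` and `fetch_cluener` are hypothetical helper names, not part of this repository.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the dataset instructions above.
CLUENER_URL = "https://storage.googleapis.com/cluebenchmark/tasks/cluener_public.zip"


def extract_cluener(zip_path, repo_root="."):
    """Extract the archive into <repo_root>/datasets/cluener.

    Returns the sorted names of the JSON files found there afterwards.
    """
    target = Path(repo_root) / "datasets" / "cluener"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(p.name for p in target.glob("*.json"))


def fetch_cluener(repo_root="."):
    """Download the archive into repo_root, then extract it (needs network access)."""
    zip_path = Path(repo_root) / "cluener_public.zip"
    urllib.request.urlretrieve(CLUENER_URL, zip_path)
    return extract_cluener(zip_path, repo_root)
```

Calling `fetch_cluener()` from the model root should leave the JSON files under `datasets/cluener`. If the archive turns out to contain a nested top-level folder, move the files up one level by hand.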

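Preparing the Pretrained Weights

Create a prev_trained_model directory under the model directory, download the pretrained weights, the config file, and the related tokenizer files from https://huggingface.co/bert-base-chinese/tree/main/, and place them under prev_trained_model so that the directory structure looks like this: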

BERT-NER-Pytorch
└── prev_trained_model
    └── bert-base-chinese
        ├── config.json
        ├── pytorch_model.bin 
        ├── tokenizer_config.json
        ├── tokenizer.json
        └── vocab.txt
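
Before launching training, it can help to confirm that every file in the tree above is in place. The following is a minimal sketch; `missing_weight_files` is a hypothetical helper, and the file list is simply copied from the directory tree above.

```python
from pathlib import Path

# File names taken from the directory tree above.
EXPECTED_FILES = (
    "config.json",
    "pytorch_model.bin",
    "tokenizer_config.json",
    "tokenizer.json",
    "vocab.txt",
)


def missing_weight_files(repo_root="."):
    """Return the expected files absent from prev_trained_model/bert-base-chinese."""
    model_dir = Path(repo_root) / "prev_trained_model" / "bert-base-chinese"
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_weight_files()
    print("all files present" if not missing else f"missing: {missing}")
```

An empty list means the layout matches the tree; any names returned are the files still to be downloaded.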

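Starting Training

Running the Training Scripts

Single-node 8-card training

bash test/train_full_8p.sh      # 8-card accuracy training
bash test/train_performance_8p.sh    # 8-card performance training

Single-node 16-card training

bash test/train_full_16p.sh     # 16-card accuracy training
bash test/train_performance_16p.sh    # 16-card performance training

After training completes, the weight files are saved in the current path, and the model's training accuracy and performance information are printed.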
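
The four scripts above differ only in card count and mode, so a small wrapper can pick the right one. The following is a sketch under that assumption; `training_command` and `launch` are hypothetical helpers, and only the script paths come from this document.

```python
import subprocess

# Script paths taken from the commands above.
SCRIPTS = {
    (8, "full"): "test/train_full_8p.sh",
    (8, "performance"): "test/train_performance_8p.sh",
    (16, "full"): "test/train_full_16p.sh",
    (16, "performance"): "test/train_performance_16p.sh",
}


def training_command(num_cards, mode):
    """Return the shell command for the given card count and mode."""
    try:
        return ["bash", SCRIPTS[(num_cards, mode)]]
    except KeyError:
        raise ValueError(f"no script for {num_cards} cards in {mode!r} mode") from None


def launch(num_cards, mode):
    """Run the selected script from the model root; raises if it exits non-zero."""
    subprocess.run(training_command(num_cards, mode), check=True)
```

For example, `launch(8, "full")` runs the 8-card accuracy training script, and an unsupported combination such as 4 cards raises `ValueError` before anything is started.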

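Training Results

Table 3

Name	F1	Wps	Samples/Second	Epochs
8p-NPU	79.16	1163.21	1129.4	4

Note: The table above contains historical data and is for reference only. The performance data updated on May 10, 2025 is as follows:

Name	Precision	FPS
8p-competitor	FP32	1942.15
8p-Atlas 900 A2 PoDc	FP32	1407.21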
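
To put the two FPS figures in proportion, the ratio can be computed directly. This is only arithmetic on the numbers in the table above; the variable names are illustrative.

```python
# FPS values copied from the table above (FP32, 8 cards).
competitor_fps = 1942.15
atlas_fps = 1407.21

# Fraction of the competitor's throughput reached by the Atlas 900 A2 PoDc run.
ratio = atlas_fps / competitor_fps
print(f"Atlas / competitor throughput: {ratio:.1%}")
```

With these figures the ratio comes to roughly 72.5%.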

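Release Notes

Changes

2023.6.19: Initial release.

FAQ

If the safetensors third-party library reports the error "safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge", the cause is that accelerate versions >= v0.25.0 use safetensors by default, which triggers the error. As a workaround, install accelerate 0.24.1.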
```
pip install accelerate==0.24.1
```
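
To check whether an environment falls in the affected version range before training, the installed accelerate version can be compared against the threshold named above. The following is a minimal sketch using only the standard library; `parse_version`, `affected`, and `installed_accelerate_affected` are hypothetical helpers, and the version parsing is deliberately simplified (it reads only the leading digits of the first three components).

```python
import re
from importlib import metadata

# Per the FAQ above: accelerate >= v0.25.0 triggers the error.
THRESHOLD = (0, 25, 0)


def parse_version(text):
    """Turn '0.24.1' into (0, 24, 1), padding missing components with zeros."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def affected(version_text):
    """Return True if the given accelerate version is at or above the threshold."""
    return parse_version(version_text) >= THRESHOLD


def installed_accelerate_affected():
    """Return None if accelerate is not installed, else whether it is affected."""
    try:
        return affected(metadata.version("accelerate"))
    except metadata.PackageNotFoundError:
        return None
```

If `installed_accelerate_affected()` returns `True`, the pinned install shown above is the workaround this document suggests.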