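GLiNER bi-encoder model built on ModernBERT-large and BGE-base-en. It recognizes arbitrary entity types, offers faster inference and strong generalization, supports a context length of up to 8,192 tokens, and is up to 4x more efficient than DeBERTa-based models.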
license: apache-2.0
language:
- en
library_name: gliner
datasets:
- urchade/pile-mistral-v0.1
- numind/NuNER
- knowledgator/GLINER-multi-task-synthetic-data
pipeline_tag: token-classification
tags:
- NER
- GLiNER
- information extraction
- encoder
- entity recognition
- modernbert
base_model:
- answerdotai/ModernBERT-large
- BAAI/bge-base-en-v1.5
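About
GLiNER is a Named Entity Recognition (NER) model that can identify any entity type using a bidirectional transformer encoder (BERT-like). It overcomes the limitation of traditional NER models, which recognize only predefined entity types, and offers a practical alternative to large language models (LLMs) in resource-constrained settings: LLMs are flexible, but they are large and computationally expensive.
This version uses a bi-encoder architecture: ModernBERT-large as the text encoder and the sentence transformer BGE-base-en as the entity label encoder.
Compared with the single-encoder version of GLiNER, this architecture has the following advantages:
- No limit on the number of entity types that can be recognized at once;
- Faster inference when entity embeddings are precomputed;
- Better generalization to unseen entities.
With the ModernBERT encoder, the model is up to 4 times more efficient at inference than DeBERTa-based models while delivering comparable performance, and it supports a context length of up to 8,192 tokens.

The bi-encoder architecture also has drawbacks. In particular, there is no interaction between labels, so the model may struggle to distinguish entities that are semantically similar but appear in different contexts.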
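To make the bi-encoder idea concrete, here is a minimal sketch; it is not GLiNER's actual implementation. It assumes each candidate span and each label has already been mapped to a vector by its own encoder. The toy 3-dimensional vectors, the `score_spans` helper, and the 0.5 threshold are all invented for illustration. Because labels are encoded independently of the text, their vectors can be computed once and reused, and adding labels only adds more comparisons.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw similarity to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def score_spans(span_vecs, label_vecs, threshold=0.5):
    """Return (span, label, score) triples whose score reaches the threshold.

    Hypothetical helper: span_vecs and label_vecs map a name to an embedding
    (a list of floats). Real models use learned, high-dimensional embeddings.
    """
    matches = []
    for span, s_vec in span_vecs.items():
        for label, l_vec in label_vecs.items():
            # Spans and labels never see each other until this dot product.
            dot = sum(a * b for a, b in zip(s_vec, l_vec))
            score = sigmoid(dot)
            if score >= threshold:
                matches.append((span, label, round(score, 3)))
    return matches


# Toy embeddings, made up for this example.
spans = {"Al Nassr": [2.0, -1.0, 0.0], "1985": [-1.0, 2.0, 0.0]}
labels = {"teams": [1.5, -0.5, 0.0], "date": [-0.5, 1.5, 0.0]}

matches = score_spans(spans, labels)
for span, label, score in matches:
    print(span, "=>", label, score)
```

The sketch also shows the limitation noted above: each label is scored in isolation, so nothing lets one label's score influence another's.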
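Installation & Usage
Install or update the gliner package:
pip install gliner -U
This model requires the latest version of transformers:
pip install git+https://github.com/huggingface/transformers.git
Once the GLiNER library is installed, import the GLiNER class, load the model with GLiNER.from_pretrained, and predict entities with predict_entities.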
from gliner import GLiNER
# Load the pretrained bi-encoder model (downloaded on first use)
model = GLiNER.from_pretrained("knowledgator/modern-gliner-bi-large-v1.0")
text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards,[note 3] a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, goals in the European Championship (14), international goals (128) and international appearances (205). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 850 official senior career goals for club and country, making him the top goalscorer of all time.
"""
# Entity types to extract
labels = ["person", "award", "date", "competitions", "teams"]
# Discard predictions whose confidence score falls below the threshold
entities = model.predict_entities(text, labels, threshold=0.3)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
Output:
Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Al Nassr => teams
Portugal national team => teams
Ballon d'Or => award
UEFA Men's Player of the Year Awards => award
European Golden Shoes => award
UEFA Champions Leagues => competitions
UEFA European Championship => competitions
UEFA Nations League => competitions
Champions League => competitions
European Championship => competitions
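The flat list of predictions is often easier to consume once grouped by label. The sketch below relies only on the "text" and "label" keys used in the example above; `group_by_label` is a hypothetical helper, and the sample list is hand-written from the output shown rather than produced by running the model.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group entity texts by label, dropping repeated mentions.

    Hypothetical helper; expects dicts with "text" and "label" keys.
    """
    grouped = defaultdict(list)
    for entity in entities:
        texts = grouped[entity["label"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)


# Hand-written sample in the same shape as the predictions above.
sample = [
    {"text": "Al Nassr", "label": "teams"},
    {"text": "Portugal national team", "label": "teams"},
    {"text": "Ballon d'Or", "label": "award"},
    {"text": "Al Nassr", "label": "teams"},  # repeated mention
]

grouped = group_by_label(sample)
for label, texts in grouped.items():
    print(label, "=>", texts)
```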
To use Flash Attention or a longer sequence length, load the model as shown below.
First, install the Flash Attention and Triton packages:
pip install flash-attn triton
model = GLiNER.from_pretrained(
    "knowledgator/modern-gliner-bi-large-v1.0",
    _attn_implementation="flash_attention_2",  # use Flash Attention 2
    max_len=2048,  # maximum input sequence length
).to("cuda:0")  # move the model to the first CUDA GPU
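Flash Attention needs a CUDA GPU and the flash-attn package, so a script that must also run on CPU-only machines may want to pass that option conditionally. The following is a sketch under that assumption. `build_load_kwargs` is a hypothetical helper, not part of the gliner library; it only mirrors the `_attn_implementation` and `max_len` keyword arguments used above.

```python
import importlib.util


def build_load_kwargs(cuda_available: bool, max_len: int = 2048) -> dict:
    """Build keyword arguments for GLiNER.from_pretrained.

    Hypothetical helper: requests Flash Attention 2 only when a CUDA device
    is available and the flash_attn package can be imported.
    """
    kwargs = {"max_len": max_len}
    flash_installed = importlib.util.find_spec("flash_attn") is not None
    if cuda_available and flash_installed:
        kwargs["_attn_implementation"] = "flash_attention_2"
    return kwargs


# On a machine without a GPU, only max_len is passed.
cpu_kwargs = build_load_kwargs(cuda_available=False)
print(cpu_kwargs)
```

With PyTorch installed, the result could be passed as `GLiNER.from_pretrained(name, **build_load_kwargs(torch.cuda.is_available()))`.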
If you have a large number of entities and want to pre-embed them, use the following snippet:
labels = ["your entities"]
texts = ["your texts"]
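# Encode the labels once so the embeddings can be reused
entity_embeddings = model.encode_labels(labels, batch_size=8)
# Predict with the precomputed embeddings instead of re-encoding the labels
outputs = model.batch_predict_with_embeds(texts, entity_embeddings, labels)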
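When the same label set is reused across many calls, the embeddings only need to be computed once. The sketch below shows one way to cache them. `LabelEmbeddingCache` is a hypothetical wrapper, not part of gliner, and the stand-in encoder replaces `model.encode_labels` so the example runs without downloading the model.

```python
class LabelEmbeddingCache:
    """Cache label embeddings per ordered label list (hypothetical helper)."""

    def __init__(self, encode_fn):
        # encode_fn takes a list of labels and returns their embeddings.
        self._encode_fn = encode_fn
        self._cache = {}

    def get(self, labels):
        # Order matters: the embeddings must line up with the label list
        # later passed alongside them at prediction time.
        key = tuple(labels)
        if key not in self._cache:
            self._cache[key] = self._encode_fn(list(labels))
        return self._cache[key]


calls = []


def fake_encode(labels):
    """Stand-in for model.encode_labels; records how often it is called."""
    calls.append(labels)
    return [[float(len(label))] for label in labels]


cache = LabelEmbeddingCache(fake_encode)
first = cache.get(["person", "award"])
second = cache.get(["person", "award"])  # served from the cache
print(len(calls))
```

With a loaded model, the cache could be created as `LabelEmbeddingCache(lambda ls: model.encode_labels(ls, batch_size=8))`.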
Benchmarks

The table below shows benchmark results on various named entity recognition datasets:
| Dataset | Score |
|---|---|
| ACE 2004 | 30.5% |
| ACE 2005 | 26.7% |
| AnatEM | 37.2% |
| Broad Tweet Corpus | 72.1% |
| CoNLL 2003 | 69.3% |
| FabNER | 22.0% |
| FindVehicle | 40.3% |
| GENIA_NER | 55.6% |
| HarveyNER | 16.1% |
| MultiNERD | 73.8% |
| Ontonotes | 39.2% |
| PolyglotNER | 49.1% |
| TweetNER7 | 39.6% |
| WikiANN en | 54.7% |
| WikiNeural | 83.7% |
| bc2gm | 53.7% |
| bc4chemd | 52.1% |
| bc5cdr | 67.0% |
| ncbi | 61.7% |
| Average | 49.7% |
| CrossNER_AI | 58.1% |
| CrossNER_literature | 60.0% |
| CrossNER_music | 73.0% |
| CrossNER_politics | 72.8% |
| CrossNER_science | 66.5% |
| mit-movie | 47.6% |
| mit-restaurant | 40.6% |
| Zero-shot benchmark average | 59.8% |
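The zero-shot average in the last row is consistent with an unweighted mean of the seven rows above it. The short check below assumes that is how it was computed, which the card does not state, and uses scores copied from the table.

```python
# Scores (in percent) copied from the zero-shot rows of the table.
zero_shot_scores = {
    "CrossNER_AI": 58.1,
    "CrossNER_literature": 60.0,
    "CrossNER_music": 73.0,
    "CrossNER_politics": 72.8,
    "CrossNER_science": 66.5,
    "mit-movie": 47.6,
    "mit-restaurant": 40.6,
}

# Unweighted mean: every dataset counts equally, regardless of its size.
average = sum(zero_shot_scores.values()) / len(zero_shot_scores)
rounded_average = round(average, 1)
print(f"{rounded_average}%")
```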
Join Our Discord
Connect with our community on Discord for news, support, and discussion about our models.
Citation
If you use this model in your work, please cite:
@misc{modernbert,
title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
year={2024},
eprint={2412.13663},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.13663},
}
@misc{zaratiana2023gliner,
title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
year={2023},
eprint={2311.08526},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{stepanov2024gliner,
title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
author={Ihor Stepanov and Mykhailo Shtopko},
year={2024},
eprint={2406.12925},
archivePrefix={arXiv},
primaryClass={cs.LG}
}