
BGE-M3 Model Adaptation (TorchAir) - Inference Guide

Overview
Inference environment setup
Quick start
- Get the source code
- Model inference
  - Run inference verification
  - Performance

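Overview

BGE-M3 is an advanced multilingual, multi-functional text embedding model from the BAAI General Embedding (BGE) series. Built on a Transformer encoder, it adds sparse retrieval and multi-vector retrieval alongside dense retrieval, so it produces three kinds of semantic representation. It supports more than 100 languages and handles sequences of up to 8192 tokens, which makes it well suited to long text. BGE-M3 generates the three representations quickly and efficiently, and combining them in different ways enables multiple retrieval modes. It performs strongly in multilingual, cross-lingual, and long-text information retrieval, and gives developers a practical tool.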
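To make the idea of combining representations concrete, the sketch below scores one query against one document with each of the three representation types and then mixes the scores. It is a toy illustration in plain Python, not the FlagEmbedding implementation: the function names, the toy vectors, and the equal default weights are all assumptions chosen for readability.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_score(query: Vector, doc: Vector) -> float:
    """Dense retrieval: one embedding per text, compared by inner product."""
    return dot(query, doc)


def sparse_score(query: Dict[str, float], doc: Dict[str, float]) -> float:
    """Sparse retrieval: sum of weight products over tokens present in both texts."""
    return sum(weight * doc[token] for token, weight in query.items() if token in doc)


def multi_vector_score(query: List[Vector], doc: List[Vector]) -> float:
    """Multi-vector retrieval: each query vector takes its best match among
    the document vectors, and the best-match scores are averaged."""
    return sum(max(dot(qv, dv) for dv in doc) for qv in query) / len(query)


def hybrid_score(scores: Sequence[float], weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three scores (the weights are hypothetical)."""
    return sum(w * s for w, s in zip(weights, scores))


# Toy inputs, invented for illustration only.
dense = dense_score([1.0, 0.0], [0.5, 0.5])
sparse = sparse_score({"npu": 0.5, "model": 0.25}, {"npu": 0.5, "card": 1.0})
multi = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.0], [0.0, 0.25]])
combined = hybrid_score((dense, sparse, multi))
```

Changing the weights shifts the ranking toward one retrieval mode or another; setting a weight to zero disables that mode entirely.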

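Inference environment setup

The model requires the following plugins and drivers.
Table 1 Version compatibility

Component	Version	Environment setup guide
Firmware and driver	25.0.RC1	PyTorch framework inference environment setup
CANN	8.1.RC1	Includes the kernels and toolkit packages
Python	3.10	-
PyTorch	2.5.1	-
Ascend Extension PyTorch	2.5.1.post2	-

Note: For the Atlas 800I A2 and Atlas 300I DUO inference cards, choose the actual firmware and driver versions according to the CANN version.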
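Before running anything, it can help to confirm that the Python-side packages match the table. The following is a minimal sketch using only the standard library. The pip distribution names `torch` and `torch_npu` are assumptions about how PyTorch and Ascend Extension PyTorch are installed in your environment, and the comparison is an exact string match, so a build suffix such as `+cpu` would be reported as a mismatch.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions copied from the compatibility table.
# The distribution names on the left are assumptions.
EXPECTED: Dict[str, str] = {"torch": "2.5.1", "torch_npu": "2.5.1.post2"}
EXPECTED_PYTHON: Tuple[int, int] = (3, 10)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each name to (expected, installed, matches)."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        report[name] = (want, have, have == want)
    return report


def python_matches(expected: Tuple[int, int] = EXPECTED_PYTHON) -> bool:
    """Check the major.minor version of the running interpreter."""
    return tuple(sys.version_info[:2]) == expected


if __name__ == "__main__":
    print(f"python {sys.version_info.major}.{sys.version_info.minor}: "
          f"{'ok' if python_matches() else 'differs from table'}")
    for name, (want, have, ok) in compare_versions(EXPECTED).items():
        print(f"{name}: expected {want}, installed {have}: {'ok' if ok else 'differs from table'}")
```

The firmware, driver, and CANN versions are not visible to pip and are not covered by this check.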

Quick start

Get the source code

Clone this repository


git clone https://gitcode.com/ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/ACL_PyTorch/built-in/embedding/bge-m

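Get the open-source model code and weights (optional)

If your device can download the weights and code directly from the Hugging Face Hub, you can skip this step.

# Download the model with git; make sure git lfs is installed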
git clone https://huggingface.co/BAAI/bge-m3
cd bge-m3
git reset --hard 5617a9f

After the download completes, the local directory tree looks like this:

 bge-m3/
 ├── colbert_linear.pt
 ├── config.json
 ├── config_sentence_transformers.json
 ├── infer.py  # custom inference script provided by this repository
 ├── modules.json
 ├── pytorch_model.bin
 ├── sentence_bert_config.json
 ├── sentencepiece.bpe.model
 ├── sparse_linear.pt
 ├── special_tokens_map.json
 ├── tokenizer.json
 └── tokenizer_config.json
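
A quick way to confirm that the download and copy steps produced this layout is to check for each file by name. The sketch below is an assumption-laden helper, not part of the repository: the file list is copied from the tree above, and the function name `missing_files` and the default directory `bge-m3` are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence

# File names copied from the directory tree above.
EXPECTED_FILES = (
    "colbert_linear.pt",
    "config.json",
    "config_sentence_transformers.json",
    "infer.py",
    "modules.json",
    "pytorch_model.bin",
    "sentence_bert_config.json",
    "sentencepiece.bpe.model",
    "sparse_linear.pt",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_files(model_dir: str, expected: Sequence[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("bge-m3")  # hypothetical local directory name
    print("all files present" if not missing else f"missing: {missing}")
```

If git lfs was not installed when cloning, the large weight files may exist only as small pointer files, which this name-based check does not detect.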

Install dependencies
```
pip3 install FlagEmbedding transformers==4.51.1 
```
For the other base dependencies, see requirements.txt.

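Model inference

Run inference verification

Set the environment variables and run the inference command.

# Select the NPU ID to use; the default is 0
export ASCEND_RT_VISIBLE_DEVICES=0
# If the weights can be downloaded quickly from the Hugging Face Hub, you can use the following command instead
# python3 infer.py --model_path=BAAI/bge-m3
python3 infer.py  # use --model_path to specify the weights path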

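Once inference starts, a warm_up pass runs first by default. Its purpose is to trigger the initial compilation, which takes a relatively long time. After warm_up finishes, the script runs inference and prints the E2E performance data to the screen. To measure the model's forward time, add timing probes before and after the outputs = self.model(...) call at line 423 of YOUR_ENV\FlagEmbedding\inference\embedder\encoder_only\m3.py.

Here YOUR_ENV is the path of your current environment, which you can find with pip show FlagEmbedding | grep Location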
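
The timing probes described above can be sketched as a small context manager, together with a helper that locates m3.py without going through pip. Both names (`find_m3_source`, `stopwatch`) are hypothetical. The `sync` argument is an assumption: device work is typically asynchronous, so a wall-clock timer is usually only meaningful if pending work is waited on before and after the measured block, for example by passing `torch.npu.synchronize` if that call is available in your torch_npu installation.

```python
import importlib.util
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Dict, Iterator, Optional


def find_m3_source() -> Optional[Path]:
    """Return the expected path of m3.py inside the installed FlagEmbedding
    package, or None if the package cannot be found."""
    spec = importlib.util.find_spec("FlagEmbedding")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    return package_dir / "inference" / "embedder" / "encoder_only" / "m3.py"


@contextmanager
def stopwatch(label: str,
              results: Dict[str, float],
              sync: Optional[Callable[[], None]] = None) -> Iterator[None]:
    """Store the wall-clock time of the wrapped block, in milliseconds,
    under results[label]. If sync is given, it is called before starting
    and before stopping the timer."""
    if sync is not None:
        sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        results[label] = (time.perf_counter() - start) * 1000.0


# Intended use inside m3.py, around the existing call (shown as comments
# because the surrounding code is not reproduced here):
#
#     timings = {}
#     with stopwatch("forward", timings, sync=torch.npu.synchronize):
#         outputs = self.model(...)
#     print(f"forward: {timings['forward']:.2f} ms")
```

The line number 423 may differ between FlagEmbedding releases, so locate the outputs = self.model(...) call by searching the file rather than relying on the number alone.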

Performance

Model	Chip	E2E	forward
bge-m3	Atlas 300I DUO (single chip)	137.59ms	23.23ms
bge-m3	Atlas 800I A2	103.88ms	14.71ms
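
To read the table as ratios rather than raw latencies, the short sketch below computes how the two devices compare and what share of the E2E time is spent in the forward pass. The numbers are copied from the table; the dictionary layout and function names are assumptions made for this illustration.

```python
# Latencies in milliseconds, copied from the performance table.
RESULTS_MS = {
    "Atlas 300I DUO (single chip)": {"e2e": 137.59, "forward": 23.23},
    "Atlas 800I A2": {"e2e": 103.88, "forward": 14.71},
}


def latency_ratio(metric: str,
                  slower: str = "Atlas 300I DUO (single chip)",
                  faster: str = "Atlas 800I A2") -> float:
    """Latency of `slower` divided by latency of `faster` for one metric."""
    return RESULTS_MS[slower][metric] / RESULTS_MS[faster][metric]


def forward_share(device: str) -> float:
    """Fraction of the E2E latency spent in the forward pass."""
    entry = RESULTS_MS[device]
    return entry["forward"] / entry["e2e"]


if __name__ == "__main__":
    for metric in ("e2e", "forward"):
        print(f"{metric}: ratio {latency_ratio(metric):.2f}")
    for device in RESULTS_MS:
        print(f"{device}: forward is {forward_share(device):.1%} of E2E")
```

On both devices the forward pass is a small fraction of the E2E time, so the remainder is spent outside the model call.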