
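SAM3 Offline Inference Guide

Overview
Inference Environment Setup
Quick Start

Overview

SAM3 is a model in the Segment Anything series open-sourced by Meta. It targets promptable segmentation of images and videos and supports detecting, segmenting, and tracking objects with text, point, box, and mask prompts. This document covers ONNX export, OM conversion, performance testing, and accuracy testing on Ascend NPUs for the SAM3 image segmentation branch driven by text prompts.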

url=https://github.com/huggingface/transformers/tree/v5.0.0/src/transformers/models/sam3
commit_id=08810b1e278938278c50153ee1edfd7a20a759da
model_name=sam3

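Inference Environment Setup

Component	Version	Setup guide
Firmware and driver	25.3.RC1	-
CANN	8.3.RC1	-
Python	3.11.6	-
PyTorch	2.9.0	-
Ascend Extension PyTorch	2.9.0	-

This guide has been verified on Atlas 800I A2 and Atlas 300V PRO. Deploying from a Huawei Ascend community image pulled from Ascend Hub is recommended.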

Quick Start

Environment configuration

Clone the source code.

git clone https://gitcode.com/Ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/ACL_PyTorch/built-in/cv/SAM3

Set the deployment parameters.

export TARGET_DEVICE=300I
export BATCH_SIZE=4
export SOC_VERSION=Ascend310P3

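TARGET_DEVICE=300I corresponds to Atlas 300V PRO; TARGET_DEVICE=800I corresponds to Atlas 800I A2.
BATCH_SIZE sets the batch size that ATC fixes into the OM and is part of the OM file name.
SOC_VERSION is required. Set it according to the output of npu-smi info; for example, Atlas 300V PRO is typically Ascend310P3 and Atlas 800I A2 is typically Ascend910B4.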
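
The three variables can be checked before starting a long conversion. The following is a minimal sketch, not part of the repository: read_deploy_params and its returned keys are hypothetical names, and the OM prefix is inferred from the file name pattern sam3_${TARGET_DEVICE}_bs${BATCH_SIZE}*.om used later in this guide.

```python
import os


def read_deploy_params(env=None):
    """Validate TARGET_DEVICE, BATCH_SIZE and SOC_VERSION (hypothetical helper)."""
    env = os.environ if env is None else env

    target = env.get("TARGET_DEVICE", "")
    if not target:
        raise ValueError("TARGET_DEVICE is not set")

    raw_batch = env.get("BATCH_SIZE", "")
    if not raw_batch.isdigit() or int(raw_batch) < 1:
        raise ValueError(f"BATCH_SIZE must be a positive integer, got {raw_batch!r}")

    soc = env.get("SOC_VERSION", "")
    if not soc:
        raise ValueError("SOC_VERSION is required; read it from `npu-smi info`")

    # Per the guide, 300I selects the Atlas 300V PRO path and any other
    # value falls back to the Atlas 800I A2 path.
    board = "Atlas 300V PRO" if target == "300I" else "Atlas 800I A2"
    return {
        "target_device": target,
        "batch_size": int(raw_batch),
        "soc_version": soc,
        "board": board,
        # Assumed from the OM naming pattern shown in this guide.
        "om_prefix": f"sam3_{target}_bs{raw_batch}",
    }


if __name__ == "__main__":
    print(read_deploy_params())
```

Failing early on a missing SOC_VERSION is cheaper than discovering it after the export step has already run.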

Install the Python dependencies.

python3 -m pip install -r requirements.txt
pip install --trusted-host ascend.devcloud.huaweicloud.com -i https://ascend.devcloud.huaweicloud.com/pypi/simple/ mindiesd==3.0.0

requirements.txt already includes modelscope. Conversion for Atlas 300V PRO uses the FlashAttentionTik custom operator provided by mindiesd.

Install the ais_bench inference tool.

python3 -m pip install msit
msit install surgeon
wget https://aisbench.obs.myhuaweicloud.com/packet/ais_bench_infer/0.0.2/ait/ais_bench-0.0.2-py3-none-any.whl
wget https://aisbench.obs.myhuaweicloud.com/packet/ais_bench_infer/0.0.2/ait/aclruntime-0.0.2-cp311-cp311-linux_aarch64.whl
python3 -m pip install ais_bench-0.0.2-py3-none-any.whl
python3 -m pip install aclruntime-0.0.2-cp311-cp311-linux_aarch64.whl

Obtain the weights

Download the SAM3 weights from ModelScope into the sam3_model directory.

mkdir -p sam3_model
modelscope download --model facebook/sam3 --local_dir ./sam3_model

The weights directory must contain sam3.pt, model.safetensors, config.json, and the tokenizer and processor configuration files.
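
A quick check of the download can save a failed export later. The sketch below is illustrative only: missing_weight_files is a hypothetical helper, and it covers only the three file names spelled out above, because this guide does not list the tokenizer and processor file names.

```python
from pathlib import Path

# Only the file names this guide states explicitly. Tokenizer and processor
# configuration files are also required but are not named here.
REQUIRED_FILES = ("sam3.pt", "model.safetensors", "config.json")


def missing_weight_files(model_dir):
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weight_files("./sam3_model")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All explicitly named weight files are present.")
```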

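Export ONNX

By default the ONNX model is exported with a dynamic batch dimension; the batch size of the final OM is fixed at the ATC stage. The export script rewrites the ViT patch embedding, the QKV projection, and the RoPE expression into equivalent forms to improve later graph optimization and ATC conversion. With TARGET_DEVICE=300I it additionally sets the model's attention implementation to eager; any other value uses the model's default attention.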
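
To illustrate what an equivalent rewrite of a QKV projection can look like, the sketch below shows one common form: three separate projections replaced by a single matrix multiplication over column-concatenated weights. This is an assumption for illustration and is not taken from sam3_export_onnx.py; the helper names and the tiny matrices are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def hconcat(*mats):
    """Concatenate matrices column-wise (all must have the same row count)."""
    return [sum((list(m[i]) for m in mats), []) for i in range(len(mats[0]))]


# Two tokens with hidden size 2, and three illustrative projection weights.
x = [[1.0, 2.0], [3.0, 4.0]]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[2.0, 0.0], [0.0, 2.0]]
w_v = [[0.0, 1.0], [1.0, 0.0]]

# Three small matmuls, outputs laid side by side.
separate = hconcat(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))

# One large matmul against the concatenated weight matrix.
fused = matmul(x, hconcat(w_q, w_k, w_v))

if __name__ == "__main__":
    print(fused == separate)
```

The two results are identical because each output column depends only on its own weight column, so concatenating weights changes the number of operators in the graph but not the values.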

Run the ONNX export.

python3 sam3_export_onnx.py \
    --target_device "${TARGET_DEVICE}" \
    --model_path ./sam3_model \
    --output_dir "./sam3_onnx_${TARGET_DEVICE}"

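Parameters:

--target_device: target device type. 300I uses the Atlas 300V PRO export configuration; any other value uses the Atlas 800I A2 export configuration.
--model_path: directory containing the SAM3 weights and processor configuration.
--output_dir: ONNX output directory.

The export produces ./sam3_onnx_${TARGET_DEVICE}/sam3.onnx.

Review the model input and output semantics.

The originally exported ONNX has not had its shapes fixed, so output shapes may appear as symbolic dimensions in the ONNX file. sam3_convert_om.sh later produces the ONNX used for conversion by means of probe shapes, onnxslim, and shape inference. The input and output semantics used for optimization and ATC conversion are listed below; N in the table is fixed by BATCH_SIZE at the ATC stage: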

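Type	Name	Data type	Shape	Description
Input	`pixel_values`	FLOAT32	N x 3 x 1008 x 1008	Image input, NCHW
Input	`input_ids`	INT64	N x 32	Text prompt tokens
Input	`attention_mask`	INT64	N x 32	Text mask
Output	`pred_masks`	FLOAT32	N x 200 x 288 x 288	Object mask logits
Output	`pred_logits`	FLOAT32	N x 200	Object scores
Output	`pred_boxes`	FLOAT32	N x 200 x 4	Object boxes
Output	`presence_logits`	FLOAT32	N x 1	Object presence score
Output	`semantic_seg`	FLOAT32	N x 1 x 288 x 288	Semantic segmentation mask

1008 comes from the image preprocessing size of the SAM3 processor; 32 is the fixed text length used by the export and inference scripts; 200 and 288 are the SAM3 query count and the mask output resolution, respectively. The inference scripts parse results according to the output shapes of the converted OM.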
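
The shape table can be written down as a small helper, which is handy for sizing host buffers or checking what an OM returns. This is a sketch under the shapes listed above; io_shapes and float32_bytes are hypothetical names, not functions from the repository scripts.

```python
from math import prod

# Constants taken from the shape table in this guide.
IMAGE_SIZE = 1008   # processor preprocessing size
TEXT_LEN = 32       # fixed text length
NUM_QUERIES = 200   # SAM3 query count
MASK_SIZE = 288     # mask output resolution


def io_shapes(batch_size):
    """Return the input and output shapes for a given fixed batch size N."""
    n = batch_size
    return {
        "pixel_values": (n, 3, IMAGE_SIZE, IMAGE_SIZE),
        "input_ids": (n, TEXT_LEN),
        "attention_mask": (n, TEXT_LEN),
        "pred_masks": (n, NUM_QUERIES, MASK_SIZE, MASK_SIZE),
        "pred_logits": (n, NUM_QUERIES),
        "pred_boxes": (n, NUM_QUERIES, 4),
        "presence_logits": (n, 1),
        "semantic_seg": (n, 1, MASK_SIZE, MASK_SIZE),
    }


def float32_bytes(shape):
    """Size in bytes of a FLOAT32 tensor with the given shape."""
    return prod(shape) * 4


if __name__ == "__main__":
    shapes = io_shapes(4)
    print(shapes["pred_masks"], float32_bytes(shapes["pred_masks"]))
```

With BATCH_SIZE=4, pred_masks alone holds 66,355,200 float32 values, roughly 265 MB, so the mask output dominates the per-batch host memory.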

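Convert to OM

sam3_convert_om.sh first runs ONNX optimization and then calls ATC to generate the OM. ONNX optimization consists of onnxslim simplification and shape inference. With TARGET_DEVICE=300I the FlashAttentionTik graph rewrite is enabled; any other value uses fusion_switch.cfg.

Configure the CANN environment variables.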

source /usr/local/Ascend/ascend-toolkit/set_env.sh

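The CANN installation path above is only an example; configure the environment variables according to your actual installation path.

Run ONNX optimization and OM conversion.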

bash sam3_convert_om.sh

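Check the conversion artifacts.

The optimized ONNX is always saved as ./sam3_onnx_optimized_${TARGET_DEVICE}_bs${BATCH_SIZE}/optimized_onnx.onnx. Conversion takes a long time; wait for the script to finish. Example output file: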

sam3_om/sam3_300I_bs4*.om

ATC may append system and architecture suffixes to the output file name; use the actual name of the generated .om file.
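
Because the exact suffix is not known in advance, the file can be resolved by pattern instead of by a hard-coded name. The sketch below assumes the sam3_${TARGET_DEVICE}_bs${BATCH_SIZE}*.om pattern used in this guide; find_om is a hypothetical helper.

```python
from pathlib import Path


def find_om(om_dir, target_device, batch_size):
    """Return the single .om file matching the naming pattern used in this guide."""
    pattern = f"sam3_{target_device}_bs{batch_size}*.om"
    matches = sorted(Path(om_dir).glob(pattern))
    if len(matches) != 1:
        # Zero matches means conversion did not finish; several matches mean
        # stale artifacts that should be cleaned up before testing.
        raise FileNotFoundError(
            f"expected exactly one file matching {pattern} in {om_dir}, "
            f"found {len(matches)}"
        )
    return matches[0]


if __name__ == "__main__":
    print(find_om("./sam3_om", "300I", 4))
```

Requiring exactly one match avoids silently benchmarking an older OM left over from a previous conversion.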

Performance testing

Run the pure-model performance test.

python3 -m ais_bench \
    --model ./sam3_om/sam3_${TARGET_DEVICE}_bs${BATCH_SIZE}*.om \
    --device 0 \
    --loop 10

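Accuracy testing

Atlas 300V PRO: COCO val2017

Before running, export the ONNX and convert the OM with TARGET_DEVICE=300I and the actual SOC_VERSION. COCO val2017 accuracy validation uses the category names from the image annotations as text prompts and treats one category in one image as one evaluation sample. Multiple instances of the same category are merged, IoU is computed against the predicted mask, and the Mean IoU is reported over all samples.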
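
The metric described above can be sketched in a few lines. This is an illustration, not the code of sam3_coco_iou_eval.py: masks are represented as sets of (row, col) pixels for brevity, the function names are hypothetical, and how the script binarizes predicted logits or treats empty masks is not stated in this guide, so those choices are assumptions.

```python
def merge_instances(instance_masks):
    """Union all instance masks of one category in one image."""
    merged = set()
    for mask in instance_masks:
        merged |= mask
    return merged


def iou(pred, target):
    """Intersection over union of two binary masks given as pixel sets."""
    union = pred | target
    if not union:
        return 0.0  # assumed convention when both masks are empty
    return len(pred & target) / len(union)


def mean_iou(samples):
    """samples: list of (predicted_mask, [ground-truth instance masks])."""
    scores = [iou(pred, merge_instances(gt)) for pred, gt in samples]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    samples = [
        # Two ground-truth instances merged into 3 pixels; 2 of them are predicted.
        ({(0, 0), (0, 1), (1, 0)}, [{(0, 0), (0, 1)}, {(1, 1)}]),
        # Perfect prediction of a single instance.
        ({(2, 2)}, [{(2, 2)}]),
    ]
    print(mean_iou(samples))
```

In this toy input the first sample scores 2/4 and the second 1/1, so the mean is 0.75.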

Prepare the dataset.

mkdir -p ./coco2017
cd ./coco2017
wget -c http://images.cocodataset.org/zips/val2017.zip
wget -c http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip -o val2017.zip
unzip -o annotations_trainval2017.zip
cd ..

After extraction, the directory structure is as follows:

coco2017/
├── annotations
│   └── instances_val2017.json
└── val2017

Run accuracy validation.

python3 sam3_coco_iou_eval.py \
    --model_path ./sam3_om/sam3_${TARGET_DEVICE}_bs${BATCH_SIZE}*.om \
    --processor_path ./sam3_model \
    --coco_root ./coco2017 \
    --device_ids "0" \
    --batch_size "${BATCH_SIZE}"

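If the machine has idle NPUs beyond the one used for single-card validation, set --device_ids to multiple devices, for example --device_ids "0,1,2,3". Samples are then split across the devices and validated in parallel, which shortens the COCO accuracy test.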
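
Splitting samples across devices can be pictured as follows. The guide only says that samples are partitioned per device; the round-robin scheme and the shard_samples name below are assumptions made for illustration.

```python
def shard_samples(samples, device_ids):
    """Assign samples to devices round-robin.

    device_ids uses the same comma-separated form as --device_ids, e.g. "0,1,2,3".
    """
    ids = [int(d) for d in device_ids.split(",") if d.strip()]
    if not ids:
        raise ValueError("at least one device id is required")
    # Device at position i takes samples i, i + len(ids), i + 2 * len(ids), ...
    return {dev: samples[i::len(ids)] for i, dev in enumerate(ids)}


if __name__ == "__main__":
    for dev, shard in shard_samples(list(range(10)), "0,1,2,3").items():
        print(dev, shard)
```

Round-robin keeps shard sizes within one sample of each other, so all devices finish at about the same time when per-sample cost is similar.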

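Parameters:

--model_path: path of the OM file to validate.
--processor_path: directory containing the SAM3 processor and tokenizer configuration.
--coco_root: root directory of the COCO val2017 dataset; it must contain val2017 and annotations/instances_val2017.json.
--device_ids: NPU device IDs used for inference; separate multiple devices with commas.
--batch_size: batch size used when converting the OM.

Atlas 800I A2: sa-co-gold

Before running, export the ONNX and convert the OM with TARGET_DEVICE=800I and the actual SOC_VERSION. Accuracy validation on Atlas 800I A2 uses the sa-co-gold dataset.

Prepare the dataset.

Download links:

SA-1B images: https://sa-co.roboflow.com/gold/sa1b-images.zip
MetaCLIP images: https://sa-co.roboflow.com/gold/metaclip-images.zip
Ground-truth annotations: https://sa-co.roboflow.com/gold/gt-annotations.zip

After downloading and extracting, the directory structure is as follows: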

dataset/
├── gt-annotations
├── metaclip-images
└── sa1b-images

Run OM inference.

python3 sam3_image_infer.py \
    --model_path ./sam3_om/sam3_800I_bs${BATCH_SIZE}*.om \
    --processor_path ./sam3_model \
    --dataset_root ./dataset \
    --device "0,1,2,3" \
    --batch_size "${BATCH_SIZE}"

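Parameters:

--model_path: path of the OM file to validate.
--processor_path: directory containing the SAM3 processor and tokenizer configuration.
--dataset_root: root directory of the sa-co-gold dataset.
--device: NPU device IDs used for inference; separate multiple devices with commas.
--batch_size: batch size used when converting the OM.

Compute cgF1.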

python3 batch_eval.py \
    --gt_base ./dataset/gt-annotations \
    --pred_dir ./gold_predictions

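Parameters:

--gt_base: sa-co-gold annotation directory.
--pred_dir: directory of predictions generated by sam3_image_infer.py.

sam3_image_infer.py writes the predictions for each subset under the gold_predictions directory; batch_eval.py aggregates the subsets and prints the average cgF1.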

Results

Pure-model performance

Atlas 800I A2:

Chip model	Model	Batch Size	Performance (fps)	Notes
Ascend910B4	sam3	1	4.3706	Pure-model performance
Ascend910B4	sam3	4	4.9664	Pure-model performance
Ascend910B4	sam3	8	4.6780	Pure-model performance

Atlas 300V PRO/RC 176T:

Chip model	Model	Batch Size	Performance (fps) (Atlas 300V PRO)	Performance (fps) (RC 176T)	Notes
Ascend310P3	sam3	1	1.6882	1.6854	Pure-model performance
Ascend310P3	sam3	4	1.6860	1.5989	Pure-model performance
Ascend310P3	sam3	8	1.6441	1.5577	Pure-model performance
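
The throughput figures can be turned into an approximate per-batch latency. This sketch assumes that fps counts images per second, so one batch takes batch size divided by fps; that reading of the metric is an assumption, and batch_latency_seconds is a hypothetical helper.

```python
def batch_latency_seconds(fps, batch_size):
    """Approximate time for one batch, assuming fps means images per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return batch_size / fps


if __name__ == "__main__":
    # (chip, batch size, fps) taken from the Atlas 800I A2 table above.
    for chip, bs, fps in [
        ("Ascend910B4", 1, 4.3706),
        ("Ascend910B4", 4, 4.9664),
        ("Ascend910B4", 8, 4.6780),
    ]:
        print(chip, bs, round(batch_latency_seconds(fps, bs), 4))
```

Under this assumption, a larger batch raises latency almost proportionally while throughput stays nearly flat, which matches the small fps differences between batch sizes in both tables.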

Model inference accuracy

Atlas 800I A2:

Chip model	Model	Dataset	Accuracy metric	Result
Ascend910B4	sam3	sa-co-gold	cgF1	0.5505

Atlas 300V PRO/RC 176T:

Device	Model	Dataset	Samples	Mean IoU	T4 Mean IoU
Atlas 300V PRO	sam3	COCO val2017	14631	0.6898	0.6894
RC 176T	sam3	COCO val2017	14631	0.6898	0.6894