MindIE-LLM/tests/pythontest/cpu/runtime · Ascend/MindIE-LLM - AtomGit

File	Last commit	Last updated
config	[feat] Add device_utils and affinity, giving aclgraph hardware-information queries and CPU core binding
Co-authored-by: zhaokerui<zhaokerui@huawei.com>
# message auto-generated for no-merge-commit merge: !175 merge move_aff into dev
Created-by: zhaokerui
Commit-by: zhaokerui
Merged-by: ascend-robot
Description:
# Background
Fixes #104
# Changes
1. Add affinity.py, which exposes the bind_cpus(ratio: float) interface for CPU core binding.
2. Optimize the npu_utils module: move the interfaces previously supported in PlatformInfo to _NPUNodeInfo; add the visible_device_ids, get_device_info_map, and get_pcie_info interfaces; make the singleton class private, so the singleton must be accessed through get_npu_node_info.
3. Add the get_npu_hbm_info interface for accessing the _NPUHbmInfo singleton.
# Documentation changes
Not applicable
# Interface changes
Not applicable
# Test results
Functional verification of aclgraph with qwen3 and dsv3.2 is complete
# CheckList
- [x] Code comments are complete
- [x] Error logs are recorded correctly
- [x] Return values are checked (do not use void to suppress the return values of secure functions or in-house functions; consider abnormal scenarios of interfaces; check return values when calling lower-level component interfaces)
- [x] Null-pointer checks are performed
- [x] Any resources acquired are correctly released after use
- [x] For multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [x] Code is formatted with clang-format using [the format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [x] Complies with the Ascend community coding standards. [C++ Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!175	4 months ago
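The config row's PR names a bind_cpus(ratio: float) interface for core binding but does not say how ratio is interpreted. The following is a minimal sketch, not the repository's implementation. It assumes ratio is the fraction of the process's currently allowed CPUs to keep, and that the lowest-numbered CPUs are chosen. The helper select_cpus is hypothetical. os.sched_getaffinity and os.sched_setaffinity are standard-library calls available on Linux only.

```python
import math
import os


def select_cpus(available, ratio):
    """Return the lowest-numbered ceil(ratio * n) CPU ids from `available`.

    Assumption (not from the PR): `ratio` is the fraction of the currently
    allowed CPUs that the process should stay bound to.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    cpus = sorted(available)
    if not cpus:
        raise ValueError("no CPUs available")
    count = max(1, math.ceil(len(cpus) * ratio))
    return cpus[:count]


def bind_cpus(ratio):
    """Pin the current process to a subset of its allowed CPUs.

    Returns the chosen CPU ids, or None on platforms without
    os.sched_setaffinity (it is Linux-only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    chosen = select_cpus(os.sched_getaffinity(0), ratio)
    os.sched_setaffinity(0, chosen)
    return chosen
```

Keeping the selection logic in a pure function separate from the system call makes the policy testable on machines where affinity cannot be changed.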
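layers	Refactor select_moe_comm_type
Co-authored-by: Dawn952<zhaojunbo13@huawei.com>
# message auto-generated for no-merge-commit merge: !789 merge T0004-bugfix2 into dev
Created-by: Dawn952
Commit-by: Dawn952
Merged-by: ascend-robot
Description:
# Background
The current moe_comm_strategy does not distinguish its branches clearly; each strategy should focus only on the checks for its own branch scenario. The case analysis for the cp scenario also contains errors.
Fixes #401
# Changes
> Describe the concrete implementation of the changes and which components interact; items can be listed as 1, 2, 3, ...
> For a feature or refactoring PR, add a detailed design document (upstream and downstream component relationships, sequence diagrams, class diagrams, DFX capabilities, and so on).
# Documentation changes
Not applicable.
# Interface changes
Not applicable.
# Test results
> Describe the test scenarios, test methods, and test results.
> Test case design should consider hardware, deployment mode, functionality, performance, accuracy, device memory, and other dimensions.
# CheckList
- [x] Code comments are complete
- [x] Error logs are recorded correctly
- [x] Return values are checked (do not use void to suppress the return values of secure functions or in-house functions; consider abnormal scenarios of interfaces; check return values when calling lower-level component interfaces)
- [x] Null-pointer checks are performed
- [x] Any resources acquired are correctly released after use
- [x] For multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [x] Code is formatted with clang-format using [the format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [x] Complies with the Ascend community coding standards. [C++ Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!789	1 month ago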
lora	multi lora: add a transpose check when getting the base weight shape
Co-authored-by: zch777<zhuangchenghao@huawei.com>
# message auto-generated for no-merge-commit merge: !564 merge dev_local into dev
Created-by: zch777
Commit-by: zch777
Merged-by: ascend-robot
Description:
# Background
On A2/A3 machines, LoRA weight loading fails because transpose is not checked.
Fixes #287
# Changes
When the lora layer gets the base layer's weight shape, add transpose type logic that checks whether the weight is transposed.
# Documentation changes
Not applicable
# Interface changes
Not applicable
# Test results
A2, qwen2.5 + multilora accuracy test
Current dev branch code + this PR's changes
![image.png](https://raw.gitcode.com/user-images/assets/8772840/c50fd4b4-29c8-4603-be74-023d0ea32e87/image.png 'image.png')
2.2.rc1 commercial release
![image.png](https://raw.gitcode.com/user-images/assets/8772840/16e29660-5f2f-4b94-b36f-ca7bbb68a994/image.png 'image.png')
300Iduo, qwen2.5 + multilora accuracy test
Current dev branch code + this PR's changes
![image.png](https://raw.gitcode.com/user-images/assets/8772840/732086b1-8283-430b-9619-24891d400c62/image.png 'image.png')
# CheckList
- [x] Code comments are complete
- [x] Error logs are recorded correctly
- [x] Return values are checked (do not use void to suppress the return values of secure functions or in-house functions; consider abnormal scenarios of interfaces; check return values when calling lower-level component interfaces)
- [x] Null-pointer checks are performed
- [x] Any resources acquired are correctly released after use
- [x] For multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [x] Code is formatted with clang-format using [the format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [x] Complies with the Ascend community coding standards. [C++ Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!564	2 months ago
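The lora row's fix, checking whether the base weight is transposed before reading its shape, can be illustrated with a small sketch. This is not the repository's code: the TransposeType enum, both function names, and the assumed layouts (an untransposed weight stored as (out_features, in_features), a transposed one as (in_features, out_features)) are all hypothetical.

```python
from enum import Enum


class TransposeType(Enum):
    """Hypothetical flag describing how the base weight is stored."""
    NOT_TRANSPOSE = 0  # assumed layout: (out_features, in_features)
    TRANSPOSE = 1      # assumed layout: (in_features, out_features)


def base_weight_dims(shape, transpose_type):
    """Return (out_features, in_features) regardless of storage layout."""
    if len(shape) != 2:
        raise ValueError("expected a 2-D weight shape")
    rows, cols = shape
    if transpose_type is TransposeType.TRANSPOSE:
        return cols, rows
    return rows, cols


def lora_shapes(shape, transpose_type, rank):
    """Shapes of the LoRA A and B matrices that match the base layer."""
    out_features, in_features = base_weight_dims(shape, transpose_type)
    return (rank, in_features), (out_features, rank)
```

Without the transpose check, a weight stored as (in_features, out_features) would yield swapped dimensions, and adapter weights sized from them would not match the base layer.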
model_runner	aclgraph: intercept OOM exceptions and convert them to an error code
Co-authored-by: xuchi<xuchicolson@163.com>
# message auto-generated for no-merge-commit merge: !1024 merge A00256 into dev
Created-by: martinXuc
Commit-by: xuchi
Merged-by: ascend-robot
Description:
# Background
Fix part of #588
Following the way atb-models catches OOM exceptions and converts them to fault codes, add OOM exception interception on the aclgraph code path so that OOM scenarios report the correct error code (`MIE05E000006`), which makes the problem easier to locate and diagnose.
# Changes
1. Add `utils/decorators/exception_handler.py`: provides the `@exception_handler` class decorator, which automatically wraps the target methods, catches `torch.OutOfMemoryError`, logs the error code, and re-raises the exception as `RuntimeError`.
2. `model_runner_exp.py`: apply the `@exception_handler` decorator to `ModelRunnerExp` to intercept OOM in `forward()` / `compile()` / `load_weights()`.
3. `error_code.py`: add `ACL_GRAPH_OUT_OF_MEMORY = "MIE05E000006"`; reorder the ErrorCode enum by code prefix to improve maintainability.
# Documentation changes
Not applicable.
# Interface changes
Not applicable.
# Test results
## Test case 01: HCCL OOM exception
### How to trigger
```shell
export HCCL_BUFFSIZE=200000
```
### Key exception output on screen
```shell
  File "/usr/local/lib/python3.11/site-packages/mindie_llm/utils/decorators/exception_handler.py", line 38, in wrapper
    raise RuntimeError(f"{error_msg}. Error_code: {error_code}") from e
RuntimeError: Device out of memory (OOM) reported by PyTorch, but it can possibly triggered by HCCL. Enable logs: export ASCEND_SLOG_PRINT_TO_STDOUT=1, export ASCEND_GLOBAL_LOG_LEVEL=3 to check if there's HCCL error messages. Error_code: MIE05E000006
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/mindie_llm/utils/decorators/exception_handler.py", line 26, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
...
torch.OutOfMemoryError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnDispatchFFNCombine. Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[ERROR] 2026-05-24-15:45:50 (PID:297309, Device:14, RankID:-1) ERR00100 PTA call acl api failed.
[PID: 297309] 2026-05-24-15:45:50.987.747 Memory_Allocation_Failure(EL0004): Failed to allocate memory requested by HCCL module.
        Possible Cause: Available memory is insufficient.
        Solution: Close applications not in use.
        TraceBack (most recent call last):
        alloc memory failed, runtime result = 207001[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
        Failed to allocate [size:419431448576] bytes of NPU memory.
        Nnopbase fails to invoke the HcclAllocComResourceByTiling function of the hccl module. ret = 24, comm = 0xfff01a18be90.
        Check nnopbase::IndvHcclWrapper::GetInstance().HcclAllocComResourceByTiling(commHandle, stream, (op::internal::PtrCastTo<NnopbaseTilingData>(executor->args->tilingInfo.tilingData))->GetData(), &contextAddr) failed
        Check NnopbaseGetHcomResource(executor, stream) failed
        Check NnopbaseExecutorGetMc2Num(executor, stream, &argsAddr, &mc2Num) failed
        Check NnopbaseExecutorPrepareParamsExt(executor, stream) failed
        Check NnopbaseRunWithWorkspace(executor, stream, workspace, workspaceSize) failed
```
## Test case 02: PTA OOM exception
### How to trigger
```shell
vim /usr/local/lib/python3.11/site-packages/mindie_llm/runtime/model_runner/model_runner_exp.py
```
```python
        Returns:
            Logits tensor, or tuple of (logits, hidden_states) if speculative tokens enabled.
        """
        # temp trigger OOM
        torch.zeros(1024 * 1024 * 1024 * 4, dtype=torch.float32, device=self.device)
        if self._kv_cache_info.check_diff(kv_caches):
            self._bind_kv_cache(kv_caches)
```
### Key exception log
```shell
[2026-05-24 16:41:28,672] [334601] [281462080598432] [llm] [ERROR] [plugin_manager.py-299] : Error encountered in generate_token (trace_ids=[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]).
trigger recovery or terminate inference thread. Error: Device out of memory (OOM) reported by PyTorch, but it can possibly triggered by HCCL. Enable logs: export ASCEND_SLOG_PRINT_TO_STDOUT=1, export ASCEND_GLOBAL_LOG_LEVEL=3 to check if there's HCCL error messages. Error_code: MIE05E000006
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/mindie_llm/utils/decorators/exception_handler.py", line 26, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/mindie_llm/runtime/model_runner/model_runner_exp.py", line 298, in forward
    torch.zeros(1024 * 1024 * 1024 * 4, dtype=torch.float32, device=self.device)
torch.OutOfMemoryError: NPU out of memory. Tried to allocate 16.00 GiB (NPU 1; 61.28 GiB total capacity; 48.03 GiB already allocated; 48.03 GiB current active; 12.99 GiB free; 48.12 GiB reserved in total by PyTorch).If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
The above exception was the direct cause of the following exception:
...
RuntimeError: Device out of memory (OOM) reported by PyTorch, but it can possibly triggered by HCCL. Enable logs: export ASCEND_SLOG_PRINT_TO_STDOUT=1, export ASCEND_GLOBAL_LOG_LEVEL=3 to check if there's HCCL error messages. Error_code: MIE05E000006
Traceback (most recent call last):
...
```
# CheckList
- [x] Code comments are complete
- [x] Error logs are recorded correctly
- [x] Return values are checked (do not use void to suppress the return values of secure functions or in-house functions; consider abnormal scenarios of interfaces; check return values when calling lower-level component interfaces)
- [x] Null-pointer checks are performed
- [x] Any resources acquired are correctly released after use
- [x] For multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [x] Code is formatted with clang-format using [the format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [x] Complies with the Ascend community coding standards. [C++ Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!1024	6 days ago
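The model_runner row describes a class decorator that wraps selected methods, catches the OOM exception, logs an error code, and re-raises a RuntimeError. The sketch below shows one way such a decorator could be written. It is not the code in exception_handler.py. DeviceOutOfMemoryError is a stand-in for torch.OutOfMemoryError so the example runs without torch, and DemoRunner, WRAPPED_METHODS, and _wrap_oom are hypothetical names. Only the error code string and the three method names come from the PR text.

```python
import functools
import logging

logger = logging.getLogger(__name__)

ACL_GRAPH_OUT_OF_MEMORY = "MIE05E000006"  # error code quoted in the PR
WRAPPED_METHODS = ("forward", "compile", "load_weights")  # methods named in the PR


class DeviceOutOfMemoryError(Exception):
    """Stand-in for torch.OutOfMemoryError (assumption, avoids a torch import)."""


def _wrap_oom(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except DeviceOutOfMemoryError as e:
            error_msg = "Device out of memory (OOM)"
            logger.error("%s. Error_code: %s", error_msg, ACL_GRAPH_OUT_OF_MEMORY)
            # `from e` keeps the original exception as __cause__.
            raise RuntimeError(
                f"{error_msg}. Error_code: {ACL_GRAPH_OUT_OF_MEMORY}"
            ) from e
    return wrapper


def exception_handler(cls):
    """Class decorator: wrap the listed methods that `cls` itself defines."""
    for name in WRAPPED_METHODS:
        original = cls.__dict__.get(name)
        if callable(original):
            setattr(cls, name, _wrap_oom(original))
    return cls


@exception_handler
class DemoRunner:
    def forward(self, fail=False):
        if fail:
            raise DeviceOutOfMemoryError("NPU out of memory")
        return "logits"
```

Chaining with `from e` is what produces the "The above exception was the direct cause of the following exception" line in tracebacks like those in the test results.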
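models	dsv32 fc bugfix: 1) fix the accuracy issue in non-streaming mode 2) fix the bfcl accuracy issue with rotQuant weights
Co-authored-by: xsxhw<xiaoshixiang2@huawei.com>
# message auto-generated for no-merge-commit merge: !825 merge dsv32fc into dev
Created-by: xsxhw1
Commit-by: xsxhw
Merged-by: ascend-robot
Description:
# Background
> Describe why the changes in this PR are being made.
> If applicable, link preceding PRs or other PRs under the same feature or requirement.
> If this fixes a problem introduced by an earlier PR, link the PR that introduced it.
> Note: `Fixes #ISSUE ID` closes the issue automatically; if the problem is only partly resolved, do not use `Fixes`, use `Fix part of #ISSUE ID` instead.
# Changes
> Describe the concrete implementation of the changes and which components interact; items can be listed as 1, 2, 3, ...
> For a feature or refactoring PR, add a detailed design document (upstream and downstream component relationships, sequence diagrams, class diagrams, DFX capabilities, and so on).
# Documentation changes
> Confirm whether documentation changes are involved. If so, reflect them in the PR and briefly describe them. If not, enter "Not applicable".
# Interface changes
> Confirm whether interface changes that cross repositories or are visible to customers are involved. If so, describe the interfaces and the corresponding changes in detail and reflect them in the documentation. If not, enter "Not applicable".
# Test results
DeepSeek-V3.2-w8a8-mtp-QuaRot, bfcl simple dataset:
Non-streaming:
The markdown format results is as below:
| dataset | version | metric | mode | vllm-api-function-call-chat |
|----- | ----- | ----- | ----- | -----|
| BFCL-v3-simple | c1b9d1 | accuracy | bfcl_v3 | 0.94 (378.0/400.0) |
Streaming:
| dataset | version | metric | mode | vllm-api-function-call-chat |
|----- | ----- | ----- | ----- | -----|
| BFCL-v3-simple | c1b9d1 | accuracy | bfcl_v3 | 0.94 (378.0/400.0) |
# CheckList
- [ ] Code comments are complete
- [ ] Error logs are recorded correctly
- [ ] Return values are checked (do not use void to suppress the return values of secure functions or in-house functions; consider abnormal scenarios of interfaces; check return values when calling lower-level component interfaces)
- [ ] Null-pointer checks are performed
- [ ] Any resources acquired are correctly released after use
- [ ] For multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [ ] Code is formatted with clang-format using [the format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [ ] Complies with the Ascend community coding standards. [C++ Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!825	1 month ago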
ops	MindIE-LLM supports packaging multiple operator packages of different types together and dynamically selecting which one to import at runtime
Co-authored-by: 周天扬<zhoutianyang@huawei.com>
# message auto-generated for no-merge-commit merge: !709 merge dev_mie_ops into dev
Created-by: hw-zhoutianyang
Commit-by: 周天扬
Merged-by: ascend-robot
Description:
# Background
Fixes [#361](https://gitcode.com/Ascend/MindIE-LLM/issues/361)
# Changes
MindIE-LLM supports packaging multiple operator packages of different types together and dynamically selecting which one to import at runtime
# Documentation changes
Not applicable
# Interface changes
Not applicable
# Test results
A2 large-EP service started successfully and inference requests succeeded:
![image.png](https://raw.gitcode.com/user-images/assets/8772840/ca95ba3b-b003-448b-a3e7-5fa69723f472/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8772840/af6ba270-feab-49a7-ba11-c21839a8bb54/image.png 'image.png')
A3 dual-node service started successfully; gsm8k accuracy OK:
![image.png](https://raw.gitcode.com/user-images/assets/8772840/b2b315a0-49f1-4a58-bec3-4b0873b86785/image.png 'image.png')
# CheckList
- [x] Code comments are complete
- [x] Error logs are recorded correctly
- [x] Return values are checked (do not use void to suppress the return values of secure functions or in-house functions; consider abnormal scenarios of interfaces; check return values when calling lower-level component interfaces)
- [x] Null-pointer checks are performed
- [x] Any resources acquired are correctly released after use
- [x] For multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [x] Code is formatted with clang-format using [the format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [x] Complies with the Ascend community coding standards. [C++ Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!709	2 months ago
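The ops row mentions choosing which operator package to import at runtime but gives no detail on the selection mechanism. One common pattern is a registry that maps a selector to a module name, resolved with importlib. In this sketch the registry keys, the function name import_ops, and the use of standard-library modules as stand-ins for operator packages are all assumptions for illustration.

```python
import importlib

# Hypothetical registry: the keys and the mapping are illustrative only.
# Standard-library modules stand in for real operator packages.
OPS_PACKAGES = {
    "kind_a": "json",
    "kind_b": "csv",
}


def import_ops(kind, registry=OPS_PACKAGES):
    """Import and return the package registered for `kind`."""
    try:
        module_name = registry[kind]
    except KeyError:
        raise ValueError(f"unknown operator package kind: {kind!r}") from None
    # Only the selected package is imported; the others stay unloaded.
    return importlib.import_module(module_name)
```

Deferring the import to the point of selection means a package built for one hardware type is never loaded on another.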
tokenizer	[New requirement] Add aclgraph model base + part of qwen3 + tokenizer wrapper and json completor
Co-authored-by: stanzzzzz<zonghaoxin@huawei.com>
# message auto-generated for no-merge-commit merge: !178 merge 0108aclgraphTodev into dev
Created-by: stanzzzzz
Commit-by: stanzzzzz
Merged-by: ascend-robot
Description:
# Background
Fix part of https://gitcode.com/Ascend/MindIE-LLM/issues/103
# Changes
1. model input builder: base class and qwen3 subclass, used to build inputs in the format the model requires.
2. model reasoning parser: base class and qwen3 subclass, for handling reasoning content.
3. model tool calls processor: base class and qwen3 subclass, for handling tool calls.
4. tokenizer wrapper: wrapper class that provides input encoding and output decoding interfaces.
5. json completor: utility class that completes JSON.
# Documentation changes
Not applicable
# Interface changes
Not applicable
# Test results
deepseek v32: the service starts normally, requests return normally, and accuracy is normal:
```
curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{ "model": "ds_v3.2", "messages": [ {"role": "user", "content": "你是谁？"} ], "stream": false, "ignore_eos": false, "max_tokens": 64 }' http://127.0.0.1:10010/v1/chat/completions
{"id":"endpoint_common_0","object":"chat.completion","created":1767838406,"model":"ds_v3.2","choices":[{"index":0,"message":{"role":"assistant","content":"我是DeepSeek，由深度求索公司创造的AI助手。很高兴为你解答问题，提供帮助！😊","tool_calls":[]},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":6,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens":25,"completion_tokens_details":{"reasoning_tokens":0},"total_tokens":31,"batch_size":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],"queue_wait_time":[536,682,422,1371,368,257,1247,781,830,815,859,1228,1592,670,433]},"prefill_time":2153,"decode_time_arr":[291,291,140,140,140,213,213,420,210,210,208,208,220,220,451,478,445,460,155,155,155,493,232,232]}
```
qwen3 32B: the service starts normally, requests return normally, and accuracy is normal
```
curl --request POST --url http://127.0.0.1:1025/v1/chat/completions --header 'Content-Type: application/json' --data '{ "model":"qwen", "messages":[{ "role": "system", "content": "以梦里花落知多少作为开头，续写一首七言律诗" }], 
"chat_template_kwargs":{"enable_thinking":true}, "stream": false, "temperature": 0.95, "max_tokens":2048 }' {"id":"endpoint_common_0","object":"chat.completion","created":1767797790,"model":"qwen","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\n好的，用户让我以“梦里花落知多少”开头续写一首七言律诗。首先，我需要确认七言律诗的格式要求。七律通常有八句，每句七个字，讲究平仄对仗，中间两联需要对仗工整，押韵一般用平声韵。\n\n接下来，分析原句“梦里花落知多少”。这句诗带有淡淡的哀愁和回忆的感觉，可能涉及离别、时光流逝的主题。我需要延续这种意境，同时展开后续的内容。\n\n首先确定押韵的韵脚。原句的“少”在这里需要注意，因为“少”在平水韵中属于小韵，可能需要调整。不过用户可能更在意现代汉语的押韵，所以可能需要选择“少”对应的韵母，比如“ao”韵，但七律通常要求一韵到底，可能需要调整用词。或者可能用户没有严格遵循古韵，可以适当放宽。\n\n接下来考虑内容的发展。第一句是梦境中的花落，可能引出回忆或者对过去的感慨。中间两联需要对仗，比如第二联和第三联。比如可以写现实中的景象与梦中的对比，或者时间的流逝带来的变化。\n\n比如第二联可以写现实中的景物，比如柳絮、燕呢喃，与梦中的花落形成对比。第三联可以转到更深层次的情感，比如离别后的孤独，或者岁月的变迁。\n\n最后两句需要收束全诗，可能表达一种无奈或者希望。比如问碧海青天，或者寻找答案。\n\n然后检查对仗是否工整，平仄是否符合要求。可能需要调整用词，确保每联的对仗工整，比如“柳线摇风”对“燕声剪水”，“离舟”对“旧信”，“云外”对“灯前”。\n\n最后检查押韵是否一致，通常七律押平声韵，所以“少、娇、娇、聊、天”需要确认是否在同一韵部。可能需要调整最后一个韵脚，使其一致。比如“天”是否和之前的韵脚押韵，可能需要换成“遥”或者其他字。\n\n可能还需要润色诗句，使意境更连贯，情感更统一。例如，确保每句之间有逻辑联系，从梦到现实，再到情感的抒发，最后以问句或感叹结尾，增强余韵。\n</think>\n\n《七律","tool_calls":[]},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":25,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens":512,"completion_tokens_details":{"reasoning_tokens":507},"total_tokens":537,"batch_size":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],"queue_wait_time":[5441,401,774,1138,978,255,307,252,227,355,368,297,250,308,370,356,436,327,822,337,792,902,686,304,340,406,354,328,869,389,1066,302,102,149,291,346,269,131,585,1088,175,162,279,377,302,334,185,140,659,1007,933,847,255,246,730,819,513,341,252,304,341,454,111,362,359,536,1063,940,763,195,70,76,205,1073,1098,130,1062,242,147,136,814,560,82,945,638,1082,328,718,669,412,177,776,789,1073,1060,423,494,381,1048,165,666,629,175,1001,373,1100,267,167,1011,89,152,1103,1097,224,250,91,934,298,217,493,1108,149,643,1155,912,133,192,1040,476,341,773,1074,1080,984,581,918,866,878,1108,1095,525,951,903,170,124,79,83,1070,697,282,758,307,1150,249,1027,451,652,479,790,601,672,446,126,119,139,96,935,744,590,207,781,445,871,134,152,93,608,928,469,476,1014,710,1021,421,902,664,876,1000,89,195,180,166,1103,86,1070,130,644,152,49,76,1028,127,1087,155,132,88,1022,917,190,165,125,1096,764,1102,169,86,803,330,131,139,70,115,958,743,53,116,101,110,83,85,84,103,1074,185,53,125,69,142,81,113,76,990,70,833,761,1094,1026,714,62,993,843,660,498,71,1099,69,74,1123,978,72,123,125,1038,908,124,1006,899,173,154,113,131,135,1088,134,196,190,134,143,1081,1007,160,189,92,1072,1049,1012,102,841,841,163,148,148,211,231,141,753,592,99,251,194,182,210,132,742,368,154,1056,905,153,142,962,746,513,1076,784,801,225,66,1050,1015,973,1108,1032,1037,168,182,162,134,81,164,817,553,328,730,829,833,1026,865,577,924,999,804,942,1071,957,953,986,1071,57,128,496,964,960,995,957,1067,984,1067,828,1019,1026,1027,760,653,516,655,491,878,823,830,1006,508,824,812,148,831,555,935,836,647,377,804,911,602,276,868,724,862,838,883,1016,994,937,881,872,1102,1107,188,405,884,826,1029,1035,127,187,164,1106,1097,70,13
2,96,159,1084,1043,865,153,119,1135,668,446,1061,521,737,952,786,494,976,167,388,920,896,591,723,1002,684,441,898,766,923,1015,929,902,667,325,907,876,694,783,939,919,878,549,933,843,874,899,665,85,163,673,473,557,749,686,825,858,721,646,697,942,612,426,619,779,571,325,186,235,712,404,707,171,1134,444,225,146,227,255,368,909,670,576,1007,179,1034,769,953,252,411,207,960,1091,896,986,406,749,723,1015,262,680,310,284]},"prefill_time":166,"decode_time_arr":[56,37,38,37,36,36,36,36,36,37,36,36,36,36,36,37,37,38,37,37,38,37,36,36,36,36,37,38,37,37,37,36,36,36,36,37,37,37,37,36,36,36,37,36,37,38,38,37,38,38,37,36,37,38,38,37,37,36,36,37,37,36,37,37,37,39,38,37,38,36,36,37,37,37,37,38,37,36,37,38,37,37,38,38,39,37,38,38,38,37,37,38,39,38,37,38,38,38,37,39,38,37,38,37,37,37,37,37,36,37,37,37,36,37,37,37,37,37,37,37,37,38,38,37,36,37,38,38,37,37,38,38,38,37,38,38,37,38,38,37,38,39,37,37,36,37,38,39,37,38,38,39,38,39,38,39,40,38,38,39,38,37,36,37,40,38,38,37,37,38,37,37,36,36,37,38,38,38,37,38,38,38,37,38,37,37,37,36,36,36,37,37,37,37,37,37,37,36,37,37,37,37,36,36,37,38,37,36,36,37,38,37,37,37,37,38,37,36,37,36,37,38,37,36,36,36,36,36,36,36,37,38,37,36,36,36,36,36,36,37,37,37,38,37,38,38,37,37,38,38,38,37,36,37,36,36,38,37,36,36,37,38,37,37,38,37,36,36,36,36,37,37,36,36,36,36,37,38,37,36,37,37,38,38,37,37,38,37,36,36,36,36,37,37,38,37,36,36,36,36,37,37,38,37,37,38,37,36,37,38,38,37,38,37,37,37,37,38,38,37,38,37,37,36,36,36,36,36,37,38,38,38,38,39,38,39,39,38,38,39,38,38,39,39,39,38,39,37,37,37,38,37,38,37,38,37,38,37,38,37,38,39,39,38,39,38,39,40,39,39,38,39,39,38,38,37,38,38,38,37,37,38,39,37,38,37,38,37,37,38,38,38,38,38,38,38,37,37,38,37,38,37,36,36,37,38,37,36,36,36,37,38,38,37,36,37,38,38,38,40,38,38,39,39,39,39,37,37,38,38,37,37,38,38,37,38,37,37,38,38,38,38,37,38,38,37,37,38,38,38,37,38,37,37,38,38,37,37,38,37,37,38,37,37,38,38,37,37,39,38,37,37,38,38,38,37,37,39,39,38,39,39,3
```
# CheckList
- [x] Code comments are complete
- [x] Error logs are recorded correctly
- [x] Return values are checked (do not use void to suppress the return values of secure functions or in-house functions; consider abnormal scenarios of interfaces; check return values when calling lower-level component interfaces)
- [x] Null-pointer checks are performed
- [x] Any resources acquired are correctly released after use
- [x] For multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [ ] Code is formatted with clang-format using [the format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [ ] Complies with the Ascend community coding standards. [C++ Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!178	4 months ago
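The tokenizer row lists a json completor utility but does not describe its behavior. One plausible reading is that it closes a truncated JSON fragment so that it can be parsed. A minimal sketch under that assumption follows. The function name complete_json is hypothetical, and the sketch only closes open strings, objects, and arrays; it does not repair trailing commas or a dangling key without a value.

```python
import json


def complete_json(fragment):
    """Append the closers a truncated JSON fragment is missing.

    Tracks whether the scan is inside a string (honoring backslash
    escapes) and keeps a stack of the closers still owed.
    """
    stack = []
    in_string = False
    escaped = False
    for ch in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()
    if in_string and escaped:
        # A lone trailing backslash would escape the quote we add; drop it.
        fragment = fragment[:-1]
    suffix = '"' if in_string else ""
    return fragment + suffix + "".join(reversed(stack))


partial = '{"name": "get_weather", "arguments": {"city": "Par'
print(json.loads(complete_json(partial)))
```

A stack is used because closers must be emitted in the reverse of the order in which the brackets were opened.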
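utils	[RFC]: Remove the environment variable MINDIE_LLM_FRAMEWORK_BACKEND
Co-authored-by: KaiMa<KaiMa_SDU@outlook.com>
# message auto-generated for no-merge-commit merge: !837 merge del_framework into dev
Created-by: KaiMa
Commit-by: KaiMa
Merged-by: ascend-robot
Description:
# Background
Fixes #[432](https://gitcode.com/Ascend/MindIE-LLM/issues/432)
# Changes
> Describe the concrete implementation of the changes and which components interact; items can be listed as 1, 2, 3, ...
> For a feature or refactoring PR, add a detailed design document (upstream and downstream component relationships, sequence diagrams, class diagrams, DFX capabilities, and so on).
# Documentation changes
> Confirm whether documentation changes are involved. If so, reflect them in the PR and briefly describe them. If not, enter "Not applicable".
# Interface changes
> Confirm whether interface changes that cross repositories or are visible to customers are involved. If so, describe the interfaces and the corresponding changes in detail and reflect them in the documentation. If not, enter "Not applicable".
# Test results
> Describe the test scenarios, test methods, and test results.
> Test case design should consider hardware, deployment mode, functionality, performance, accuracy, device memory, and other dimensions.
# CheckList
- [ ] Code comments are complete
- [ ] Error logs are recorded correctly
- [ ] Return values are checked (do not use void to suppress the return values of secure functions or in-house functions; consider abnormal scenarios of interfaces; check return values when calling lower-level component interfaces)
- [ ] Null-pointer checks are performed
- [ ] Any resources acquired are correctly released after use
- [ ] For multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [ ] Code is formatted with clang-format using [the format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [ ] Complies with the Ascend community coding standards. [C++ Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Programming Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!837	1 month ago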