File | Last commit | Last updated
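[feat] Add device_utils and affinity, providing hardware information queries and CPU core binding for aclgraph
Co-authored-by: zhaokerui<zhaokerui@huawei.com>
# message auto-generated for no-merge-commit merge:
!175 merge move_aff into dev
[feat] Add device_utils and affinity, providing hardware information queries and CPU core binding for aclgraph
Created-by: zhaokerui
Commit-by: zhaokerui
Merged-by: ascend-robot
Description: <!-- PR description template last updated: 20251225 -->
# Background
> Fixes #104
# Changes
> 1. Add affinity.py, which exposes the bind_cpus(ratio: float) interface for CPU core binding.
> 2. Refactor the npu_utils module: move the interfaces previously provided by PlatformInfo to _NPUNodeInfo; add the visible_device_ids, get_device_info_map, and get_pcie_info interfaces; and make the singleton class private, so it must be accessed through get_npu_node_info.
> 3. Add the get_npu_hbm_info interface for accessing the _NPUHbmInfo singleton.
> For feature or refactoring PRs, a detailed design document is required (covering upstream/downstream component relationships, sequence diagrams, class diagrams, DFX capabilities, etc.).
# Documentation changes
> N/A
# Interface changes
> N/A
# Test results
> Functional verification with aclgraph completed for qwen3 and dsv3.2.
# CheckList
> The PR submitter must self-check every item in the following checklist; for each item that passes or does not apply, change [ ] to [x].
- [x] Code comments are complete
- [x] Error logs are recorded correctly
- [x] Return values are checked (do not use void to discard the return values of secure functions or in-house functions; consider interface error scenarios; check return values when calling lower-level component interfaces)
- [x] Null pointer checks are performed
- [x] Any acquired resources are correctly released after use
- [x] In multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [x] Code is formatted with clang-format according to the [format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [x] Code complies with the Ascend community coding standards. [C++ Coding Style Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Coding Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!175
4 months ago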
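The affinity change above adds a `bind_cpus(ratio: float)` interface. The commit message does not say how `ratio` is interpreted or which CPUs are chosen. Below is a minimal, hypothetical sketch that treats `ratio` as the fraction of the CPUs the process may currently use and keeps the lowest-numbered ones. It uses the Linux-only `os.sched_getaffinity` / `os.sched_setaffinity`. `select_cpus` and `bind_cpus_sketch` are illustrative names, not MindIE-LLM APIs, and a real NPU-aware implementation would more likely pick CPUs close to each device.

```python
import math
import os
from typing import List, Sequence


def select_cpus(cpu_ids: Sequence[int], ratio: float) -> List[int]:
    """Return the lowest ceil(len(cpu_ids) * ratio) CPU ids, at least one.

    Hypothetical interpretation: `ratio` is the fraction of candidate CPUs to keep.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError(f"ratio must be in (0, 1], got {ratio}")
    if len(cpu_ids) == 0:
        raise ValueError("cpu_ids must not be empty")
    ordered = sorted(cpu_ids)
    count = max(1, math.ceil(len(ordered) * ratio))
    return ordered[:count]


def bind_cpus_sketch(ratio: float) -> List[int]:
    """Bind the calling process to a subset of its allowed CPUs (Linux only)."""
    if not hasattr(os, "sched_getaffinity"):
        raise OSError("CPU affinity is not supported on this platform")
    allowed = sorted(os.sched_getaffinity(0))  # 0 = the calling process
    chosen = select_cpus(allowed, ratio)
    os.sched_setaffinity(0, chosen)  # restrict scheduling to the chosen CPUs
    return chosen
```

Separating the pure selection step from the system call keeps the policy testable without changing the process's real affinity.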
[Refactor] Refactor parallel_info_manager to support lazy loading of communication groups and configurable communication-group reuse
Co-authored-by: stanzzzzz<zonghaoxin@huawei.com>
# message auto-generated for no-merge-commit merge:
!272 merge parallel_refactor into dev
[Refactor] Refactor parallel_info_manager to support lazy loading of communication groups and configurable communication-group reuse
Created-by: stanzzzzz
Commit-by: stanzzzzz
Merged-by: ascend-robot
Description: <!-- PR description template last updated: 20251225 -->
# Background
fix https://gitcode.com/Ascend/MindIE-LLM/issues/158
# Changes
Support lazy loading of communication groups.
Support configurable reuse of communication groups.
# Documentation changes
N/A
# Interface changes
N/A
# Test results
ModelRunner, ds v32 with DP+EP+MTP, configured as follows:
```
"BackendConfig": {
    "ModelDeployConfig": {
        "ModelConfig": [
            {
                "async_scheduler_wait_time": 120,
                "backendType": "atb",
                "cpuMemSize": 0,
                "dp": 2,
                "engine": {
                    "graph": "python"
                },
                "kv_link_timeout": 1080,
                "kv_trans_timeout": 10,
                "modelInstanceType": "Standard",
                "modelName": "ds_v3.2",
                "modelWeightPath": "/mnt/share/weights/DeepSeek-V3.2-1201-w8a8/",
                "moe_ep": 16,
                "moe_tp": 1,
                "npuMemSize": 3,
                "plugin_params": "{\"plugin_type\":\"mtp\",\"num_speculative_tokens\": 2}",
                "tp": 8,
                "trustRemoteCode": false,
                "worldSize": 16
            }
        ],
```
The buffer_size of moe_ep_mc2 was changed to 1024.
```
curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{
    "model": "ds_v3.2",
    "messages": [
        {"role": "user", "content": "please tell me the capital of China and Japan?"}
    ],
    "stream": false,
    "ignore_eos": false,
    "max_tokens": 512
}' http://127.0.0.1:1025/v1/chat/completions
{"id":"endpoint_common_2","object":"chat.completion","created":1768987538,"model":"ds_v3.2","choices":[{"index":0,"message":{"role":"assistant","content":"The capital of China is Beijing, and the capital of Japan is Tokyo.","tool_calls":[]},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens":16,"completion_tokens_details":{"reasoning_tokens":0},"total_tokens":30,"batch_size":[1,1,1,1,1,1,1,1],"queue_wait_time":[911,1138,438,1059,835,645,721,1318]},"prefill_time":324,"decode_time_arr":[26,26,26,39,39,39,39,40,40,40,40,40,40,39,39]}
```
Accuracy is normal.
Communication groups are reused where reuse applies:
```
[2026-01-21 17:22:50,160] [282779] [281465095516576] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((15,), 'hccl', 64, 99)
[2026-01-21 17:22:50,163] [282779] [281465095516576] [llm] [INFO] [parallel_info_manager.py-278] : Create None reusable process group
[2026-01-21 17:22:50,169] [282801] [281464808403360] [llm] [INFO] [parallel_info_manager.py-294] : reuse process group key = ((0, 1, 2, 3, 4, 5, 6, 7), 'hccl', 128, 99)
...
[2026-01-21 17:22:50,188] [282791] [281464411713952] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((0,), 'hccl', 64, 99)
...
[2026-01-21 17:22:50,190] [282791] [281464411713952] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((14,), 'hccl', 64, 99)
[2026-01-21 17:22:50,191] [282791] [281464411713952] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((15,), 'hccl', 64, 99)
[2026-01-21 17:22:50,196] [282791] [281464411713952] [llm] [INFO] [parallel_info_manager.py-278] : Create None reusable process group
[2026-01-21 17:22:50,199] [282807] [281464224346528] [llm] [INFO] [parallel_info_manager.py-294] : reuse process group key = ((0, 1, 2, 3, 4, 5, 6, 7), 'hccl', 128, 99)
[2026-01-21 17:22:50,199] [282807] [281464224346528] [llm] [INFO] [parallel_info_manager.py-294] : reuse process group key = ((8, 9, 10, 11, 12, 13, 14, 15), 'hccl', 128, 99)
```
When a communication group is configured as non-reusable, a non-reusable group is returned directly:
[2026-01-21 17:22:50,231] [282789] [281464735723936] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((15,), 'hccl', 64, 99)
[2026-01-21 17:22:50,235] [282789] [281464735723936] [llm] [INFO] [parallel_info_manager.py-278] : Create None reusable process group
[2026-01-21 17:22:50,240] [282813] [281464469713312] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((0,), 'hccl', 64, 99)
After qwen3-32B is launched, accuracy is normal (the prompt asks the model to continue a seven-character regulated verse opening with the line "梦里花落知多少"):
```
curl --request POST \
    --url http://127.0.0.1:1025/v1/chat/completions \
    --header 'Content-Type: application/json' \
    --data '{
        "model":"qwen",
        "messages":[{
            "role": "system",
            "content": "以梦里花落知多少作为开头,续写一首七言律诗"
        }],
        "chat_template_kwargs":{"enable_thinking":true},
        "stream": false,
        "temperature": 0.95,
        "max_tokens":2048
    }'
{"id":"endpoint_common_0","object":"chat.completion","created":1768988236,"model":"qwen","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\n好的,用户让我以“梦里花落知多少”开头,续写一首七言律诗。
[2026-01-21 17:49:40,465] [60265] [281463646515488] [llm] [INFO] [parallel_info_manager.py-308] : Create new process group((0, 1), 'hccl', 128, 98)
[2026-01-21 17:49:41,488] [60268] [281464116277536] [llm] [INFO] [parallel_info_manager.py-297] : Reuse process group((0, 1), 'hccl', 128, 98)
[2026-01-21 17:49:41,489] [60268] [281464116277536] [llm] [INFO] [parallel_info_manager.py-298] : Return Reusable process group ((0, 1), 'hccl', 128, 98)
[2026-01-21 17:49:41,585] [60265] [281463646515488] [llm] [INFO] [parallel_info_manager.py-297] : Reuse process group((0, 1), 'hccl', 128, 98)
[2026-01-21 17:49:41,585] [60265] [281463646515488] [llm] [INFO] [parallel_info_manager.py-298] : Return Reusable process group ((0, 1), 'hccl', 128, 98)
```
model runner_exp, configured as follows:
"dp": 2, "engine": { "graph": "python" }, "kv_link_timeout": 1080, "kv_trans_timeout": 10, "modelInstanceType": "Standard", "modelName": "ds_v3.2", "modelWeightPath": "/mnt/share/weights/DeepSeek-V3.2-1201-w8a8/", "moe_ep": 16, "moe_tp": 1, "npuMemSize": 3, "tp": 8, "trustRemoteCode": false, "worldSize": 16
The service launches normally and accuracy is normal:
curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{ "model": "ds_v3.2", "messages": [ {"role": "user", "content": "please tell me the capital of China and Japan?"} ], "stream": false, "ignore_eos": false, "max_tokens": 64 }' http://127.0.0.1:1025/v1/chat/completions
{"id":"endpoint_common_0","object":"chat.completion","created":1769433622,"model":"ds_v3.2","choices":[{"index":0,"message":{"role":"assistant","content":"The capital of China is Beijing, and the capital of Japan is Tokyo.","tool_calls":[]},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens":16,"completion_tokens_details":{"reasoning_tokens":0},"total_tokens":30,"batch_size":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],"queue_wait_time":[1115,1239,313,1299,835,600,1100,1273,917,388,339,291,704,615,1036,933]},"prefill_time":474,"decode_time_arr":[100,55,56,56,55,56,56,57,54,54,54,56,55,55,55]}
# CheckList
> The PR submitter must self-check every item in the following checklist; for each item that passes or does not apply, change [ ] to [x].
- [x] Code comments are complete
- [x] Error logs are recorded correctly
- [x] Return values are checked (do not use void to discard the return values of secure functions or in-house functions; consider interface error scenarios; check return values when calling lower-level component interfaces)
- [x] Null pointer checks are performed
- [x] Any acquired resources are correctly released after use
- [x] In multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [x] Code is formatted with clang-format according to the [format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [ ] Code complies with the Ascend community coding standards. [C++ Coding Style Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Coding Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!272
1 month ago
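The logs in the refactor PR above show communication groups identified by tuples such as `((0, 1, 2, 3, 4, 5, 6, 7), 'hccl', 128, 99)`, with "Create new" on first use, "reuse" on later requests, and separate non-reusable groups. That pattern suggests a lazily populated cache keyed by those fields. Below is a minimal sketch under assumptions. The third field is assumed to be a buffer size, since the PR mentions moe_ep_mc2's buffer_size. The fourth is treated as an opaque tag because its meaning is not stated. Group creation is injected as a factory callable instead of calling a distributed backend, so the sketch runs without devices. `ProcessGroupCache` and its methods are hypothetical, not the actual parallel_info_manager API.

```python
from typing import Any, Callable, Dict, Hashable, Sequence, Tuple

# Hypothetical key layout modeled on the logged tuples:
# (ranks, backend, buffer_size, tag). The meaning of `tag` is an assumption.
GroupKey = Tuple[Tuple[int, ...], str, int, Hashable]


class ProcessGroupCache:
    """Lazily creates communication groups and optionally reuses them by key."""

    def __init__(self, factory: Callable[[GroupKey], Any]) -> None:
        # `factory` stands in for whatever actually builds a group (assumption).
        self._factory = factory
        self._groups: Dict[GroupKey, Any] = {}

    def get(self, ranks: Sequence[int], backend: str, buffer_size: int,
            tag: Hashable = None, reusable: bool = True) -> Any:
        key: GroupKey = (tuple(ranks), backend, buffer_size, tag)
        if not reusable:
            # Non-reusable groups are always freshly created and never cached.
            return self._factory(key)
        if key not in self._groups:
            # Lazy loading: a reusable group is created only on first request.
            self._groups[key] = self._factory(key)
        return self._groups[key]

    def __len__(self) -> int:
        return len(self._groups)
```

Keying on every parameter means two callers share a group only when all settings match. Passing `reusable=False` gives a caller a private group, which mirrors what the "Create None reusable process group" log lines appear to show.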
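[RFC]: Remove the MINDIE_LLM_FRAMEWORK_BACKEND environment variable
Co-authored-by: KaiMa<KaiMa_SDU@outlook.com>
# message auto-generated for no-merge-commit merge:
!837 merge del_framework into dev
[RFC]: Remove the MINDIE_LLM_FRAMEWORK_BACKEND environment variable
Created-by: KaiMa
Commit-by: KaiMa
Merged-by: ascend-robot
Description: <!-- PR description template last updated: 20251225 -->
# Background
> Describe why the changes in this PR are needed.\
> If applicable, link the preceding PR or other PRs under the same feature/requirement.\
> If this fixes a problem introduced by an earlier PR, link the PR that introduced it.\
> Note: "Fixes #ISSUE ID" closes the issue automatically. If the problem is only partly solved, do not use "Fixes"; use "Fix part of #ISSUE ID" instead.
Fixes #[432](https://gitcode.com/Ascend/MindIE-LLM/issues/432)
# Changes
> Describe how the changes are implemented and which components interact; you may enumerate them as 1, 2, 3, ....\
> For feature or refactoring PRs, a detailed design document is required (covering upstream/downstream component relationships, sequence diagrams, class diagrams, DFX capabilities, etc.).
# Documentation changes
> Confirm whether documentation changes are involved. If so, include them in the PR and briefly describe them. If not, fill in "N/A".
# Interface changes
> Confirm whether there are interface changes across repositories or visible to customers. If so, describe the interfaces and the corresponding changes in detail and reflect them in the documentation. If not, fill in "N/A".
# Test results
> Describe the test scenarios, test methods, and test results.\
> When designing test cases, consider hardware, deployment mode, functionality, performance, accuracy, device memory, and similar dimensions.
# CheckList
> The PR submitter must self-check every item in the following checklist; for each item that passes or does not apply, change [ ] to [x].
- [ ] Code comments are complete
- [ ] Error logs are recorded correctly
- [ ] Return values are checked (do not use void to discard the return values of secure functions or in-house functions; consider interface error scenarios; check return values when calling lower-level component interfaces)
- [ ] Null pointer checks are performed
- [ ] Any acquired resources are correctly released after use
- [ ] In multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [ ] Code is formatted with clang-format according to the [format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [ ] Code complies with the Ascend community coding standards. [C++ Coding Style Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Coding Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!837
1 month ago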
fix mergedcolumnlinear with diff quant type
Co-authored-by: Katrina-CXY<chenxinyi20@huawei.com>
# message auto-generated for no-merge-commit merge:
!551 merge aclgraph into dev
fix mergedcolumnlinear with diff quant type
Created-by: Katrina-CXY
Commit-by: cxy-katrina;Katrina-CXY
Merged-by: ascend-robot
Description: <!-- PR description template last updated: 20251225 -->
# Background
> Describe why the changes in this PR are needed.\
> If applicable, link the preceding PR or other PRs under the same feature/requirement.\
> If this fixes a problem introduced by an earlier PR, link the PR that introduced it.\
> Note: "Fixes #ISSUE ID" closes the issue automatically. If the problem is only partly solved, do not use "Fixes"; use "Fix part of #ISSUE ID" instead.
# Changes
- Aclgraph scenario: the MergeColumnLinear module supports gate and up projections with different quantization types
- Atbgraph scenario: the MergeColumnLinearAdapter module supports gate and up projections with different quantization types
# Documentation changes
N/A
# Interface changes
- No external interfaces are involved
- Only the framework side is modified; the model migration and adaptation interfaces are unchanged
# Test results
- Environment setup
```
source /usr/local/lib/python3.11/site-packages/mindie_llm/set_env.sh
export MASTER_IP="127.0.0.1"
export MASTER_PORT=7897 # must not be reused by another instance on the same machine
export MINDIE_LOG_TO_STDOUT=1 # optional: print logs to the screen
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```
- Aclgraph scenario, gate and up linear layers with the same quantization type
  - Configuration: in the service configuration, set backendType to torch
  - Result
```
curl 127.0.0.1:10255/generate -d '
{
    "prompt": "My name is Olivier and I",
    "max_tokens": 30,
    "temperature": 0
}'
{"text":["My name is Olivier and I am a 32-year-old man from France. I am currently living in the UK. I am a software developer by trade, but I am"]}
```
- Aclgraph scenario, gate and up linear layers with different quantization types
  - Configuration: in the service configuration, set backendType to torch
  - Result
```
curl 127.0.0.1:10255/generate -d '
{
    "prompt": "My name is Olivier and I",
    "max_tokens": 30,
    "temperature": 0
}'
{"text":["My name is Olivier and I am a French citizen. I am currently working in the UK and I have a UK bank account. I would like to open a French bank account to"]}
```
- Atbgraph scenario, gate and up linear layers with the same quantization type
  - Configuration: in the service configuration, set backendType to atb
  - Result
```
curl 127.0.0.1:10255/generate -d '
{
    "prompt": "My name is Olivier and I",
    "max_tokens": 30,
    "temperature": 0
}'
{"text":["My name is Olivier and I am a 30-year-old man from France. I am currently living in the UK. I am a software developer by profession, but I am"]}
```
- Atbgraph scenario, gate and up linear layers with different quantization types
  - Configuration: in the service configuration, set backendType to atb
  - Result
```
curl 127.0.0.1:10255/generate -d '
{
    "prompt": "My name is Olivier and I",
    "max_tokens": 30,
    "temperature": 0
}'
{"text":["My name is Olivier and I am a French citizen. I am currently in the UK on a Tier 1 (General) visa. I have been here for 3 years and"]}
```
# CheckList
> The PR submitter must self-check every item in the following checklist; for each item that passes or does not apply, change [ ] to [x].
- [x] Code comments are complete
- [x] Error logs are recorded correctly
- [x] Return values are checked (do not use void to discard the return values of secure functions or in-house functions; consider interface error scenarios; check return values when calling lower-level component interfaces)
- [x] Null pointer checks are performed
- [x] Any acquired resources are correctly released after use
- [x] In multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [x] Code is formatted with clang-format according to the [format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [x] Code complies with the Ascend community coding standards. [C++ Coding Style Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Coding Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!551
2 months ago
utils:helpers+command_executor_utils+torch_utils+npu_utils
Co-authored-by: Dawn952<zhaojunbo13@huawei.com>
4 months ago
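dsv32 prefetch implementation
Co-authored-by: Dawn952<zhaojunbo13@huawei.com>
# message auto-generated for no-merge-commit merge:
!542 merge T0002 into dev
dsv32 prefetch implementation
Created-by: Dawn952
Commit-by: Dawn952
Merged-by: ascend-robot
Description: <!-- PR description template last updated: 20251225 -->
# Background
> Improve deepseekv32 performance by enabling the weight-prefetch feature.
> A new prefetch stream lets computation and data transfer run in parallel. The required weights are loaded from HBM into the L2 Cache ahead of time, so the AI Cores stay busy computing.
Fix #278.
# Changes
> Added the weight_prefetcher utility class, which enables prefetching and starts and stops it.
> Weight prefetching is enabled by default for the deepseekv32 model; the prefetch point is the attention o_proj.
# Documentation changes
> N/A
# Interface changes
> N/A
# Test results
Performance was measured with and without weight prefetching.
Profiling-based analysis on a single A3 machine, concurrency 1, short sequences:
![预取1.png](https://raw.gitcode.com/user-images/assets/8772840/d37f08c6-a98b-42b9-a7cf-d1243a64f8e4/预取1.png '预取1.png')
Single A3 machine, concurrency 16, 3.5k/0.5k test:
With weight prefetching enabled:
![image.png](https://raw.gitcode.com/user-images/assets/8772840/dc63796d-1969-4d75-9cd6-df61172a7b7b/image.png 'image.png')
With weight prefetching disabled:
![image.png](https://raw.gitcode.com/user-images/assets/8772840/48a49217-846c-457e-a20e-d24f9b05b754/image.png 'image.png')
# CheckList
> The PR submitter must self-check every item in the following checklist; for each item that passes or does not apply, change [ ] to [x].
- [x] Code comments are complete
- [x] Error logs are recorded correctly
- [x] Return values are checked (do not use void to discard the return values of secure functions or in-house functions; consider interface error scenarios; check return values when calling lower-level component interfaces)
- [x] Null pointer checks are performed
- [x] Any acquired resources are correctly released after use
- [x] In multi-threaded scenarios, concurrency has been considered and there are no deadlocks
- [x] Code is formatted with clang-format according to the [format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [x] Code complies with the Ascend community coding standards. [C++ Coding Style Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ Secure Coding Guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!542
2 months ago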