File | Last commit | Last updated
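communication_op.py: merge aclgraph back into dev
Co-authored-by: bowenli<libowen82@huawei.com>
# message auto-generated for no-merge-commit merge:
!207 merge dev_communication_op into dev
communication_op.py: merge aclgraph back into dev
Created-by: bowenli
Commit-by: bowenli
Merged-by: ascend-robot
Description: <!-- PR description template last updated: 20251225 -->
# Merge background
> Describe why the changes in this PR are needed.\
> If applicable, link preceding PRs or other PRs under the same feature/requirement.\
> If this fixes a problem introduced by an earlier PR, link the PR that introduced it.\
> Note: Fixes #ISSUE ID closes the issue automatically. If the problem is only partly solved, do not use Fixes; use Fix part of #ISSUE ID instead.
Communication operations required by sparse_attention.
# Changes
> Describe how the changes are implemented and which components interact; you may list them as 1, 2, 3, ...\
> For a feature or refactoring PR, add a detailed design document (upstream/downstream component relationships, sequence diagrams, class diagrams, DFX capabilities, and so on).
communication_op.py
# Documentation changes
> Confirm whether documentation changes are involved. If so, include them in the PR and briefly describe them. If not, write "Not applicable".
Not applicable
# Interface changes
> Confirm whether any cross-repository or customer-visible interface changes are involved. If so, describe the interfaces and the changes in detail and reflect them in the documentation. If not, write "Not applicable".
Not applicable
# Test results
> Describe the test scenarios, test methods, and test results.\
> Test case design should consider hardware, deployment mode, functionality, performance, accuracy, device memory, and other dimensions.
![1.PNG](https://raw.gitcode.com/user-images/assets/8772840/5a64bfcb-312d-42db-af59-a1113ad201e0/1.PNG '1.PNG')
# CheckList
> The PR submitter must self-check every item below; for items that pass or do not apply, change [ ] to [x].
- [ ] Code comments are complete
- [ ] Error logs are recorded correctly
- [ ] Return values are checked (do not use void to suppress the return values of secure functions or in-house functions; consider exceptional cases of interfaces; check return values when calling lower-level component interfaces)
- [ ] Null pointers are checked
- [ ] Any acquired resources are released correctly after use
- [ ] For multithreaded scenarios, concurrency has been considered and there are no deadlocks
- [ ] Code is formatted with clang-format according to [the format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [ ] Complies with the Ascend community coding standards. [C++ programming guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ secure programming guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!207
4 months ago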
acl graph distributed Co-authored-by: stanzzzzz<zonghaoxin@huawei.com> 4 months ago
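Refactor the cache_pool component
Co-authored-by: Dawn952<zhaojunbo13@huawei.com>
# message auto-generated for no-merge-commit merge:
!419 merge T0001 into dev
Refactor the cache_pool component
Created-by: Dawn952
Commit-by: Dawn952
Merged-by: ascend-robot
Description: <!-- PR description template last updated: 20251225 -->
# Merge background
> As large language model architectures evolve rapidly, caching during inference has grown from the traditional single KV Cache into a heterogeneous system with multiple cache forms. The existing architecture in this repository cannot adapt to changes in the shape and number of model caches, so an architectural change is urgently needed.
> Fixes #228
# Changes
> Add a ModelCachePool class to manage caches.
> The new cache pool architecture is moved down a layer and decoupled from the model. The component no longer needs to know how many caches a model has or what their shapes are; it obtains the data required to create tensors from the model side and then allocates the space.
# Documentation changes
> Not applicable.
# Interface changes
> Not applicable.
# Test results
> Tested on the integration branch based on the new TG.
# CheckList
> The PR submitter must self-check every item below; for items that pass or do not apply, change [ ] to [x].
- [ ] Code comments are complete
- [ ] Error logs are recorded correctly
- [ ] Return values are checked (do not use void to suppress the return values of secure functions or in-house functions; consider exceptional cases of interfaces; check return values when calling lower-level component interfaces)
- [ ] Null pointers are checked
- [ ] Any acquired resources are released correctly after use
- [ ] For multithreaded scenarios, concurrency has been considered and there are no deadlocks
- [ ] Code is formatted with clang-format according to [the format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [ ] Complies with the Ascend community coding standards. [C++ programming guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ secure programming guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!419
3 months ago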
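The decoupling described in !419 (the pool does not know how many caches a model has or what shape they take, and only receives the data needed to allocate them) can be sketched as below. This is a minimal illustration, not the ModelCachePool implementation: `CacheSpec` and `SketchCachePool` are hypothetical names, and plain `bytearray` buffers stand in for device tensors.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CacheSpec:
    """Hypothetical description of one cache, supplied by the model side."""
    name: str
    shape: Tuple[int, ...]
    itemsize: int  # bytes per element


class SketchCachePool:
    """Allocates one buffer per spec without knowing anything about the model."""

    def __init__(self, specs: List[CacheSpec]):
        # The pool only sees names, shapes, and element sizes.
        self.buffers: Dict[str, bytearray] = {
            spec.name: bytearray(prod(spec.shape) * spec.itemsize)
            for spec in specs
        }


# The model side decides how many caches exist and what they look like.
specs = [CacheSpec("kv", (2, 4), 2), CacheSpec("index", (3,), 1)]
pool = SketchCachePool(specs)
print({name: len(buf) for name, buf in pool.buffers.items()})
```

Adding a new cache type in this sketch means adding one more `CacheSpec` on the model side; the pool code stays unchanged.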
[Refactor] Refactor parallel_info_manager: support lazy loading of communication domains and configurable reuse of communication domains
Co-authored-by: stanzzzzz<zonghaoxin@huawei.com>
# message auto-generated for no-merge-commit merge:
!272 merge parallel_refactor into dev
[Refactor] Refactor parallel_info_manager: support lazy loading of communication domains and configurable reuse of communication domains
Created-by: stanzzzzz
Commit-by: stanzzzzz
Merged-by: ascend-robot
Description: <!-- PR description template last updated: 20251225 -->
# Merge background
fix https://gitcode.com/Ascend/MindIE-LLM/issues/158
# Changes
Support lazy loading of communication domains.
Support configurable reuse of communication domains.
# Documentation changes
Not applicable
# Interface changes
Not applicable
# Test results
ModelRunner, ds v32, DP+EP+MTP, with the following configuration:
```
"BackendConfig": {
  "ModelDeployConfig": {
    "ModelConfig": [
      {
        "async_scheduler_wait_time": 120,
        "backendType": "atb",
        "cpuMemSize": 0,
        "dp": 2,
        "engine": { "graph": "python" },
        "kv_link_timeout": 1080,
        "kv_trans_timeout": 10,
        "modelInstanceType": "Standard",
        "modelName": "ds_v3.2",
        "modelWeightPath": "/mnt/share/weights/DeepSeek-V3.2-1201-w8a8/",
        "moe_ep": 16,
        "moe_tp": 1,
        "npuMemSize": 3,
        "plugin_params": "{\"plugin_type\":\"mtp\",\"num_speculative_tokens\": 2}",
        "tp": 8,
        "trustRemoteCode": false,
        "worldSize": 16
      }
    ],
```
Set the buffer_size of moe_ep_mc2 to 1024.
```
curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{
  "model": "ds_v3.2",
  "messages": [
    {"role": "user", "content": "please tell me the capital of China and Japan?"}
  ],
  "stream": false,
  "ignore_eos": false,
  "max_tokens": 512
}' http://127.0.0.1:1025/v1/chat/completions
{"id":"endpoint_common_2","object":"chat.completion","created":1768987538,"model":"ds_v3.2","choices":[{"index":0,"message":{"role":"assistant","content":"The capital of China is Beijing, and the capital of Japan is Tokyo.","tool_calls":[]},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens":16,"completion_tokens_details":{"reasoning_tokens":0},"total_tokens":30,"batch_size":[1,1,1,1,1,1,1,1],"queue_wait_time":[911,1138,438,1059,835,645,721,1318]},"prefill_time":324,"decode_time_arr":[26,26,26,39,39,39,39,40,40,40,40,40,40,39,39]}
```
Accuracy is normal. Communication domains are reused when reuse is required:
```
[2026-01-21 17:22:50,160] [282779] [281465095516576] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((15,), 'hccl', 64, 99)
[2026-01-21 17:22:50,163] [282779] [281465095516576] [llm] [INFO] [parallel_info_manager.py-278] : Create None reusable process group
[2026-01-21 17:22:50,169] [282801] [281464808403360] [llm] [INFO] [parallel_info_manager.py-294] : reuse process group key = ((0, 1, 2, 3, 4, 5, 6, 7), 'hccl', 128, 99)
...
[2026-01-21 17:22:50,188] [282791] [281464411713952] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((0,), 'hccl', 64, 99)
...
[2026-01-21 17:22:50,190] [282791] [281464411713952] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((14,), 'hccl', 64, 99)
[2026-01-21 17:22:50,191] [282791] [281464411713952] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((15,), 'hccl', 64, 99)
[2026-01-21 17:22:50,196] [282791] [281464411713952] [llm] [INFO] [parallel_info_manager.py-278] : Create None reusable process group
[2026-01-21 17:22:50,199] [282807] [281464224346528] [llm] [INFO] [parallel_info_manager.py-294] : reuse process group key = ((0, 1, 2, 3, 4, 5, 6, 7), 'hccl', 128, 99)
[2026-01-21 17:22:50,199] [282807] [281464224346528] [llm] [INFO] [parallel_info_manager.py-294] : reuse process group key = ((8, 9, 10, 11, 12, 13, 14, 15), 'hccl', 128, 99)
```
When a communication domain is set as non-reusable, a non-reusable communication domain is returned directly:
[2026-01-21 17:22:50,231] [282789] [281464735723936] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((15,), 'hccl', 64, 99)
[2026-01-21 17:22:50,235] [282789] [281464735723936] [llm] [INFO] [parallel_info_manager.py-278] : Create None reusable process group
[2026-01-21 17:22:50,240] [282813] [281464469713312] [llm] [INFO] [parallel_info_manager.py-304] : Create new process group key = ((0,), 'hccl', 64, 99)
After qwen3-32B starts up, accuracy is normal:
```
curl --request POST \
  --url http://127.0.0.1:1025/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "model":"qwen",
    "messages":[{ "role": "system", "content": "以梦里花落知多少作为开头,续写一首七言律诗" }],
    "chat_template_kwargs":{"enable_thinking":true},
    "stream": false,
    "temperature": 0.95,
    "max_tokens":2048
  }'
{"id":"endpoint_common_0","object":"chat.completion","created":1768988236,"model":"qwen","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\n好的,用户让我以“梦里花落知多少”开头,续写一首七言律诗。
[2026-01-21 17:49:40,465] [60265] [281463646515488] [llm] [INFO] [parallel_info_manager.py-308] : Create new process group((0, 1), 'hccl', 128, 98)
[2026-01-21 17:49:41,488] [60268] [281464116277536] [llm] [INFO] [parallel_info_manager.py-297] : Reuse process group((0, 1), 'hccl', 128, 98)
[2026-01-21 17:49:41,489] [60268] [281464116277536] [llm] [INFO] [parallel_info_manager.py-298] : Return Reusable process group ((0, 1), 'hccl', 128, 98)
[2026-01-21 17:49:41,585] [60265] [281463646515488] [llm] [INFO] [parallel_info_manager.py-297] : Reuse process group((0, 1), 'hccl', 128, 98)
[2026-01-21 17:49:41,585] [60265] [281463646515488] [llm] [INFO] [parallel_info_manager.py-298] : Return Reusable process group ((0, 1), 'hccl', 128, 98)
```
model runner_exp, with the following configuration:
"dp": 2,
"engine": { "graph": "python" },
"kv_link_timeout": 1080,
"kv_trans_timeout": 10,
"modelInstanceType": "Standard",
"modelName": "ds_v3.2",
"modelWeightPath": "/mnt/share/weights/DeepSeek-V3.2-1201-w8a8/",
"moe_ep": 16,
"moe_tp": 1,
"npuMemSize": 3,
"tp": 8,
"trustRemoteCode": false,
"worldSize": 16
The service starts up normally and accuracy is normal:
curl -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{ "model": "ds_v3.2", "messages": [ {"role": "user", "content": "please tell me the capital of China and Japan?"} ], "stream": false, "ignore_eos": false, "max_tokens": 64 }' http://127.0.0.1:1025/v1/chat/completions
{"id":"endpoint_common_0","object":"chat.completion","created":1769433622,"model":"ds_v3.2","choices":[{"index":0,"message":{"role":"assistant","content":"The capital of China is Beijing, and the capital of Japan is Tokyo.","tool_calls":[]},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens":16,"completion_tokens_details":{"reasoning_tokens":0},"total_tokens":30,"batch_size":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],"queue_wait_time":[1115,1239,313,1299,835,600,1100,1273,917,388,339,291,704,615,1036,933]},"prefill_time":474,"decode_time_arr":[100,55,56,56,55,56,56,57,54,54,54,56,55,55,55]}
# CheckList
> The PR submitter must self-check every item below; for items that pass or do not apply, change [ ] to [x].
- [x] Code comments are complete
- [x] Error logs are recorded correctly
- [x] Return values are checked (do not use void to suppress the return values of secure functions or in-house functions; consider exceptional cases of interfaces; check return values when calling lower-level component interfaces)
- [x] Null pointers are checked
- [x] Any acquired resources are released correctly after use
- [x] For multithreaded scenarios, concurrency has been considered and there are no deadlocks
- [x] Code is formatted with clang-format according to [the format template provided in the repository](https://gitcode.com/Ascend/MindIE-LLM/blob/master/.clang-format)
- [ ] Complies with the Ascend community coding standards. [C++ programming guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-coding-style-guide.md) | [C++ secure programming guide](https://gitcode.com/Ascend/community/blob/master/docs/contributor/Ascend-cpp-secure-coding-guide.md)
See merge request: Ascend/MindIE-LLM!272
1 month ago
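The behaviour reported for !272 (a communication domain is created only when first requested, and is handed back from a cache when reuse is enabled) can be sketched as follows. This is an illustrative sketch, not the code in parallel_info_manager.py: `LazyGroupCache`, `factory`, and the `reuse` flag are hypothetical names, and the key here uses only three fields (ranks, backend, buffer size), whereas the logged keys such as `((0, 1, 2, 3, 4, 5, 6, 7), 'hccl', 128, 99)` carry a fourth element whose meaning the commit message does not state.

```python
from typing import Callable, Dict, Iterable, Tuple

GroupKey = Tuple[Tuple[int, ...], str, int]  # (ranks, backend, buffer_size)


class LazyGroupCache:
    """Hypothetical cache: groups are built on first use, optionally reused."""

    def __init__(self, factory: Callable[[GroupKey], object]):
        self._factory = factory            # stand-in for real group creation
        self._reusable: Dict[GroupKey, object] = {}
        self.created = 0                   # how many groups were actually built

    def get(self, ranks: Iterable[int], backend: str, buffer_size: int,
            reuse: bool = True) -> object:
        key: GroupKey = (tuple(ranks), backend, buffer_size)
        if reuse and key in self._reusable:
            return self._reusable[key]     # "reuse process group" path
        group = self._factory(key)         # "Create new process group" path
        self.created += 1
        if reuse:
            self._reusable[key] = group
        return group


cache = LazyGroupCache(factory=lambda key: {"key": key})
first = cache.get(range(8), "hccl", 128)
second = cache.get(range(8), "hccl", 128)                # served from the cache
private = cache.get(range(8), "hccl", 128, reuse=False)  # always a new group
print(first is second, private is first, cache.created)
```

Nothing is constructed until `get` is called, which is the lazy-loading part; passing `reuse=False` bypasses the cache entirely, mirroring the non-reusable case in the logs.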
acl graph distributed Co-authored-by: stanzzzzz<zonghaoxin@huawei.com> 4 months ago