File / Last commit / Last updated
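Coordinator Metrics: add long-sequence buckets and 2D input/output token statistics

Co-authored-by: mindie_yangan<yangan7@h-partners.com>

# message auto-generated for no-merge-commit merge: !79 merge dev into dev

Coordinator Metrics: add long-sequence buckets and 2D input/output token statistics

Created-by: mindie_yangan
Commit-by: mindie_yangan;yangan7
Merged-by: ascend-robot

Description:

## **1. Background**

Fixes [#58](https://gitcode.com/Ascend/MindIE-Motor/issues/58)

## **2. Changes**

1. Added finer-grained Metrics statistics for long-sequence length buckets.
2. Added a Metrics 2D table of input sequence length against output sequence length.
3. Both changes are adapted in sync with the server.

## **3. Documentation changes**

N/A

## **4. Interface changes**

N/A

## **5. Test results**

![image.png](https://raw.gitcode.com/user-images/assets/8772840/c977d93d-ee34-4211-88b2-fba98c311fd2/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8772840/c130e3d3-4f31-4f69-a3d1-6bd8c33733cb/image.png 'image.png')

The full screenshot is too large to include.

## **6. CheckList**

> The PR submitter must self-check every CheckList item below; for each item that passes or does not apply, change [ ] to [x].

[x] Code comments are complete
[x] Maintenance and diagnostic logs are recorded correctly
[x] UT cases are included

See merge request: Ascend/MindIE-Motor!79

Last updated: 3 months ago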
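The two Metrics changes above (long-sequence buckets plus a 2D input/output length table) can be pictured with a minimal Python sketch. It counts requests per (input bucket, output bucket) cell. The bucket edges in `BUCKET_EDGES` and the names `bucket_label` and `build_2d_table` are hypothetical and are not taken from the Coordinator or server code, whose actual bucket definitions the commit does not list.

```python
from bisect import bisect_left
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical bucket upper bounds in tokens (inclusive). Lengths above the
# last edge fall into a final open-ended "long-sequence" bucket.
BUCKET_EDGES = [1024, 4096, 16384, 65536]


def bucket_label(length: int) -> str:
    """Map a token count to a label such as '<=4096' or '>65536'."""
    idx = bisect_left(BUCKET_EDGES, length)
    if idx == len(BUCKET_EDGES):
        return f">{BUCKET_EDGES[-1]}"
    return f"<={BUCKET_EDGES[idx]}"


def build_2d_table(requests: Iterable[Tuple[int, int]]) -> Counter:
    """Count requests per (input bucket, output bucket) cell.

    Each element of `requests` is an (input_tokens, output_tokens) pair.
    """
    table: Counter = Counter()
    for input_tokens, output_tokens in requests:
        table[(bucket_label(input_tokens), bucket_label(output_tokens))] += 1
    return table


if __name__ == "__main__":
    sample = [(500, 200), (3000, 800), (70000, 1200), (800, 100)]
    for (inp, out), count in build_2d_table(sample).items():
        print(f"input {inp:>8} | output {out:>8} | requests {count}")
```

This sketch uses one shared list of edges for both axes. The commit does not say whether the real implementation does the same.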
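MS accepts the instance-level fast-recovery flow initiated by LLM, generates alarms from the fault codes reported by LLM, and reports them to CCAE

Co-authored-by: lbr711<liuboru1@huawei.com>

# message auto-generated for no-merge-commit merge: !97 merge oom into dev

MS accepts the instance-level fast-recovery flow initiated by LLM, generates alarms from the fault codes reported by LLM, and reports them to CCAE

Created-by: lbr711
Commit-by: lbr711
Merged-by: ascend-robot

Description:

## **1. Background**

When the MindIE LLM Text Generator hits an OutOfMemory (OOM) exception, it raises the exception to the control plane. The control plane is responsible for triggering the OOM fast-recovery flow and reporting an event alarm to CCAE.

## **2. Changes**

1. Provide a communication channel that passes OOM fault codes through from NodeManager to Controller.
2. Trigger the OOM fast-recovery flow when an OOM fault code is received.
3. Report event alarms to CCAE.
4. The OOM fast-recovery flow reuses the Lingqu (灵衢) fast-recovery sequence: PAUSE_ENGINE -> REINIT_NPU -> START_ENGINE.

## **3. Documentation changes**

1. Added the sub-item "oom", which enables OOM fast recovery, under the configuration item "fault_recovery_func_dict".
2. Changed the NodeManager polling interval to 1 s.

## **4. Interface changes**

N/A

## **5. Test results**

## **6. CheckList**

> The PR submitter must self-check every CheckList item below; for each item that passes or does not apply, change [ ] to [x].

[x] Code comments are complete
[x] Maintenance and diagnostic logs are recorded correctly
[x] UT cases are included

See merge request: Ascend/MindIE-Motor!97

Last updated: 2 months ago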
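The OOM fast-recovery path described above can be sketched as a controller-side handler. This is a minimal, hypothetical Python illustration, not MindIE Motor code. Several details are assumptions:

- the fault code string `"OOM"`;
- a boolean value for the `"oom"` entry in `fault_recovery_func_dict`;
- the `send_command` and `report_alarm` callbacks, which stand in for the NodeManager channel and the CCAE alarm report.

Only the step order PAUSE_ENGINE -> REINIT_NPU -> START_ENGINE comes from the commit message.

```python
from enum import Enum
from typing import Callable, Dict, List


class RecoveryStep(Enum):
    # Step names follow the sequence stated in the commit message.
    PAUSE_ENGINE = "PAUSE_ENGINE"
    REINIT_NPU = "REINIT_NPU"
    START_ENGINE = "START_ENGINE"


OOM_RECOVERY_SEQUENCE = [
    RecoveryStep.PAUSE_ENGINE,
    RecoveryStep.REINIT_NPU,
    RecoveryStep.START_ENGINE,
]


def handle_fault_code(
    fault_code: str,
    instance_id: str,
    fault_recovery_func_dict: Dict[str, bool],
    send_command: Callable[[str, RecoveryStep], bool],
    report_alarm: Callable[[str, str], None],
) -> List[RecoveryStep]:
    """Hypothetical handler for a fault code relayed by NodeManager.

    Returns the recovery steps that completed successfully.
    """
    # Raise an event alarm (standing in for the CCAE report) for the fault.
    report_alarm(instance_id, fault_code)
    # Only OOM faults with the assumed "oom" switch enabled trigger recovery.
    if fault_code != "OOM" or not fault_recovery_func_dict.get("oom", False):
        return []
    done: List[RecoveryStep] = []
    for step in OOM_RECOVERY_SEQUENCE:
        # Stop at the first failed step so later steps never run on a
        # half-recovered instance.
        if not send_command(instance_id, step):
            break
        done.append(step)
    return done


if __name__ == "__main__":
    alarms = []
    steps = handle_fault_code(
        "OOM",
        "instance-0",
        {"oom": True},
        send_command=lambda inst, step: True,
        report_alarm=lambda inst, code: alarms.append((inst, code)),
    )
    print([s.value for s in steps], alarms)
```

Stopping at the first failed step is a choice made for this sketch. The commit message does not say how a failed step is handled.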
MindIE Motor repository directory restructuring

Co-authored-by: zhangdiago<zhangdi5@huawei.com>

# message auto-generated for no-merge-commit merge: !71 merge personal/z00833806/Motor_Dir0121 into dev

MindIE Motor repository directory restructuring

Created-by: zhangdiago
Commit-by: zhangdiago
Merged-by: ascend-robot

Description:

## **1. Background**

Align the open-source community directory layout and code with the remediation plan [#46](https://gitcode.com/Ascend/MindIE-Motor/issues/46).

1/3 15:30–16:30 (UTC+08:00) Beijing: 陈波; 何建平; 康宇昕; 柯展; 罗福云; 王晓鹏; 吕有辉; 吴铭泾; 王君; 张迪; 喻军宇; 耿力; 王洋

Consensus was reached through an online meeting and group-chat alignment:

- The two Python components, nodeManager and om adapter, move under mindie motor.
- They ship together with mindie motor as a single package instead of separate packages.
- The original mindie service directory is deprecated.

## **2. Changes**

1. Renamed mindie_service to mindie_motor.
2. Renamed management_service to src.
3. Moved http_client_ctl into the src directory.
4. Added a python/mindie_motor directory under mindie_motor and moved node_manager and om_adapter into it. The whl packaging code for controller and coordinator is also placed in this directory.

## **3. Documentation changes**

The documentation needs to be adapted to the new directory layout.

## **4. Interface changes**

N/A

## **5. Test results**

Packaging verified OK:

![image.png](https://raw.gitcode.com/user-images/assets/8772838/031d7c2f-00f7-4a1f-be86-f064551bc595/image.png 'image.png')

Large-scale EP verification:

Image info: https://cmc-szv.clouddragon.huawei.com/cmcversion/index/findSnapshotRelease?deltaId=14076751309850496&isSelect=Inner&url_data=MindIE-images

Installed the whl package built above on top of this image:

![image.png](https://raw.gitcode.com/user-images/assets/8772838/38b573ba-75e2-483f-bfee-898f6d921ae1/image.png 'image.png')

Installation: add the following adaptation commands to boot.sh:

- `pip install /mnt/z00833806/mindie_motor-1.0.0-cp311-cp311-linux_aarch64.whl --force-reinstall`
- `chmod 500 /usr/local/lib/python3.11/site-packages/mindie_motor/scripts/http_client_ctl/*`
- `chmod 550 /usr/local/lib/python3.11/site-packages/mindie_motor/examples/kubernetes_deploy_scripts/boot_helper/*`
- `chmod 640 /usr/local/lib/python3.11/site-packages/mindie_motor/examples/kubernetes_deploy_scripts/boot_helper/boot.sh`
- `chmod 700 /root`
- `chmod 640 /usr/local/lib/python3.11/site-packages/mindie_motor/conf/model_config/*.json`
- `chmod 640 /usr/local/lib/python3.11/site-packages/mindie_motor/conf/machine_config/*.json`

Startup OK:

![image.png](https://raw.gitcode.com/user-images/assets/8772838/23c0e897-9c56-4440-b578-f38b40c87755/image.png 'image.png')

Service verification: the curl request succeeded.

![image.png](https://raw.gitcode.com/user-images/assets/8772838/7a921a91-73d7-4861-a3bb-718d65219bb1/image.png 'image.png')

## **6. CheckList**

> The PR submitter must self-check every CheckList item below; for each item that passes or does not apply, change [ ] to [x].

[x] Code comments are complete
[x] Maintenance and diagnostic logs are recorded correctly
[x] UT cases are included

See merge request: Ascend/MindIE-Motor!71

Last updated: 4 months ago
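The chmod steps added to boot.sh for the restructured package could also be kept as a table of (glob pattern, mode) pairs. That keeps the permission policy in one place. This is a hypothetical Python sketch:

- `PERMISSIONS`, `DEFAULT_ROOT` and `apply_permissions` are illustrative names, not part of mindie_motor.
- The default root simply reuses the site-packages path from the boot.sh commands.
- The `chmod 700 /root` step is left out because it is not under the package directory.

```python
import glob
import os
import sys

# Hypothetical default: the install location used by the boot.sh commands.
DEFAULT_ROOT = "/usr/local/lib/python3.11/site-packages/mindie_motor"

# (glob pattern relative to the package root, octal mode), in the same order
# as the chmod commands, so the later boot.sh rule wins for boot.sh itself.
PERMISSIONS = [
    ("scripts/http_client_ctl/*", 0o500),
    ("examples/kubernetes_deploy_scripts/boot_helper/*", 0o550),
    ("examples/kubernetes_deploy_scripts/boot_helper/boot.sh", 0o640),
    ("conf/model_config/*.json", 0o640),
    ("conf/machine_config/*.json", 0o640),
]


def apply_permissions(root: str = DEFAULT_ROOT) -> dict:
    """Apply each mode to every path matching its pattern.

    Returns a mapping of path -> final mode applied.
    """
    applied = {}
    for pattern, mode in PERMISSIONS:
        for path in glob.glob(os.path.join(root, pattern)):
            os.chmod(path, mode)
            applied[path] = mode
    return applied


if __name__ == "__main__":
    # Optional first argument overrides the package root.
    target = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_ROOT
    result = apply_permissions(target)
    print(f"updated {len(result)} paths under {target}")
```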
Fix the memory flow-control log printing issue

Last updated: 3 months ago