File | Last commit | Last updated
[Modification] [ClusterD] Refactor Ascend950 process-level rescheduling: report faulty processes to TaskD by checking which pods were scheduled Co-authored-by: east-yuan<yuanzhendong2@h-partners.com> 3 months ago
[clusterd] Improve accuracy of manual chip isolation: application-layer part for already-isolated faults Co-authored-by: whr_666<772343610@qq.com> 3 months ago
[clusterd] Do not configure default filtered fault codes for mindie tasks Co-authored-by: zhoupan39<zhoupan39@huawei.com> 1 month ago
[DP & ClusterD] DPU cm refactor Co-authored-by: x00953810<xujingru5@huawei.com> 4 months ago
[DP & ClusterD] DPU cm refactor Co-authored-by: x00953810<xujingru5@huawei.com> 4 months ago
Support configuring server-decoupled scheduling policies via schedule-policy Co-authored-by: lirui2381<2396601465@qq.com> 5 months ago
[MindCluster] Atlas 350 standard card: adapt to product form and chip renaming Co-authored-by: q00951730<quyitong@huawei.com> Co-authored-by: cqchou<thekonka@proton.me> Co-authored-by: leon_xun<xunzeliang@h-partners.com> 3 months ago
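[clusterd] Do not configure default filtered fault codes for mindie tasks Co-authored-by: zhoupan39<zhoupan39@huawei.com> 1 month ago
<fix>[clusterD] Correctly obtain exception information when the pg is deleted after a task fails Co-authored-by: shepherd-cheung<1220798123@qq.com> 2 months ago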
[MindCluster] Batch rename files Co-authored-by: q00951730<quyitong@huawei.com> 2 months ago
!922 [Modification] [ClusterD] Slow network clusterd pr part1 Merge pull request !922 from tiankaijin/slowpr-clusterpr1 1 year ago
[clusterD] Optimize fault job info cm resource updates Co-authored-by: zhoupan39<zhoupan39@huawei.com> 2 months ago
[clusterd] Filter out invalid jobs when initializing job statistics Co-authored-by: lijinghan<lijinghan1@huawei.com> 2 months ago
[Modification] [clusterd] [taskd] ModifyTrainingDataTraceSwitch adds a check for cm file mounting; taskd adds a worker-count check that triggers periodic pullMsg execution Co-authored-by: higher_speeder<wangjun940510@qq.com> 5 months ago
[Modification] clusterd unifies pre-isolation fault handling; Lingqu (灵衢) sub-health faults are not handled as pre-isolation faults Co-authored-by: wangjun<wangjun940510@qq.com> Co-authored-by: wangjun<374719709@qq.com> 6 months ago
[clusterd] Filter out invalid jobs when initializing job statistics Co-authored-by: lijinghan<lijinghan1@huawei.com> 2 months ago
[MindCluster] Batch rename files Co-authored-by: q00951730<quyitong@huawei.com> 2 months ago
[DP & ClusterD] DPU cm refactor Co-authored-by: x00953810<xujingru5@huawei.com> 4 months ago