File | Last commit | Last updated
[pytorch][feature][mindcluster] Integration of elastic-training-related callback code Co-authored-by: 李鸣沼<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3661 merge elastic_training into master [pytorch][feature][mindcluster] Integration of elastic-training-related callback code Created-by: lmztju Commit-by: lmztju;李鸣沼 Merged-by: ascend-robot Description: move takd elastic-training callback to MindSpeed-LLM See merge request: Ascend/MindSpeed-LLM!3661 | 6 months ago
[pytorch][feature][mindcluster] Integration of elastic-training-related callback code Co-authored-by: 李鸣沼<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3661 merge elastic_training into master [pytorch][feature][mindcluster] Integration of elastic-training-related callback code Created-by: lmztju Commit-by: lmztju;李鸣沼 Merged-by: ascend-robot Description: move takd elastic-training callback to MindSpeed-LLM See merge request: Ascend/MindSpeed-LLM!3661 | 6 months ago
[pytorch][feature][mindcluster] Integration of elastic-training-related callback code Co-authored-by: 李鸣沼<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3661 merge elastic_training into master [pytorch][feature][mindcluster] Integration of elastic-training-related callback code Created-by: lmztju Commit-by: lmztju;李鸣沼 Merged-by: ascend-robot Description: move takd elastic-training callback to MindSpeed-LLM See merge request: Ascend/MindSpeed-LLM!3661 | 6 months ago
[mindio][refactor] modify tft train entry style Co-authored-by: w00576739<wuliangwen@huawei.com> Co-authored-by: KevinK<kekevinson@outlook.com> Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> Co-authored-by: lmztju<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3992 merge master into master [mindio][refactor] modify tft train entry style Created-by: kevinlw Commit-by: wangguoyan;kevinlw;lmztju;w00576739;KevinK Merged-by: ascend-robot Description: [mindio][refactor] modify tft train entry style See merge request: Ascend/MindSpeed-LLM!3992 | 4 months ago
[mindio][refactor] modify tft train entry style Co-authored-by: w00576739<wuliangwen@huawei.com> Co-authored-by: KevinK<kekevinson@outlook.com> Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> Co-authored-by: lmztju<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3992 merge master into master [mindio][refactor] modify tft train entry style Created-by: kevinlw Commit-by: wangguoyan;kevinlw;lmztju;w00576739;KevinK Merged-by: ascend-robot Description: [mindio][refactor] modify tft train entry style See merge request: Ascend/MindSpeed-LLM!3992 | 4 months ago
[mindio][refactor] modify tft train entry style Co-authored-by: w00576739<wuliangwen@huawei.com> Co-authored-by: KevinK<kekevinson@outlook.com> Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> Co-authored-by: lmztju<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3992 merge master into master [mindio][refactor] modify tft train entry style Created-by: kevinlw Commit-by: wangguoyan;kevinlw;lmztju;w00576739;KevinK Merged-by: ascend-robot Description: [mindio][refactor] modify tft train entry style See merge request: Ascend/MindSpeed-LLM!3992 | 4 months ago
[mindio][refactor] modify tft train entry style Co-authored-by: w00576739<wuliangwen@huawei.com> Co-authored-by: KevinK<kekevinson@outlook.com> Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> Co-authored-by: lmztju<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3992 merge master into master [mindio][refactor] modify tft train entry style Created-by: kevinlw Commit-by: wangguoyan;kevinlw;lmztju;w00576739;KevinK Merged-by: ascend-robot Description: [mindio][refactor] modify tft train entry style See merge request: Ascend/MindSpeed-LLM!3992 | 4 months ago
[pytorch][mindio][feature]Ensure that the ACP Level 1 asynchronous save feature is compatible with TFT online recovery. Co-authored-by: z30027952<zengyihang2@h-partners.com> # message auto-generated for no-merge-commit merge: !4103 merge acp_tft_compatibility into master [pytorch][mindio][feature]Ensure that the ACP Level 1 asynchronous save feature is compatible with TFT online recovery. Created-by: zengyihang Commit-by: z30027952 Merged-by: ascend-robot Description: [pytorch][mindio][feature] High availability supports ACP and TFT compatibility, so that ACP Level 1 asynchronous save and TFT online recovery both take effect during training See merge request: Ascend/MindSpeed-LLM!4103 | 3 months ago
fix(pytorch):fix TP_DP and TP_DP_CP group not being rebuilt during ARF Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> # message auto-generated for no-merge-commit merge: !4500 merge bugfix_tp_cp into master fix(pytorch):fix TP_DP and TP_DP_CP group not being rebuilt during ARF Created-by: guoywang Commit-by: wangguoyan Merged-by: ascend-robot Description: ## What this PR does / why we need it? fix TP_DP and TP_DP_CP group not being rebuilt during ARF ## Does this PR introduce any user-facing change? NA ## How was this patch tested? NA See merge request: Ascend/MindSpeed-LLM!4500 | 10 days ago
[mindio][refactor] modify tft train entry style Co-authored-by: w00576739<wuliangwen@huawei.com> Co-authored-by: KevinK<kekevinson@outlook.com> Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> Co-authored-by: lmztju<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3992 merge master into master [mindio][refactor] modify tft train entry style Created-by: kevinlw Commit-by: wangguoyan;kevinlw;lmztju;w00576739;KevinK Merged-by: ascend-robot Description: [mindio][refactor] modify tft train entry style See merge request: Ascend/MindSpeed-LLM!3992 | 4 months ago
[pytorch][mindio][feature]Online recovery after precision error based on the specified number of checkpoint steps is supported. need install latest mindio_ttp. Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> # message auto-generated for no-merge-commit merge: !4071 merge master into master [pytorch][mindio][feature]Online recovery after precision error based on the specified number of checkpoint steps is supported. need install latest mindio_ttp. Created-by: guoywang Commit-by: wangguoyan Merged-by: ascend-robot Description: [pytorch][mindio][feature] High availability supports online recovery from a specified checkpoint step after a precision anomaly See merge request: Ascend/MindSpeed-LLM!4071 | 4 months ago
[pytorch][mindio][feature]Online recovery after precision error based on the specified number of checkpoint steps is supported. need install latest mindio_ttp. Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> # message auto-generated for no-merge-commit merge: !4071 merge master into master [pytorch][mindio][feature]Online recovery after precision error based on the specified number of checkpoint steps is supported. need install latest mindio_ttp. Created-by: guoywang Commit-by: wangguoyan Merged-by: ascend-robot Description: [pytorch][mindio][feature] High availability supports online recovery from a specified checkpoint step after a precision anomaly See merge request: Ascend/MindSpeed-LLM!4071 | 4 months ago
[pytorch][mindio][feature]Ensure that the ACP Level 1 asynchronous save feature is compatible with TFT online recovery. Co-authored-by: z30027952<zengyihang2@h-partners.com> # message auto-generated for no-merge-commit merge: !4103 merge acp_tft_compatibility into master [pytorch][mindio][feature]Ensure that the ACP Level 1 asynchronous save feature is compatible with TFT online recovery. Created-by: zengyihang Commit-by: z30027952 Merged-by: ascend-robot Description: [pytorch][mindio][feature] High availability supports ACP and TFT compatibility, so that ACP Level 1 asynchronous save and TFT online recovery both take effect during training See merge request: Ascend/MindSpeed-LLM!4103 | 3 months ago
[mindio][refactor] modify tft train entry style Co-authored-by: w00576739<wuliangwen@huawei.com> Co-authored-by: KevinK<kekevinson@outlook.com> Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> Co-authored-by: lmztju<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3992 merge master into master [mindio][refactor] modify tft train entry style Created-by: kevinlw Commit-by: wangguoyan;kevinlw;lmztju;w00576739;KevinK Merged-by: ascend-robot Description: [mindio][refactor] modify tft train entry style See merge request: Ascend/MindSpeed-LLM!3992 | 4 months ago
[pytorch][mindio][feature]Ensure that the ACP Level 1 asynchronous save feature is compatible with TFT online recovery. Co-authored-by: z30027952<zengyihang2@h-partners.com> # message auto-generated for no-merge-commit merge: !4103 merge acp_tft_compatibility into master [pytorch][mindio][feature]Ensure that the ACP Level 1 asynchronous save feature is compatible with TFT online recovery. Created-by: zengyihang Commit-by: z30027952 Merged-by: ascend-robot Description: [pytorch][mindio][feature] High availability supports ACP and TFT compatibility, so that ACP Level 1 asynchronous save and TFT online recovery both take effect during training See merge request: Ascend/MindSpeed-LLM!4103 | 3 months ago
[mindio][refactor] modify tft train entry style Co-authored-by: w00576739<wuliangwen@huawei.com> Co-authored-by: KevinK<kekevinson@outlook.com> Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> Co-authored-by: lmztju<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3992 merge master into master [mindio][refactor] modify tft train entry style Created-by: kevinlw Commit-by: wangguoyan;kevinlw;lmztju;w00576739;KevinK Merged-by: ascend-robot Description: [mindio][refactor] modify tft train entry style See merge request: Ascend/MindSpeed-LLM!3992 | 4 months ago
[mindio][refactor] modify tft train entry style Co-authored-by: w00576739<wuliangwen@huawei.com> Co-authored-by: KevinK<kekevinson@outlook.com> Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> Co-authored-by: lmztju<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3992 merge master into master [mindio][refactor] modify tft train entry style Created-by: kevinlw Commit-by: wangguoyan;kevinlw;lmztju;w00576739;KevinK Merged-by: ascend-robot Description: [mindio][refactor] modify tft train entry style See merge request: Ascend/MindSpeed-LLM!3992 | 4 months ago
[mindio][refactor] modify tft train entry style Co-authored-by: w00576739<wuliangwen@huawei.com> Co-authored-by: KevinK<kekevinson@outlook.com> Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> Co-authored-by: lmztju<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3992 merge master into master [mindio][refactor] modify tft train entry style Created-by: kevinlw Commit-by: wangguoyan;kevinlw;lmztju;w00576739;KevinK Merged-by: ascend-robot Description: [mindio][refactor] modify tft train entry style See merge request: Ascend/MindSpeed-LLM!3992 | 4 months ago
[pytorch][mindio][feature]Online recovery after precision error based on the specified number of checkpoint steps is supported. need install latest mindio_ttp. Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> # message auto-generated for no-merge-commit merge: !4071 merge master into master [pytorch][mindio][feature]Online recovery after precision error based on the specified number of checkpoint steps is supported. need install latest mindio_ttp. Created-by: guoywang Commit-by: wangguoyan Merged-by: ascend-robot Description: [pytorch][mindio][feature] High availability supports online recovery from a specified checkpoint step after a precision anomaly See merge request: Ascend/MindSpeed-LLM!4071 | 4 months ago
[mindio][refactor] modify tft train entry style Co-authored-by: w00576739<wuliangwen@huawei.com> Co-authored-by: KevinK<kekevinson@outlook.com> Co-authored-by: wangguoyan<wangguoyan6@h-partners.com> Co-authored-by: lmztju<limingzhao3@h-partners.com> # message auto-generated for no-merge-commit merge: !3992 merge master into master [mindio][refactor] modify tft train entry style Created-by: kevinlw Commit-by: wangguoyan;kevinlw;lmztju;w00576739;KevinK Merged-by: ascend-robot Description: [mindio][refactor] modify tft train entry style See merge request: Ascend/MindSpeed-LLM!3992 | 4 months ago