
GLM4.5 series model support on the MindSpore backend

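| Model | Download link | Sequence length | Implementation | Cluster | Supported |
| --- | --- | --- | --- | --- | --- |
| GLM4.5 | 106B | 4K | Mcore | 8x16 | |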

Tutorial: running the GLM4.5 model on the MindSpore backend

Environment setup

To install MindSpeed-LLM with the MindSpore backend, follow the MindSpeed LLM installation guide.

Training

Pretraining

Launch pretraining from the repository root as follows:

cd MindSpeed-LLM
bash examples/mindspore/glm45-moe/pretrain_glm45_moe_106b_4k_A3_ms.sh

Before launching, edit the following variables in the script to match your environment:

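| Variable | Description |
| --- | --- |
| MASTER_ADDR | IP address of the master node in a multi-node run |
| NODE_RANK | Rank (index) of the current node in a multi-node run; each node uses its own value |
| CKPT_SAVE_DIR | Directory where weights are saved during training |
| DATA_PATH | Path to the preprocessed dataset |
| TOKENIZER_PATH | Directory containing the GLM4.5 tokenizer |
| CKPT_LOAD_DIR | Directory holding the weights produced by weight conversion, loaded as the initial weights; if no initial weights are available, the model is randomly initialized |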
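
As a rough illustration of the table above, the assignments might look like the sketch below. All values are placeholders chosen for this example, and `check_vars` is a hypothetical helper that is not part of MindSpeed-LLM; the actual layout of the pretraining script may differ. The helper only confirms that the required variables are non-empty. `CKPT_LOAD_DIR` is left out of the check because the table describes it as optional.

```shell
# Placeholder values for illustration only; replace them with your own.
MASTER_ADDR="127.0.0.1"                     # IP of the master node
NODE_RANK=0                                 # index of this node
CKPT_SAVE_DIR="./ckpt/glm45-save"           # where weights are saved during training
DATA_PATH="./dataset/glm45_text_document"   # preprocessed data
TOKENIZER_PATH="./model_from_hf/glm45"      # GLM4.5 tokenizer directory
CKPT_LOAD_DIR=""                            # optional; empty means no initial weights

# Hypothetical helper: returns non-zero if any named variable is unset or empty.
check_vars() {
    missing=0
    for name in "$@"; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Stop early if a required variable was left empty.
check_vars MASTER_ADDR NODE_RANK CKPT_SAVE_DIR DATA_PATH TOKENIZER_PATH || exit 1
```

Running a check like this before the launch command surfaces an empty path immediately, rather than after the job has started on every node.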