GLM4.5 series model support on the MindSpore backend

Model	Download link	Sequence length	Implementation	Cluster	Supported
GLM4.5	106B	4K	Mcore	8x16	✅

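Tutorial: running the GLM4.5 model on the MindSpore backend

Environment setup

For the installation steps of the MindSpeed-LLM MindSpore backend, see the MindSpeed LLM installation guide.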

Training

Pretraining

Run pretraining as follows:

cd MindSpeed-LLM
bash examples/mindspore/glm45-moe/pretrain_glm45_moe_106b_4k_A3_ms.sh

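Modify the following variables in the script to match your environment:

Variable	Meaning
MASTER_ADDR	IP address of the master node in a multi-node setup
NODE_RANK	Index of the current node in a multi-node setup
CKPT_SAVE_DIR	Path where weights are saved during training
DATA_PATH	Path to the preprocessed dataset
TOKENIZER_PATH	GLM4.5 tokenizer directory
CKPT_LOAD_DIR	Path to the weights saved by weight conversion, used to load the initial weights; if no initial weights are provided, the model is randomly initialized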
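
As a rough illustration, the variables in the table might be filled in as in the sketch below. Every value is a hypothetical placeholder (the IP address and all paths are assumptions, not defaults taken from the script), and the trailing sanity check is an optional addition that is not part of pretrain_glm45_moe_106b_4k_A3_ms.sh.

```shell
# Hypothetical placeholder values: replace each one with the settings for your cluster.
MASTER_ADDR="192.0.2.10"                     # IP of the master node (documentation-only example address)
NODE_RANK=0                                  # index of this machine: 0 on the master node, then 1, 2, ...
CKPT_SAVE_DIR="./ckpt/glm45-moe-106b"        # weights are saved here during training
DATA_PATH="./dataset/glm45_preprocessed"     # preprocessed training data
TOKENIZER_PATH="./model_from_hf/glm45"       # GLM4.5 tokenizer directory
CKPT_LOAD_DIR="./model_weights/glm45_mcore"  # converted weights used for initial loading

# Optional sanity check (an assumption, not taken from the original script):
# count the variables that were left empty. CKPT_LOAD_DIR is left out because
# training without initial weights is allowed and how the script expects that
# case to be expressed is not stated here.
missing=0
for name in MASTER_ADDR NODE_RANK CKPT_SAVE_DIR DATA_PATH TOKENIZER_PATH; do
    eval "value=\${${name}}"   # POSIX-compatible indirect lookup of the variable named in $name
    if [ -z "${value}" ]; then
        echo "unset variable: ${name}" >&2
        missing=$((missing + 1))
    fi
done
echo "variables missing: ${missing}"
```

On a multi-node run, MASTER_ADDR would be identical on every machine while NODE_RANK differs per machine.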