[bugfix] Compatibility between lora and meta initialization in fsdp2 backend
Co-authored-by: LKONE <wanglikai4@huawei.com>
# message auto-generated for no-merge-commit merge:
!2499 merge master into master
[bugfix] Compatibility between lora and meta initialization in fsdp2 backend
Created-by: wanglikai1019
Commit-by: LKONE
Merged-by: ascend-robot
Description: ## What this PR does / why we need it?
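In the fsdp2 backend, the LoRA feature is incompatible with meta initialization in two ways:
(1) The LoRA weights were initialized before meta initialization. The later `empty_like` materialization replaced them with uninitialized storage, so they were never correctly initialized. Fix: call `init_model_weights` once more after meta initialization to randomly initialize them.
(2) LoRA wrapping happened before the base DCP weights were loaded. After wrapping, the base weights' keys gain a `.base_layer` prefix and no longer match the keys in the checkpoint. Fix: add a LoRA-specific `ModelState` that, when loading weights, first removes the `.base_layer` prefix and then adds it back.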
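To make fix (1) concrete, here is a minimal sketch in plain PyTorch (2.x, needed for the `torch.device("meta")` context manager) of why the LoRA weights must be initialized again after materialization. `LoRALinear` and this `init_model_weights` are hypothetical stand-ins, not the MindSpeed-MM classes. `Module.to_empty()` plays the materialization step because it allocates fresh storage via `torch.empty_like`.

```python
import math

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical minimal LoRA layer: a base_layer plus low-rank lora_A / lora_B."""

    def __init__(self, in_features: int, out_features: int, r: int = 4):
        super().__init__()
        self.base_layer = nn.Linear(in_features, out_features, bias=False)
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        # On the meta device this init only sees shapes; no values are kept.
        self.reset_lora_parameters()

    def reset_lora_parameters(self) -> None:
        # Common LoRA scheme: random A, zero B, so the adapter starts as a no-op.
        nn.init.kaiming_uniform_(self.lora_A.weight, a=math.sqrt(5))
        nn.init.zeros_(self.lora_B.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base_layer(x) + self.lora_B(self.lora_A(x))


def init_model_weights(model: nn.Module) -> None:
    """Hypothetical re-initialization pass over every LoRA layer."""
    for module in model.modules():
        if isinstance(module, LoRALinear):
            module.reset_lora_parameters()


# Step 1: build on the meta device (shapes only, no storage).
with torch.device("meta"):
    model = LoRALinear(16, 16)

# Step 2: materialize; to_empty() uses empty_like, so every value is arbitrary.
model.to_empty(device="cpu")

# Step 3 (the fix): run the initializer again after materialization.
init_model_weights(model)
# base_layer is still uninitialized here; in this flow it is filled by the DCP load.
```

Only the LoRA weights need the second pass in this sketch, because the base weights are expected to be overwritten by the checkpoint load anyway.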
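Fix (2) amounts to translating keys around the checkpoint load. The sketch below mimics the `state_dict()` / `load_state_dict()` pair that `torch.distributed.checkpoint` calls on a `Stateful` object, using plain dicts so it runs without a distributed setup. `LoraModelState`, `simulate_dcp_load`, and the key names are hypothetical, and the sketch assumes LoRA wrapping only inserts `.base_layer` into the base parameters' keys.

```python
from typing import Any, Dict

# Segment that LoRA wrapping is assumed to insert into base-layer keys,
# e.g. "q_proj.weight" -> "q_proj.base_layer.weight".
BASE_LAYER = ".base_layer"


class LoraModelState:
    """Hypothetical LoRA-aware ModelState over a plain dict of parameters.

    Mirrors the state_dict() / load_state_dict() pair that
    torch.distributed.checkpoint calls on Stateful objects.
    """

    def __init__(self, params: Dict[str, Any]):
        self.params = params
        # checkpoint-style key -> key as it appears in the wrapped model
        self._model_key = {k.replace(BASE_LAYER, ""): k for k in params}

    def state_dict(self) -> Dict[str, Any]:
        # Remove ".base_layer" so the keys match the pre-LoRA checkpoint.
        return {k.replace(BASE_LAYER, ""): v for k, v in self.params.items()}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        # Add ".base_layer" back before writing values into the model.
        for key, value in state.items():
            self.params[self._model_key[key]] = value


def simulate_dcp_load(model_state: LoraModelState, checkpoint: Dict[str, Any]) -> None:
    """Rough stand-in for dcp.load: fill keys found in the checkpoint, then hand back."""
    state = model_state.state_dict()
    for key in state:
        if key in checkpoint:
            state[key] = checkpoint[key]
    model_state.load_state_dict(state)


params = {"q_proj.base_layer.weight": 0.0, "q_proj.lora_A.weight": 1.0}
model_state = LoraModelState(params)
simulate_dcp_load(model_state, {"q_proj.weight": 5.0})
print(model_state.params)
# {'q_proj.base_layer.weight': 5.0, 'q_proj.lora_A.weight': 1.0}
```

The LoRA-only key is absent from the base checkpoint, so it keeps its value, while the base weight is matched through the stripped key and written back under its wrapped name.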
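The remaining changes address findings from the pre-commit codecheck:
(1) Redefining name 'args' from outer scope: in the trainer file, the global variable `args` in the main function clashed with a local variable of the same name used later. The global variable was renamed to `arguments` to distinguish it from the local one.
(2) String statement has no effect (pointless-string-statement): the string """Build optimizer for the model.""" sat in the middle of the code as a comment and was moved, since a string literal only acts as a docstring when it is the first statement of a module, class, or function.
(3) Attempted relative import beyond top-level package (relative-beyond-top-level): relative imports were replaced with absolute imports.
(4) Using open without explicitly specifying an encoding (unspecified-encoding): `open` was called without an explicit encoding; it now passes `utf-8`, the common encoding on Linux.
(5) Use lazy % formatting in logging functions (logging-fstring-interpolation): `logger.info` calls used f-string `{}` interpolation, which formats the message eagerly; they now use lazy `%s` formatting.
All other changes are formatting only.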
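Lint fixes (1), (2), (4), and (5) are small local patterns. The hypothetical module below shows the corrected form of each in one place; `build_optimizer`, `write_config`, `parse_args`, and the `--lr` flag are illustrative names, not taken from the trainer file.

```python
import argparse
import logging

logger = logging.getLogger(__name__)


# (2) The string is the first statement of the function, so it is a real docstring.
def build_optimizer(lr: float) -> dict:
    """Build optimizer for the model."""
    return {"name": "adamw", "lr": lr}  # hypothetical placeholder config


def write_config(path: str, text: str) -> None:
    # (4) Explicit encoding instead of the platform-dependent default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def main(args: argparse.Namespace) -> dict:
    optimizer = build_optimizer(args.lr)
    # (5) Lazy %-formatting: the message is only built if INFO is enabled.
    logger.info("built optimizer %s with lr=%s", optimizer["name"], args.lr)
    return optimizer


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=1e-4)
    return parser.parse_args(argv)


if __name__ == "__main__":
    # (1) The module-level name differs from main()'s parameter `args`,
    # so pylint no longer reports redefined-outer-name.
    arguments = parse_args()
    main(arguments)
```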
## Does this PR introduce any user-facing change?
None.
## How was this patch tested?
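With meta initialization enabled, checked that LoRA fine-tuning runs correctly, that the base weights load correctly, and that the loss is normal (not NaN).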
See merge request: Ascend/MindSpeed-MM!2499