Branch: 2.2.0
Path: MindSpeed-LLM/mindspeed_llm/tasks/models/transformer
Latest commit: 031c12d9 by ascend-robot, created on February 27
[pytorch][bugfix]fix deepseek3 tnd bug in mbs > 1
| File | Last commit | Last updated |
| --- | --- | --- |
| `__init__.py` | CodeChcek rectification (master). Merge pull request !2030 from shenjiarun/master | 1 year ago |
| `attention.py` | refactor: move spec related structure into right position. Merge pull request !2089 from RuanZhiXiang/refactor_mla_attention | 1 year ago |
| `fast_mlp.py` | Restructure the repository file layout. Merge pull request !1958 from DONGHAORAN/master | 1 year ago |
| `hunyuan_large_attention.py` | [pytorch][bugfix]fix about cleancode. Co-authored-by: qyzqyz <quyueze@h-partners.com> | 8 months ago |
| `hunyuan_rope.py` | [HunyuanLargeMoE] part of model. Merge pull request !2249 from zhoubeirong/0218-model-part1 | 1 year ago |
| `mla_dot_product_attention.py` | [pytorch][bugfix]fix MLA init and args bug. Merge pull request !3160 from HanhuiChen/master | 9 months ago |
| `mla_up_proj_overlap_tp_comm.py` | [pytorch][bugfix]Fix MLA up proj overlap and remove deprecated funcs. Merge pull request !3156 from shengjy/master | 9 months ago |
| `multi_latent_attention.py` | [pytorch][bugfix]fix deepseek3 tnd bug in mbs > 1. Co-authored-by: guozhihua2 <guozhihua2@huawei.com>. Merge request Ascend/MindSpeed-LLM !4235 (merge deepseek3_tnd_mbs_2.2.0 into 2.2.0). Created-by: guozhihua2. Commit-by: guozhihua2. Merged-by: ascend-robot. Description: 1. Remove the dimension conversion under TND in the MLA part of deepseek3. The original code is the Megatron implementation; Megatron uses TND globally, whereas llm uses TND only during the FA computation. | 3 months ago |
| `qwen3_next_full_attention.py` | [pytorch][model]add qwen3_next model. Merge pull request !3316 from guozhihua/qwen3_next_master | 9 months ago |
| `qwen3_next_gated_deltanet_attention.py` | [pytorch][model]change l2norm in qwen3_next for hf. Co-authored-by: guozhihua <guozhihua2@huawei.com>. Merge request Ascend/MindSpeed-LLM !3443 (merge qwen3_next_l2norm_2.2 into 2.2.0). Created-by: guozhihua2. Commit-by: guozhihua. Merged-by: ascend-robot. | 8 months ago |
| `transformer_layer.py` | [pytorch][bugfix] fix the bug with gemma enabling recomputation. Co-authored-by: yanzhixiao <yanzhixiao@h-partners.com>. Merge request Ascend/MindSpeed-LLM !4196 (merge bugfix-gemma-2.2.0 into 2.2.0). Created-by: yanzhixiao23. Commit-by: yanzhixiao. Merged-by: ascend-robot. | 3 months ago |
| `transformer_layer_hunyuan.py` | [pytorch][bugfix] fix the hunyuan model. Merge pull request !3025 from yanzhixiao/bugfix-hunyuan | 11 months ago |