README.md
  1. Build the project
    Run the build script from the shmem/ root directory:

    bash scripts/build.sh -examples
    
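  2. Generate data
    2.1 - Parameters in the config file:

    • dataType: data type; defaults to 2 (INT8INT8_INT32_FP16). See the definitions in gen_data.py.
    • peSize: number of cards (PEs), typically 2, 4, 6, 8, 16, or 32.
    • m: number of tokens.
    • k: hidden dimension of the activation matrix.
    • n: hidden dimension of the weight matrix.
    • weightNz: whether the weight matrix is in Nz data format; defaults to 1.
    • dequantGranularity: dequantization mode; defaults to 3.
    • local_expert_nums: number of experts on each card.
    • EP: must be the same as peSize.
    • maxOutputSize: maximum number of tokens after alltoall; tokens beyond this limit are truncated. Defaults to twice m.
    • topK: number of copies made of each token.
    • transB: whether the weight matrix is transposed; defaults to 0.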
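The constraints stated in the parameter list (EP equals peSize, maxOutputSize defaults to twice m, and the listed defaults) can be sketched as a small Python check. This is only an illustration: the class name `DispatchGmmCombineConfig` is hypothetical, the actual layout of the config file is not shown here, and the `topK` value used in the usage line is an arbitrary example rather than a recommended setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DispatchGmmCombineConfig:
    """Hypothetical container; field names mirror the README parameters."""
    peSize: int                 # number of cards (PEs)
    m: int                      # number of tokens
    k: int                      # hidden dimension of the activation matrix
    n: int                      # hidden dimension of the weight matrix
    local_expert_nums: int      # experts per card
    topK: int                   # copies made of each token
    dataType: int = 2           # default stated in the README
    weightNz: int = 1           # default stated in the README
    dequantGranularity: int = 3 # default stated in the README
    transB: int = 0             # default stated in the README
    EP: Optional[int] = None            # filled in from peSize when omitted
    maxOutputSize: Optional[int] = None # filled in as 2 * m when omitted

    def __post_init__(self) -> None:
        if self.EP is None:
            self.EP = self.peSize
        if self.maxOutputSize is None:
            self.maxOutputSize = 2 * self.m
        if self.EP != self.peSize:
            raise ValueError("EP must be the same as peSize")


# Usage: topK=2 is an illustrative value only.
cfg = DispatchGmmCombineConfig(peSize=2, m=64, k=7168, n=4096,
                               local_expert_nums=2, topK=2)
print(cfg.EP, cfg.maxOutputSize)
```

Deriving EP and maxOutputSize from the other fields keeps the two values from drifting apart when the configuration is edited by hand.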

    2.2 - Run the generation scripts:

    scripts/run.sh performs this step automatically, so it does not need to be run separately.

    cd examples/dispatch_gmm_combine
    # CPU-based implementation
    python3 utils/gen_data.py
    # torch-npu-based implementation (uses the inputs generated by gen_data.py by default)
    python3 utils/gen_data_by_torch_npu.py
    

    Note: torch-npu must be installed to run the test case.

  3. Run the Dispatch-Gmm-Combine example program
    Enter the example directory and run the run script, keeping the arguments consistent with those in the config:

    cd examples/dispatch_gmm_combine
    bash scripts/run.sh -pes {peSize} -M {m} -K {k} -N {n} -expertPerPe {local_expert_nums} -dataType {dataType} -weightNz {weightNz} -transB {transB}
    

    scripts/run.sh runs the operator (the output is saved under examples/dispatch_gmm_combine/out) and then verifies the result. The result can also be verified separately:

    cd examples/dispatch_gmm_combine
    python3 utils/check_result.py
    
  4. Example run

    # First write the configuration to config.ini
    cd examples/dispatch_gmm_combine
    bash ./scripts/run.sh -pes 2 -M 64 -K 7168 -N 4096 -expertPerPe 2 -dataType 2 -weightNz 1 -transB 0
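When the same parameters are needed both in the config and on the command line, the invocation can be assembled from one set of values so the two stay consistent. The sketch below is only an illustration: the helper `build_run_command` is hypothetical and not part of the repository; it uses only the flags that appear in the command above.

```python
import shlex


def build_run_command(pes, m, k, n, expert_per_pe,
                      data_type=2, weight_nz=1, trans_b=0):
    """Hypothetical helper: format the run.sh invocation from parameter values."""
    args = [
        "bash", "./scripts/run.sh",
        "-pes", str(pes),
        "-M", str(m),
        "-K", str(k),
        "-N", str(n),
        "-expertPerPe", str(expert_per_pe),
        "-dataType", str(data_type),
        "-weightNz", str(weight_nz),
        "-transB", str(trans_b),
    ]
    # shlex.join quotes arguments only when the shell would require it.
    return shlex.join(args)


# Reproduces the example invocation with the same values.
print(build_run_command(pes=2, m=64, k=7168, n=4096, expert_per_pe=2))
```

The returned string is meant to be run from examples/dispatch_gmm_combine, as in the steps above.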