File | Last commit | Last updated
Fix incorrect NIC IP readout on A3
Co-authored-by: vector5<caobingjie@huawei.com>
# message auto-generated for no-merge-commit merge:
!375 merge fixrdma into master

Fix incorrect NIC IP readout on A3

Created-by: vector5
Commit-by: vector5
Merged-by: cann-robot

Description:
## Description
Fixes the incorrect NIC IP readout on A3.

hcomm definition (hccp_common.h):
```
struct IfaddrInfo {
    union HccpIpAddr ip;    // IP address
    struct in_addr mask;    // IPv4 mask
};
struct InterfaceInfo {
    int family;
    int scopeId;
    struct IfaddrInfo ifaddr;
    char ifname[MAX_INTERFACE_NAME_LEN];
};
```
shmem definition (dl_hccp_def.h):
```
struct HccpIfaddrInfo {
    HccpIpAddr ip;            // IP address
    struct in_addr mask;      // IPv4 mask
    struct in6_addr maskv6;   // IPv6 mask  <- extra field!
};
struct HccpInterfaceInfo {
    int family;
    int scopeId;
    HccpIfaddrInfo ifaddr;
    char ifname[HCCP_MAX_INTERFACE_NAME_LEN];
};
```
shmem's HccpIfaddrInfo has an extra maskv6 field (16 bytes). As a result, when config.isAll = true the IP of the second NIC cannot be obtained, and what gets parsed is the mask:
1. Struct size mismatch: the underlying library fills the entries using the hcomm struct, while shmem reads them using its own struct.
2. Memory layout misalignment: the data of the second interface entry is parsed incorrectly.

This is why the log shows:
- family=0 (actually the previous entry's mask data, misread)
- ip=255.255.0.0 (the mask value read as the IP)

Fix: change the config parameter to false so that only the NIC for the specific phyid is fetched.

## Related Issue
https://gitcode.com/cann/shmem/issues/274

## Tests
A2 test
![image.png](https://raw.gitcode.com/user-images/assets/8546182/2cc2753b-5277-4173-b10b-8458404cf2eb/image.png 'image.png')
A2 cross-node, 16 cards (cann82T)
![image.png](https://raw.gitcode.com/user-images/assets/8546182/58d384b7-3297-4bc5-8555-40e5d1da969f/image.png 'image.png')
A3 test, single node, 16 cards
![image.png](https://raw.gitcode.com/user-images/assets/8546182/4704ad40-e8dc-48f8-ba4e-ebdb90e3a5a7/image.png 'image.png')
A3 test, cross-node, 32 cards
![image.png](https://raw.gitcode.com/user-images/assets/8546182/7ecf077d-aa66-4eb7-b391-27957caddca8/image.png 'image.png')

## Type labels
- [x] Bug fix
- [ ] New feature
- [ ] Performance optimization
- [ ] Documentation update
- [ ] Other, please describe:

See merge request: cann/shmem!375
3 days ago
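The layout mismatch described in the A3 NIC fix can be reproduced in isolation. The sketch below is not the shmem or hcomm code: `V4`, `V6`, `IpAddr`, `kNameLen`, and `ReadSecondWithWrongLayout` are hypothetical stand-ins, and the shape of `HccpIpAddr` (a union of an IPv4 and an IPv6 address) and the name-buffer length are assumptions. Under those assumptions, reading the second producer-sized entry with the larger consumer stride yields `family == 0` and puts the mask where the IP is expected, which matches the logged symptom.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-ins for in_addr / in6_addr so the sketch needs no platform headers.
struct V4 { std::uint32_t value; };
struct V6 { std::uint8_t bytes[16]; };
union IpAddr { V4 addr; V6 addr6; };       // assumed shape of HccpIpAddr
constexpr std::size_t kNameLen = 256;      // assumed interface-name length

// Layout the producer (hcomm side) writes.
struct ProducerIfaddr { IpAddr ip; V4 mask; };
struct ProducerInfo { int family; int scopeId; ProducerIfaddr ifaddr; char ifname[kNameLen]; };

// Layout the consumer (shmem side) used: one extra 16-byte field.
struct ConsumerIfaddr { IpAddr ip; V4 mask; V6 maskv6; };
struct ConsumerInfo { int family; int scopeId; ConsumerIfaddr ifaddr; char ifname[kNameLen]; };

static_assert(sizeof(ConsumerInfo) == sizeof(ProducerInfo) + sizeof(V6),
              "consumer entries are 16 bytes larger than producer entries");

// Writes two producer-layout entries back to back, then reads entry index 1
// using the consumer stride, as a mismatched reader would.
inline ConsumerInfo ReadSecondWithWrongLayout(std::uint32_t ip, std::uint32_t mask)
{
    std::vector<unsigned char> buf(2 * sizeof(ConsumerInfo), 0);
    for (std::size_t i = 0; i < 2; ++i) {
        ProducerInfo p;
        std::memset(&p, 0, sizeof(p));
        p.family = 2;                      // AF_INET on Linux
        p.ifaddr.ip.addr.value = ip;
        p.ifaddr.mask.value = mask;
        std::memcpy(buf.data() + i * sizeof(ProducerInfo), &p, sizeof(p));
    }
    assert(buf.size() >= 2 * sizeof(ConsumerInfo));
    ConsumerInfo c;
    std::memcpy(&c, buf.data() + sizeof(ConsumerInfo), sizeof(c));  // wrong stride
    return c;
}
```

Calling `ReadSecondWithWrongLayout(ip, mask)` returns an entry whose `ifaddr.ip.addr.value` equals `mask`, because the 16-byte offset shifts every field of the second entry. A `static_assert` comparing `sizeof` of the two definitions, as above, is one way to catch this kind of drift at compile time.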
fix: treat warnings as errors
Co-authored-by: nino888<yinqiran1@huawei.com>
# message auto-generated for no-merge-commit merge:
!191 merge fix-compile-warnings-master into master

fix: treat warnings as errors

Created-by: nino888
Commit-by: nino888
Merged-by: cann-robot

Description:
## Description

## Related Issue

## Tests

## Documentation update

## Type labels
- [x] Bug fix
- [ ] New feature
- [ ] Performance optimization
- [ ] Documentation update
- [ ] Other, please describe:

See merge request: cann/shmem!191
2 months ago
Fix how the target_hint field is printed
Co-authored-by: liragnarosf<lijian120@huawei.com>
# message auto-generated for no-merge-commit merge:
!388 merge master into master

Fix how the target_hint field is printed

Created-by: liragnarosf
Commit-by: liragnarosf
Merged-by: cann-robot

Description:
## Description
The target_hint field is of type uint8_t, so printing it directly produces garbled characters; it must be converted before printing.

## Related Issue

## Tests
![image.png](https://raw.gitcode.com/user-images/assets/8546182/f32e57a9-22f5-44e7-91bd-14f00388281f/image.png 'image.png')

## Documentation update

## Type labels
- [ ] Bug fix
- [ ] New feature
- [ ] Performance optimization
- [ ] Documentation update
- [ ] Other, please describe:

See merge request: cann/shmem!388
1 day ago
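The uint8_t printing problem from the target_hint fix is easy to show in a few lines. This is a minimal sketch that assumes a stream-style logger; the commit does not say which logging call shmem uses, and `FormatRaw` and `FormatNumeric` are hypothetical helper names. On mainstream platforms `std::uint8_t` is an alias for `unsigned char`, so `operator<<` writes it as a character rather than a number.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Streams the value as-is: the stream treats uint8_t as a character,
// so small values come out as control characters ("garbled" output).
inline std::string FormatRaw(std::uint8_t v)
{
    std::ostringstream os;
    os << v;
    return os.str();
}

// Converts to a wider integer type first, so the numeric value is printed.
inline std::string FormatNumeric(std::uint8_t v)
{
    std::ostringstream os;
    os << static_cast<unsigned>(v);
    return os.str();
}
```

For a value of 7, `FormatNumeric` returns the text `7`, while `FormatRaw` returns a single byte holding the BEL control character.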
fix(rootinfo): select the standard-card topo file by mainboard_id; fix 350 being identified as 300a
Co-authored-by: suqwe<sujianjia@huawei.com>
# message auto-generated for no-merge-commit merge:
!386 merge fix/rootinfo-topo-file-mainboard-id into master

fix(rootinfo): select the standard-card topo file by mainboard_id; fix 350 being identified as 300a

Created-by: suqwe
Commit-by: suqwe
Merged-by: cann-robot

Description:
## Symptom
When tools/rootinfo/root_info_generate generates rootinfo on Atlas 350 standard cards (NOMESH/2PMESH/4PMESH), topo_file_path is always atlas_300a.json, so downstream components identify the 350 standard card as a 300a.

## Root cause
aclshmemi_card_product_t::get_root_info, at src/host/transport/topo/rootinfo/aclshmemi_product_strategy.cpp:70, hard-codes atlas_300a.json, and this applies to every standard-card mainboard_id. That topo file name is not in hcomm's mapping table.

## Fix
Select the topo file by mainboard_id, aligned with the mapping in hcomm src/legacy/ascend950/framework/topo/topo_addr_info/src/topo.c:

| mainboard_id | topo file |
|---|---|
| CARD_NOMESH (0x68) | atlas_350_1.json |
| CARD_2PMESH (0x6a) | atlas_350_2.json |
| CARD_4PMESH (0x6c) | atlas_350_3.json |

The implementation follows the [KEEP-NEW-SWITCH-CLEANUP] pattern:
- [NEW] Add the helper function card_topo_filename(mainboard_id)
- [SWITCH] Change aclshmemi_card_product_t::get_root_info to build_topo_file_path(driver_path, card_topo_filename(mainboard_id))
- [CLEANUP] Remove the original "atlas_300a.json" literal

## Verification
The shmem_rootinfo target compiles under -Werror.

Fix #282

See merge request: cann/shmem!386
1 day ago
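The mapping table in the rootinfo fix translates naturally into a small lookup function. The sketch below is an illustration, not the merged code: the function names and the three ID values come from the commit message, but the parameter types, the `nullptr` result for unknown IDs, and the path-joining rule are all assumptions.

```cpp
#include <cstdint>
#include <string>

// Values taken from the mapping table in the commit message.
enum MainboardId : std::uint32_t {
    CARD_NOMESH = 0x68,
    CARD_2PMESH = 0x6a,
    CARD_4PMESH = 0x6c,
};

// Returns the topo file name for a standard-card mainboard_id.
// Returning nullptr for an unknown ID is an assumption of this sketch.
inline const char* card_topo_filename(std::uint32_t mainboard_id)
{
    switch (mainboard_id) {
        case CARD_NOMESH: return "atlas_350_1.json";
        case CARD_2PMESH: return "atlas_350_2.json";
        case CARD_4PMESH: return "atlas_350_3.json";
        default:          return nullptr;
    }
}

// Joins a driver directory and a file name; the joining rule is assumed.
inline std::string build_topo_file_path(const std::string& driver_path, const char* filename)
{
    if (filename == nullptr) {
        return std::string();  // no known topo file for this board
    }
    const bool has_sep = !driver_path.empty() && driver_path.back() == '/';
    return driver_path + (has_sep ? "" : "/") + filename;
}
```

Keeping the mapping in one function means a new board ID needs a single extra `case`, and the caller no longer carries a file-name literal.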
Support AIV direct-drive UDMA: control-plane main flow
Co-authored-by: YeZZzzz1<yezhenni1@huawei.com>
# message auto-generated for no-merge-commit merge:
!194 merge master into master

Support AIV direct-drive UDMA: control-plane main flow

Created-by: YeZZzzz1
Commit-by: YeZZzzz1
Merged-by: cann-robot

Description:
## Description
Adds support for AIV direct-drive UDMA; this change covers the main-flow modifications.

## Related Issue
https://gitcode.com/cann/shmem/issues/161

## Tests
Functional tests:
- examples: ![image.png](https://raw.gitcode.com/user-images/assets/8546182/b945a60d-bda4-4dcd-9647-8ba943772be6/image.png 'image.png')
- ut: ![image.png](https://raw.gitcode.com/user-images/assets/8546182/cbbbc7a2-e279-447f-a6c1-9eeb7567d43e/image.png 'image.png')

## Documentation update

## Type labels
- [ ] Bug fix
- [x] New feature
- [ ] Performance optimization
- [ ] Documentation update
- [ ] Other, please describe:

See merge request: cann/shmem!194
2 months ago
feat(issue-280): [Task|任务]: namespace cleanup
Co-authored-by: nino888<yinqiran1@huawei.com>
# message auto-generated for no-merge-commit merge:
!387 merge autodev/issue-280 into master

feat(issue-280): [Task|任务]: namespace cleanup

Created-by: nino888
Commit-by: nino888
Merged-by: cann-robot

Description:
## Summary
- Implement issue #280: [Task|任务]: namespace cleanup
- Source issue: https://gitcode.com/cann/shmem/issues/280
- Branch: autodev/issue-280 (nino888/shmem -> cann/shmem)

## Changes
- examples/dispatch_gmm_combine/include/dispatch_gmm_combine.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_init_routing_quant_v2.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_common.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_expert_token_out.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_fullload_dynamic_quant.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_fullload_quant.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_fullload_quant_base.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_gather_dynamic_quant.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_gather_quant.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_mrgsort.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_mrgsort_out.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_sort_base.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_sort_multi_core.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_sort_one_core.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_src_to_dst_and_gather.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_src_to_dst_op.h
- examples/dispatch_gmm_combine/include/moe_init_routing_quant_v2/moe_v2_src_to_dst_with_capacity.h
- examples/dispatch_gmm_combine/include/moe_token_unpermute.h
- examples/dispatch_gmm_combine/include/select_helper.h
- examples/dispatch_gmm_combine/include/sync_util.h
- examples/dynamic_tiling/impl/kernel/allgather_matmul.h
- examples/dynamic_tiling/impl/kernel/allgather_matmul_padding.h
- examples/dynamic_tiling/impl/kernel/allgather_matmul_with_gather_result.h
- examples/dynamic_tiling/impl/kernel/matmul_allreduce.h
- examples/dynamic_tiling/impl/kernel/matmul_reduce_scatter.h
- examples/dynamic_tiling/impl/kernel/matmul_reduce_scatter_padding_a.h
- examples/dynamic_tiling/impl/kernel/matmul_reduce_scatter_padding_ab.h
- examples/dynamic_tiling/impl/kernel/matmul_reduce_scatter_padding_b.h
- examples/matmul_allreduce/epilogue/block/epilogue_allreduce.hpp
- src/device/gm2gm/shmemi_device_rma.cpp
- src/host/bootstrap/shmemi_bootstrap_config_store.cpp
- src/host/data_plane/shmem_host_rma.cpp
- src/host/entity/mem_entity_default.cpp
- src/host/entity/mem_entity_entry.cpp
- src/host/init/shmem_init.cpp
- src/host/mem/heap/hybm_vmm_based_segment.cpp
- src/host/mem/shmem_rma.cpp
- src/host/team/shmem_team.cpp
- src/host/transport/transport_manager.cpp

## Local Validation
- echo 'TODO: replace with real tests, e.g. pytest -q': passed

See merge request: cann/shmem!387
8 hours ago
Directory update
Co-authored-by: james88liu<liujianxing1@huawei.com>
# message auto-generated for no-merge-commit merge:
!87 merge br_dir_1 into master

Directory update

Created-by: james88liu
Commit-by: james88liu
Merged-by: cann-robot

Description:
## Description

## Related Issue
https://gitcode.com/cann/shmem/issues/63

## Tests
![image.png](https://raw.gitcode.com/user-images/assets/8546182/c6cf947b-7583-4e9e-aa14-105f025c4314/image.png 'image.png')
![image.png](https://raw.gitcode.com/user-images/assets/8546182/cf5148a7-5e17-4b26-9321-c16504910460/image.png 'image.png')

## Documentation update
```
├── bootstrap
│   ├── config_store
│   │   ├── CMakeLists.txt
│   │   ├── acc_links
│   │   │   ├── csrc
│   │   │   │   ├── common
│   │   │   │   ├── security
│   │   │   │   └── under_api
│   │   │   │       └── openssl
│   │   │   └── include
│   ├── shmemi_bootstrap_mpi.cpp
│   ├── shmemi_bootstrap_uid.cpp
│   └── socket
│       ├── uid_socket.cpp
│       ├── uid_socket.h
│       └── uid_utils.h
├── data_plane
│   ├── shmem_host_cc.cpp
│   └── shmem_host_rma.cpp
├── entity
│   ├── mem_entity_base.h
│   ├── mem_entity_def.h
├── init
│   ├── backends
│   │   ├── shmem_init_backend.cpp
│   │   └── shmemi_init_backend.h
│   ├── bootstrap
│   │   ├── shmemi_bootstrap.cpp
│   │   └── shmemi_bootstrap.h
│   ├── shmem_init.cpp
│   └── shmemi_init.h
├── mem
│   ├── heap
│   │   ├── driver
│   │   │   ├── devmm_cmd.h
│   │   │   └── userspace
│   │   │       ├── devmm_define.h
│   │   ├── hybm_mem_slice.cpp
│   │   └── hybm_mem_slice.h
│   ├── shmem_mgr.cpp
│   ├── shmem_mm.cpp
│   ├── shmem_rma.cpp
│   ├── shmemi_mgr.h
│   └── shmemi_mm.h
├── python_wrapper
│   ├── CMakeLists.txt
│   └── pyshmem.cpp
├── shmemi_host_common.h
├── shmemi_host_def.h
├── sync
│   ├── shmemi_sync.cpp
│   └── shmemi_sync.h
├── team
│   ├── shmem_team.cpp
│   └── shmemi_team.h
├── transport
│   ├── device_rdma
│   ├── transport_def.h
│   ├── transport_manager.cpp
│   └── transport_manager.h
└── utils
    ├── log
    │   ├── shmemi_log_defs.h
    │   └── shmemi_logger.cpp
    ├── shmemi_file_util.h
    ├── under_api
    │   ├── dl_acl_api.cpp
    │   ├── dl_acl_api.h
    └── utils.h
```

## Type labels
- [ ] Bug fix
- [ ] New feature
- [ ] Performance optimization
- [ ] Documentation update
- [ ] Other, please describe:

See merge request: cann/shmem!87
4 months ago