memfabric-hybrid: a memory pooling software project for the Ascend ecosystem

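Foundational memory pooling software. Built on the SuperPoD bus and server networks, it implements hybrid pooling of DRAM and device memory (HBM), offers a minimal memory access interface and high-performance direct memory access, and supports data sharing and transfer across many scenarios.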


DRAM & HBM hybrid pooling, memory-semantic interface, high-performance cross-node direct memory access



🔄Latest News

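  • [2026/01] Supporting components for DRAM pooling have been released; see Software and Hardware Compatibility.

  • [2025/12] MemFabric + MemCache now serve as a vllm-ascend backend to accelerate large-model inference. For details, see the vllm-ascend open-source community and the usage example.

  • [2025/11] MemFabric was open-sourced in November 2025. On Ascend, it provides efficient multi-link direct memory access, including D2RH, RH2D, RH2H, D2D, D2H, and H2D.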

🔜 Roadmap

For the MemFabric roadmap and version branching strategy, see: Roadmap

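🎉Overview

MemFabric is open-source memory pooling software for Ascend SuperPoDs, servers, and similar systems. Its design goals and core ideas are:

  • Unified pooling of heterogeneous devices: pools heterogeneous device memory (DRAM, HBM, etc.) across multiple nodes and provides high-performance "direct access" to global memory.
  • Simple northbound interface: provides a memory-semantic access interface, i.e. xcopy with a global virtual address, close to the familiar memcpy model, supporting D2RH, RH2D, RH2H, D2D, and more.
  • Highly extensible southbound interface: supports multiple DMA engines and LD/ST, as well as multiple networks and LingQu interconnects (Device UB, Device RoCE, Host UB, Host RoCE, etc.), through plugins.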

architecture

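As shown in the figure above, MemFabric consists of four main modules: Global Memory Management, Data Operation, Transport Management, and API.

  • Global Memory Management: orchestrates the unified Global Virtual Address (GVA) space, defines the page-table mapping policy, and injects that policy into the page tables through the driver.
  • Data Operation: implements xcopy, driving xDMA and LD/ST to read and write global memory directly.
  • Transport Management: connection management. When xcopy drives Host RDMA, Device RDMA, or UDMA, QP or Jetty connections must be established; when xcopy uses SDMA, MTE, or LD/ST, Transport Management is not needed.
  • API: a unified, simple API and its implementation, comprising the BM API, SHM API, and Trans API, each suited to different scenarios.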
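The Transport Management bullet implies a simple rule: only engines that need a connection go through Transport Management. The following is a hypothetical Python sketch of that rule; the `Engine` enum and helper names are illustrative and not part of MemFabric's API, and pairing QPs with the RDMA engines and Jettys with UDMA is an assumption based on how those transports are usually set up.

```python
from __future__ import annotations

from enum import Enum


class Engine(Enum):
    """Data-movement engines named in the module description (illustrative only)."""
    HOST_RDMA = "Host RDMA"
    DEVICE_RDMA = "Device RDMA"
    UDMA = "UDMA"
    SDMA = "SDMA"
    MTE = "MTE"
    LD_ST = "LD/ST"


# Assumed mapping: connection object that must exist before xcopy can use the engine.
# Engines missing from this table (SDMA, MTE, LD/ST) need no Transport Management.
_CONNECTION_KIND = {
    Engine.HOST_RDMA: "QP",
    Engine.DEVICE_RDMA: "QP",
    Engine.UDMA: "Jetty",
}


def needs_transport(engine: Engine) -> bool:
    """Return True if Transport Management must establish links for this engine."""
    return engine in _CONNECTION_KIND


def connection_kind(engine: Engine) -> str | None:
    """Return the connection type to set up, or None when no link is required."""
    return _CONNECTION_KIND.get(engine)
```

Keeping the rule in one table means a new southbound plugin only has to declare whether it needs a connection, which mirrors the plugin-based extensibility described in the overview.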

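Global Memory Management, Data Operation, and Transport Management are all implemented behind logical abstractions, so they can easily be extended to support different hardware. Currently supported southbound targets:

  • Ascend A3 SuperPoD: DRAM+HBM pooling over Device UB 1.0, DRAM pooling over Host RoCE
  • Ascend A2 server: DRAM+HBM pooling over Device RoCE, DRAM pooling over Host RoCE
  • Kunpeng server: DRAM pooling over Host RoCE
  • Kunpeng SuperPoD: DRAM pooling over Host UB

MemFabric ships as a dynamic library so that applications can integrate it quickly and easily. It supports scenarios such as LLM KV cache, generative recommendation caching, parameter resharding in reinforcement learning training, model parameter caching, and PD (prefill/decode) transfer.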

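🧩Key Features

  • Pooling and global unified addressing

MemFabric builds a logical, memory-semantic global unified address space to manage and use memory units distributed across different tiers and nodes. This lets the system address and transparently access memory across CPUs and NPUs as if it were a single physical resource. The core goal is to consolidate and uniformly schedule memory resources and get the most out of the hardware. Properties of the GVA:

  • It is a plain uint64
  • The GVA start address is the same in all processes
  • The GVA is laid out linearly, identically in all processes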

unified_global_address
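To make the three GVA properties concrete, here is a minimal Python sketch of a linear layout. It assumes, purely for illustration, that every rank contributes one equally sized segment placed back to back from a shared base address; the `GvaLayout` class, the base value, and the segment size are hypothetical and do not describe MemFabric's actual address map.

```python
from __future__ import annotations

from dataclasses import dataclass

UINT64_MAX = (1 << 64) - 1


@dataclass(frozen=True)
class GvaLayout:
    """Hypothetical linear GVA layout, identical in every process."""
    base: int          # same start address in every process
    segment_size: int  # bytes contributed per rank (assumed equal for all ranks)
    world_size: int    # number of ranks in the pool

    def __post_init__(self):
        end = self.base + self.segment_size * self.world_size
        if self.segment_size <= 0 or self.world_size <= 0 or end - 1 > UINT64_MAX:
            raise ValueError("layout must fit in a uint64 address space")

    def to_gva(self, rank: int, offset: int) -> int:
        """Map (rank, local offset) to a global virtual address."""
        if not 0 <= rank < self.world_size or not 0 <= offset < self.segment_size:
            raise ValueError("rank or offset out of range")
        return self.base + rank * self.segment_size + offset

    def locate(self, gva: int) -> tuple[int, int]:
        """Map a global virtual address back to (rank, local offset)."""
        rel = gva - self.base
        if not 0 <= rel < self.segment_size * self.world_size:
            raise ValueError("address is outside the pool")
        return divmod(rel, self.segment_size)


layout = GvaLayout(base=0x100000000000, segment_size=1 << 30, world_size=4)
gva = layout.to_gva(2, 0x1000)
print(hex(gva), layout.locate(gva))
```

Because the base and layout are identical in every process, `layout.to_gva(2, 0x1000)` yields the same uint64 on all ranks, and any rank can recover the owning rank and offset with `layout.locate(...)`.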

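  • Direct cross-node, cross-medium access

    With MemFabric's memory-semantic unified addressing, data can be accessed transparently and directly across multi-tier storage on different nodes.

    Typical cross-node, cross-medium access paths:

    • D2RH: local HBM to remote DRAM
    • RH2D: remote DRAM to local HBM
    • RH2H: remote DRAM to local DRAM

    Note: D stands for Device, RH for Remote Host

MemFabric's cross-node data flow and control flow are shown below (Ascend A3 SuperPoD):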

one_copy

Current hardware support for MemFabric pooling:

  • Ascend A3 SuperPoD: Device UB 1.0, Host RDMA
  • Ascend A2 server: Device RDMA, Host RDMA
  • Kunpeng server: Host RDMA
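
| Pool type | Access direction | Host RDMA | Device RDMA | Device UB 1.0 |
|---|---|---|---|---|
| DRAM POOL | LD2GH | | | |
| DRAM POOL | GH2LD | | | |
| DRAM POOL | LH2GH | | | |
| DRAM POOL | GH2LH | | | |
| HBM POOL | GD2LH | | | |
| HBM POOL | LH2GD | | | |
| HBM POOL | GD2LD | | | |
| HBM POOL | LD2GD | | | |
| HBM + DRAM POOL | GH2GD | | | |
| HBM + DRAM POOL | GD2GH | | | |
| HBM + DRAM POOL | GH2GH | | | |
| HBM + DRAM POOL | GD2GD | | | |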

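Note:

L stands for Local, D for Device, G for Global, and H for Host.

  • GH: a block of DRAM that belongs to the DRAM pool; it may be on the local node or on a remote node
  • GD: a block of HBM that belongs to the HBM pool; it may be on the local node or on a remote node
  • LH: a block of DRAM that does not belong to any pool and resides in the current process
  • LD: a block of HBM that does not belong to any pool and resides in the current process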
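The access-direction codes in the table follow a fixed pattern, `<scope><medium>2<scope><medium>`, read from source to destination. A small illustrative decoder built only from the legend above (the helper is hypothetical, not part of MemFabric):

```python
from __future__ import annotations

import re

# Legend from the note above.
SCOPE = {
    "L": "local (not in any pool, owned by the current process)",
    "G": "global (in the pool, on this node or a remote one)",
}
MEDIUM = {"H": "host DRAM", "D": "device HBM"}

_CODE = re.compile(r"^([LG])([HD])2([LG])([HD])$")


def decode_direction(code: str) -> tuple[str, str]:
    """Split an access-direction code such as 'LD2GH' into (source, destination)."""
    m = _CODE.match(code)
    if m is None:
        raise ValueError(f"not an access-direction code: {code!r}")
    src_scope, src_medium, dst_scope, dst_medium = m.groups()
    source = f"{MEDIUM[src_medium]}, {SCOPE[src_scope]}"
    destination = f"{MEDIUM[dst_medium]}, {SCOPE[dst_scope]}"
    return source, destination


for code in ("LD2GH", "GD2LH", "GH2GD"):
    src, dst = decode_direction(code)
    print(f"{code}: {src} -> {dst}")
```

The decoder deliberately rejects the shorthand used earlier (D2RH, RH2D, RH2H), since that notation names a remote host rather than a pool scope.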

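🔥Performance

Latency test

  • Two Ascend A3 nodes form a two-node memory pool, and MemFabric is integrated with Mooncake TE (Transfer Engine; Mooncake is an open-source distributed caching system; see the MemFabric-Mooncake integration code) for read/write latency tests. The block size is constructed to match the KV size of the DeepSeek-R1 model, i.e. 61 × 128 KB + 61 × 16 KB = 8784 KB ≈ 8.58 MB, spread over 122 discrete addresses. Results: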

a3-Latency-performance
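The block-size arithmetic used in the latency test can be checked with a few lines of Python. This only reproduces the arithmetic; it assumes KB and MB are binary units (KiB/MiB), and it does not model what each block contains.

```python
# Assumption: "KB" and "MB" in the latency test are binary units (KiB/MiB);
# this is what makes 8784 KB come out at roughly 8.58 MB.
KIB = 1024
MIB = 1024 * KIB

# 61 blocks of 128 KB plus 61 blocks of 16 KB, each at its own address.
block_sizes_kb = [128] * 61 + [16] * 61

total_kb = sum(block_sizes_kb)        # total payload per request, in KB
num_addresses = len(block_sizes_kb)   # one discrete address per block
total_mb = total_kb * KIB / MIB       # same payload expressed in MB

print(f"{total_kb} KB over {num_addresses} addresses = {total_mb:.2f} MB")
```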

Bandwidth test (single DIE + single CPU)

  • Cross-node data access performance on the Ascend A3 SuperPoD (DRAM and HBM pooling over UB 1.0):
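
| Transfer direction | Size per transfer (GB) | Bandwidth (GB/s) |
|---|---|---|
| RH2D | 1 | 110.23 |
| RH2D | 2 | 110.19 |
| D2RH | 1 | 74.54 |
| D2RH | 2 | 74.54 |
| RD2D | 1 | 166.47 |
| RD2D | 2 | 166.47 |
| D2RD | 1 | 138.01 |
| D2RD | 2 | 138.01 |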

Note: following Ascend's official bandwidth test tool ascend-dmi, the A3 SuperPoD tests report communication bandwidth, where 1 GB/s = 1000 × 1000 × 1000 B/s.
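Because the A3 figures use decimal units, converting them to transfer times or to binary GiB/s needs a little care. A short Python sketch, assuming the per-transfer sizes in the table are also decimal (1 GB = 10^9 bytes); the helper names are illustrative:

```python
GB = 1000 ** 3   # decimal gigabyte, as in the note: 1 GB/s = 1000*1000*1000 B/s
GiB = 1024 ** 3  # binary gibibyte, used by many other tools


def transfer_time_ms(size_bytes: int, bandwidth_gb_s: float) -> float:
    """Time in milliseconds to move size_bytes at a bandwidth given in decimal GB/s."""
    return size_bytes / (bandwidth_gb_s * GB) * 1000.0


def gb_s_to_gib_s(bandwidth_gb_s: float) -> float:
    """Convert decimal GB/s into binary GiB/s for comparison with other tools."""
    return bandwidth_gb_s * GB / GiB


print(f"{transfer_time_ms(1 * GB, 110.23):.2f} ms")
print(f"{gb_s_to_gib_s(110.23):.1f} GiB/s")
```

For example, a 1 GB RH2D transfer at 110.23 GB/s takes about 9.07 ms, and 110.23 GB/s corresponds to roughly 102.7 GiB/s.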
  • Cross-node data access performance on the Ascend A2 server (DRAM and HBM pooling over Device RoCE):

A2-Bandwidth-performance

👆 For performance test details, see benchmark

🔍Directory Structure

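├── LICENSE                    # License file
├── .clang-format              # Formatting configuration
├── .gitmodules                # Git configuration for third-party libraries
├── .gitignore                 # Git ignore rules
├── CMakeLists.txt             # Top-level CMakeLists for the project
├── doc                        # Documentation
├── examples                   # Examples
│  ├── memory_pool             # Memory pooling examples (basic/extended/optimization/feature/observability)
│  ├── transfer                # Transfer examples (including batch data write/read)
│  └── hbm_share_memory        # HBM shared memory examples (AllReduce/ShiftPutGet/RDMADemo)
├── script                     # Build scripts
│  ├── build_and_pack_run.sh   # Build + package script
│  ├── build.sh                # Build script
│  └── run_ut.sh               # Build + run unit tests script
├── test                       # Tests
│  ├── 3rdparty                # Third-party libraries
│  ├── certs                   # Certificate generation scripts
│  ├── python                  # Python test cases
│  └── ut                      # Unit test cases
├── src                        # Source code
│  ├── acc_links               # Internal communication layer (control commands between processes, based on Host TCP)
│  ├── hybm                    # Memory management and access layer (Global Memory Management, Data Operation, Transport Management)
│  ├── smem                    # Semantics and interface layer (big memory, transfer, shared memory, etc.)
│  └── util                    # Common utilities
└── README.md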

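🚀Quick Start

See the following documents for quick tutorials.

  • Build and installation: how to build and install the components.

  • Running examples: how to run the example code end to end, including the C++ and Python examples.

📑Tutorials

  • API overview: an introduction to the APIs MemFabric provides

  • C interface: an introduction to the C interface and its API list

  • Python interface: an introduction to the Python interface and its API list

  • ptracer: an introduction to MemFabric's built-in performance tracing tool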

📦Software and Hardware Compatibility

  • Supported hardware models

    • Atlas 200T A2 Box16
    • Atlas 800I A2/A3 series
    • Atlas 800T A2/A3 series
    • Atlas 900 A3 SuperPoD
  • Platform: aarch64/x86

  • Companion software: CANN 8.1.RC1 or later

  • cmake >= 3.19

  • GLIBC >= 2.28

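  • Ascend HDK driver (npu-driver) and firmware (npu-firmware) requirements (the minimum version differs by medium):

    | Feature | Minimum HDK version | Recommended HDK version |
    |---|---|---|
    | HBM pooling | 24.1.RC2 | 24.1.RC2 |
    | DRAM pooling | 25.5.0 | 25.5.1 |

  • LingQu Computing Network: version 1.5.0. A3 DRAM pooling also requires upgrading the 1520 L1 accordingly; the upgrade guide is as follows: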
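A quick pre-build sanity check for the cmake and GLIBC minimums can be scripted in Python. This is a hypothetical helper, not a MemFabric script; it relies only on the standard library (`platform.libc_ver`, `shutil.which`, `subprocess.run`) and does not check CANN or HDK versions, whose strings such as `24.1.RC2` are not plain dotted numbers.

```python
from __future__ import annotations

import platform
import re
import shutil
import subprocess

# Minimum versions listed in this section.
MIN_CMAKE = (3, 19)
MIN_GLIBC = (2, 28)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted numeric version, e.g. 'cmake version 3.22.1' -> (3, 22, 1)."""
    m = re.search(r"\d+(?:\.\d+)+", text)
    if m is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def meets(found: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Tuple comparison is component-wise, so (3, 19, 0) >= (3, 19) holds."""
    return found >= minimum


def check_toolchain() -> dict[str, bool | None]:
    """Return pass/fail per requirement; None means the tool could not be probed."""
    results: dict[str, bool | None] = {"glibc": None, "cmake": None}
    lib, ver = platform.libc_ver()
    if lib == "glibc" and ver:
        results["glibc"] = meets(parse_version(ver), MIN_GLIBC)
    if shutil.which("cmake"):
        out = subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout
        results["cmake"] = meets(parse_version(out), MIN_CMAKE)
    return results


if __name__ == "__main__":
    print(check_toolchain())
```

On a compliant host both entries should be `True`; a `None` entry means the library or tool could not be detected, which is worth investigating before running the build scripts.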

📌FAQ

For common issues, see: FAQ

📝Related Information


https://gitcode.com/Ascend/memfabric_hybrid


Languages

C++ 81.15%, Python 11.47%, C 3.12%, Shell 2.02%, CMake 1.42%