Expert Kit is an efficient foundation for Expert Parallelism (EP) inference of MoE models on heterogeneous hardware.


Expert Kit: A Distributed, Expert-Centric Framework for MoE LLM Inference

Caution

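Early development. This project is currently a proof-of-concept demo under active development. It is not intended for production use and may contain significant bugs, security vulnerabilities, and unexpected behavior. We appreciate feedback and contributions from the community as we continue to build and refine the project.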

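About

Expert Kit (EK) is a high-performance framework for scalable MoE (Mixture of Experts) LLM inference. EK's vision is to provide an efficient foundation for Expert Parallelism (EP) on heterogeneous hardware (e.g., CPU and GPU) over commodity interconnects (e.g., PCIe, TCP, RDMA), enabling easy deployment and fine-grained, expert-level scaling.

EK adopts an Expert-Attention (E/A) separation architecture, which lets MoE LLMs be deployed efficiently in a distributed environment made up of x CPUs and y GPUs. The motivation for E/A separation comes from our observation that, in modern MoE LLMs, expert parameters account for the vast majority of the model size (for example, more than 90% in DeepSeek-V3). By decoupling the expert modules and deploying them across distributed GPUs and CPUs, EK takes full advantage of the high bandwidth and large capacity of distributed memory and storage systems.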
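As a rough sanity check on the "more than 90%" figure, the share of parameters held by routed experts can be estimated from the model dimensions. The sketch below is a back-of-the-envelope calculation, not part of Expert Kit. The constants are assumptions taken from the publicly released DeepSeek-V3 configuration rather than from this README, and the estimate ignores shared experts, router weights, attention, and embeddings.

```python
# Back-of-the-envelope estimate of the routed-expert share of DeepSeek-V3.
# All constants are assumptions from the public DeepSeek-V3 config,
# not values stated in this README.
HIDDEN_SIZE = 7168            # model hidden dimension
MOE_INTERMEDIATE_SIZE = 2048  # per-expert FFN intermediate dimension
N_ROUTED_EXPERTS = 256        # routed experts per MoE layer
N_MOE_LAYERS = 58             # 61 layers, the first 3 of which are dense
TOTAL_PARAMS = 671e9          # headline total parameter count


def routed_expert_params() -> int:
    """Parameters held by all routed experts across all MoE layers."""
    # Each expert is a gated FFN with three weight matrices (gate, up, down).
    per_expert = 3 * HIDDEN_SIZE * MOE_INTERMEDIATE_SIZE
    return per_expert * N_ROUTED_EXPERTS * N_MOE_LAYERS


def expert_share() -> float:
    """Fraction of the total parameter count held by routed experts."""
    return routed_expert_params() / TOTAL_PARAMS


if __name__ == "__main__":
    print(f"routed expert parameters: {routed_expert_params() / 1e9:.1f}B")
    print(f"share of total: {expert_share():.1%}")
```

Under these assumptions the routed experts alone land comfortably above the 90% mark, which is why moving them off the attention device frees so much memory.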

https://github.com/user-attachments/assets/9f1f5b23-28fe-44cf-b592-2f6ad0ad4dad

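Quick Start

The following tutorials will help you get started with Expert Kit quickly.

DeepSeek-tiny: a custom MoE model that uses the DeepSeek-V3 architecture with a small number of parameters, designed for quick evaluation and testing of the Expert Kit framework.
DeepSeek-V3: a demo of running the DeepSeek-V3 model with Expert Kit, showing the framework's ability to handle large-scale MoE models.
Qwen3-30B-A3B: a demo of running the Qwen3-30B-A3B model with Expert Kit, showing the framework's ability to handle real-world MoE models.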

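Key Features

Low-cost deployment: supports distributed deployment and mixed GPU/CPU deployment.
Fine-grained expert-level scalability: attention and experts scale independently, and hot experts can be scaled dynamically on demand.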
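To make "dynamic scaling of hot experts" concrete, here is a minimal sketch of one possible policy: give every expert one replica, then hand out the remaining replica slots greedily to whichever expert currently has the highest load per replica. The function name `plan_replicas`, its inputs, and the policy itself are illustrative assumptions. They do not describe EK's actual scheduler.

```python
from collections import Counter


def plan_replicas(activations, total_slots):
    """Assign replica counts to experts in proportion to observed load.

    activations: iterable of integer expert ids, one entry per routed token
    total_slots: total number of expert replicas the cluster can host

    Hypothetical policy: every expert gets one replica, then each spare
    slot goes to the expert with the highest activations-per-replica.
    """
    counts = Counter(activations)
    experts = sorted(counts)
    if total_slots < len(experts):
        raise ValueError("need at least one slot per expert")

    replicas = {e: 1 for e in experts}
    for _ in range(total_slots - len(experts)):
        # Ties are broken in favour of the smaller expert id.
        hottest = max(experts, key=lambda e: (counts[e] / replicas[e], -e))
        replicas[hottest] += 1
    return replicas


if __name__ == "__main__":
    # Expert 0 receives most of the traffic, so it absorbs the spare slots.
    trace = [0] * 8 + [1] * 2 + [2] + [3]
    print(plan_replicas(trace, total_slots=6))
```

A greedy per-replica-load rule is only one option. A real system would also need to weigh the cost of loading weights, device memory limits, and how quickly the activation distribution shifts.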

Performance

Model	Throughput (tokens/s)	Environment
DeepSeek-V3 671B W8A16	14.26	1xNvidia 4090(24G) + 5xAMD EPYC 7302
Qwen3-MoE-30B FP16	36.38	1xNvidia A10(24G) + 1xAMD EPYC 7302 + 1xKunpeng 920

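Repository Structure

ek-computation: handles scheduling (frontend) and computation (backend) tasks.
ek-db: supports fine-grained registration and loading of expert weights.
ek-benchmark: contains several micro-benchmarks to help you understand performance.
ek-solution: contains several recipes for quickly setting up a running cluster.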
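The frontend/backend split can be pictured with a toy model: the frontend groups tokens by the experts the router selected, sends each group to whichever worker hosts that expert, and combines the weighted results. The sketch below is a simplified illustration in plain Python. `ExpertWorker`, `dispatch`, and the `placement` mapping are hypothetical names, not the ek-computation API, and the direct method call stands in for a network transport.

```python
from collections import defaultdict


class ExpertWorker:
    """Stand-in for a backend worker hosting some experts (hypothetical)."""

    def __init__(self, experts):
        # experts: dict of expert_id -> function mapping a vector to a vector
        self.experts = experts

    def forward(self, expert_id, batch):
        return [self.experts[expert_id](x) for x in batch]


def dispatch(tokens, routes, placement):
    """Frontend side: group tokens by expert, run them, combine the outputs.

    tokens:    list of hidden vectors (lists of floats)
    routes:    per token, a list of (expert_id, weight) pairs from the router
    placement: dict of expert_id -> ExpertWorker hosting that expert
    """
    groups = defaultdict(list)  # expert_id -> [(token_index, weight), ...]
    for i, choices in enumerate(routes):
        for expert_id, weight in choices:
            groups[expert_id].append((i, weight))

    outputs = [[0.0] * len(t) for t in tokens]
    for expert_id, members in groups.items():
        batch = [tokens[i] for i, _ in members]
        # In a distributed deployment this call would cross the network.
        results = placement[expert_id].forward(expert_id, batch)
        for (i, weight), y in zip(members, results):
            outputs[i] = [acc + weight * v for acc, v in zip(outputs[i], y)]
    return outputs


if __name__ == "__main__":
    cpu_worker = ExpertWorker({0: lambda x: [2 * v for v in x]})
    gpu_worker = ExpertWorker({1: lambda x: [v + 1 for v in x]})
    placement = {0: cpu_worker, 1: gpu_worker}
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    routes = [[(0, 0.5), (1, 0.5)], [(1, 1.0)]]
    print(dispatch(tokens, routes, placement))
```

Batching tokens per expert before each call keeps the number of round trips proportional to the number of active experts rather than the number of tokens, which matters once the call goes over gRPC or RDMA.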

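Roadmap

Core Features

Frontend request scheduling
- Simple executor
- Scalable executor
- Scheduling interface
Backend expert computation engine
- pytorch
- onnxruntime
- candle
Integration with existing frameworks for attention computation
- pytorch
- vLLM
Transport channel between frontend and backend
- gRPC
- RDMA
- DSM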

Contact Us

If you have any questions, join the discussion at https://expert-kit.zulipchat.com/ or open a new issue.

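License

Primary license: the project as a whole is licensed under the GNU GPL 3.0.
Third-party components:
- License and copyright notices for third-party components are located alongside the component code directories.
- The following components are included:
  - DeepSeek-V3 (code/supplementary materials): located in ek-integration/expertkit-torch/expertkit-torch/models/deepseek_v3/. This code is licensed under the DeepSeek License Agreement v1.0 and the MIT License. Note that use of the associated DeepSeek models is subject to the use restrictions detailed in Attachment A of the DeepSeek License Agreement v1.0.
  - Qwen3-MoE: located in ek-integration/expertkit-torch/expertkit-torch/models/. This code is licensed under the Apache License, Version 2.0.
Compliance: all third-party components are used in accordance with their original license terms.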
