
mhc_post AscendC Operator

Broadcasts a single stream to multiple streams with per-stream scaling (the post-connection step in mHC).

Formula

output[b * N + n, seq, d] = x[b, seq, d] × h_post[n]

Equivalent einsum: torch.einsum('bsd,n->bnsd', x, h_post).reshape(B*N,S,D)
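
The formula and the einsum above can be checked against each other on the host. The following is a minimal sketch that uses NumPy as a stand-in for torch so it runs without an NPU; the function names `mhc_post_reference` and `mhc_post_einsum_reference` are hypothetical and are not part of this repository. It assumes x has shape [B, S, D] and h_post has shape [N], as implied by the einsum subscripts.

```python
import numpy as np


def mhc_post_reference(x, h_post):
    """Loop form of the formula: output[b * N + n, s, d] = x[b, s, d] * h_post[n]."""
    B, S, D = x.shape
    N = h_post.shape[0]
    out = np.empty((B * N, S, D), dtype=x.dtype)
    for b in range(B):
        for n in range(N):
            # Row b * N + n holds batch b scaled by the weight of stream n.
            out[b * N + n] = x[b] * h_post[n]
    return out


def mhc_post_einsum_reference(x, h_post):
    """Einsum form: outer product over the stream axis, then flatten (B, N) into B * N."""
    B, S, D = x.shape
    N = h_post.shape[0]
    return np.einsum('bsd,n->bnsd', x, h_post).reshape(B * N, S, D)


rng = np.random.default_rng(0)
B, N, S, D = 2, 3, 4, 5
x = rng.standard_normal((B, S, D)).astype(np.float32)
h_post = rng.standard_normal(N).astype(np.float32)

loop_out = mhc_post_reference(x, h_post)
einsum_out = mhc_post_einsum_reference(x, h_post)

# Both forms should agree element-wise and produce B * N output rows.
assert loop_out.shape == (B * N, S, D)
assert np.allclose(loop_out, einsum_out)
```

Because the reshape flattens the (b, n) axes in row-major order, stream n of batch b lands at row b * N + n, which is the indexing used in the formula.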

Notes

h_post is a static weight vector of shape [num_streams], shared across all batches and token positions.
This matches the tokenbender/mHC open-source implementation.
Weight normalization is handled upstream, not in this operator.

Adaptive Strategy

The kernel automatically selects between two parallelization strategies based on shape:

Strategy A: Parallelize over (batch, stream) pairs. Best for small and medium data sizes.
Strategy B: Read the input once and write N outputs. Best for large data with a large batch size.
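
To see why the two strategies trade off differently, it can help to count idealized global-memory traffic. The sketch below is an illustration, not the kernel's actual selection logic: it assumes that under Strategy A each (batch, stream) work item reads its own copy of x[b], and that under Strategy B each input element is read exactly once. The function name `global_memory_traffic` is hypothetical, and the small h_post reads are ignored.

```python
def global_memory_traffic(batch, seq_len, dim, num_streams):
    """Return idealized element counts moved under each strategy (no caching modeled)."""
    in_elems = batch * seq_len * dim
    out_elems = in_elems * num_streams

    # Strategy A (assumed): one work item per (batch, stream) pair,
    # each reading x[b] again before writing its own output slice.
    strategy_a = {
        "work_items": batch * num_streams,
        "reads": out_elems,
        "writes": out_elems,
    }

    # Strategy B: every input element is read once and written num_streams times.
    strategy_b = {
        "reads": in_elems,
        "writes": out_elems,
    }
    return strategy_a, strategy_b


a, b = global_memory_traffic(batch=8, seq_len=1024, dim=4096, num_streams=4)

# Under these assumptions, A reads num_streams times more input than B.
read_ratio = a["reads"] / b["reads"]
print(a["work_items"], read_ratio)
```

Under these assumptions the write volume is identical, so Strategy B saves only on reads, while Strategy A exposes batch * num_streams independent work items, which may matter more when the data is small. The actual threshold the kernel uses is not stated here.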

See docs/performance.md for detailed benchmarks.

Build

source /usr/local/Ascend/ascend-toolkit/set_env.sh

# 1. Build AscendC kernel
mkdir -p build && cd build
cmake .. -DSOC_VERSION=ascend910b2
make -j
cd ..

# 2. Build PyTorch C++ extension
python setup.py build_ext --inplace

Test

source /usr/local/Ascend/ascend-toolkit/set_env.sh

# C++ test
cd build && LD_LIBRARY_PATH=./lib:$LD_LIBRARY_PATH ./test_multi_dtype

# Python test (run from the repository root, not from build/)
LD_LIBRARY_PATH=./build/lib:$LD_LIBRARY_PATH python mhc_post_ops.py

API

# Python (via C++ extension)
import mhc_post_ext
output = mhc_post_ext.forward(x, h_post)  # x on NPU, returns NPU tensor

# Or use wrapper
from mhc_post_ops import mhc_post, mhc_post_einsum
output = mhc_post(x, h_post)

// C++ kernel (auto-selects strategy)
extern "C" void mhc_post_do_fp32(uint32_t blockDim, void* stream,
    uint8_t* input, uint8_t* h_post, uint8_t* output,
    int64_t batch, int64_t seq_len, int64_t dim, int64_t num_streams);
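
When calling the C++ entry point directly, the caller has to allocate the three device buffers. The following is a small sketch of the sizes involved, assuming dense contiguous fp32 tensors (4 bytes per element, as the `_fp32` suffix suggests) and the shapes from the formula above; the helper name `mhc_post_fp32_buffer_bytes` is hypothetical.

```python
FP32_BYTES = 4  # assumption: the _fp32 entry point uses 4-byte floats


def mhc_post_fp32_buffer_bytes(batch, seq_len, dim, num_streams):
    """Byte sizes of the input, h_post, and output buffers for dense fp32 tensors."""
    input_bytes = batch * seq_len * dim * FP32_BYTES
    h_post_bytes = num_streams * FP32_BYTES
    # The output has num_streams scaled copies of every input element.
    output_bytes = input_bytes * num_streams
    return {"input": input_bytes, "h_post": h_post_bytes, "output": output_bytes}


sizes = mhc_post_fp32_buffer_bytes(batch=2, seq_len=128, dim=64, num_streams=4)
print(sizes)
```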

Performance

On Ascend 910B2, mhc_post achieves 2-4x speedup over torch.einsum for most shapes. See docs/performance.md for details.