mean Migration Notes

1. Operator Description

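Operator: torch.mean
Original implementation: pytorch/aten/src/ATen/native/cuda/ReduceMomentKernel.cu
Migration mode: torch_npu
Deliverable: a standalone, installable Ascend SIMT extension that, once imported, overrides the PrivateUse1 implementation of aten::mean.out
Migration principle: no fallback to ACLNN, at::sum(...).div_(), or the CPU; the port remains a one-to-one SIMT reduction implementation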

2. Implementation Summary

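Overrides aten::mean.out directly
Functional torch.mean, Tensor.mean, and mean.dtype_out continue through the upper-level composite / structured path and land on mean.out
Preserves the device-side semantics of returning NaN for empty tensors
Supports arbitrary dimension combinations, keepdim, non-contiguous inputs, and non-contiguous out
Half precision and bfloat16 follow the approach of the CUDA path: accumulate in float, then write the result back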
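
The semantics listed above can be illustrated with a small host-side reference. This is only a sketch in plain Python, not the SIMT kernel: the names round_to and mean_reference are hypothetical, fp16 stands in for the half-precision input type, and the struct module is used to emulate fp16 / fp32 rounding. It walks a strided view, reduces over any combination of dims, honours keepdim, returns NaN for an empty reduction, and accumulates in a wider type before writing back.

```python
import itertools
import math
import struct


def round_to(fmt, x):
    """Round x to the precision of a struct format: 'e' is fp16, 'f' is fp32."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def mean_reference(storage, shape, strides, dims,
                   keepdim=False, acc_fmt="f", out_fmt="e"):
    """Hypothetical host-side reference: mean of a strided view over `dims`.

    Returns (out_shape, flat_values), with values in row-major order.
    """
    ndim = len(shape)
    dims = sorted(d % ndim for d in dims)        # normalise negative dims
    kept = [d for d in range(ndim) if d not in dims]
    count = math.prod(shape[d] for d in dims)    # elements per output value
    if keepdim:
        out_shape = [1 if d in dims else shape[d] for d in range(ndim)]
    else:
        out_shape = [shape[d] for d in kept]

    values = []
    for outer in itertools.product(*(range(shape[d]) for d in kept)):
        acc = 0.0
        for inner in itertools.product(*(range(shape[d]) for d in dims)):
            index = dict(zip(kept, outer))
            index.update(zip(dims, inner))
            # Strides locate the element, so non-contiguous views also work.
            offset = sum(index[d] * strides[d] for d in range(ndim))
            # Accumulate in acc_fmt (fp32 by default), not in the input type.
            acc = round_to(acc_fmt, acc + storage[offset])
        # An empty reduction has count == 0 and yields NaN.
        values.append(round_to(out_fmt, acc / count) if count else math.nan)
    return out_shape, values
```

Passing acc_fmt="e" shows why the wider accumulator matters: when averaging 4096 ones, an fp16 accumulator stops growing at 2048, because 2049 is not representable in fp16, so the mean comes out below 1.0. With acc_fmt="f" the same input averages to exactly 1.0.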

3. Directory Structure

ported-ops/mean/
├── pyproject.toml
├── requirements.txt
├── setup.py
├── README.md
├── plan.md
├── simt_mean
│   ├── __init__.py
│   └── csrc
│       ├── mean_bindings.asc
│       ├── mean_simt.h
│       └── simt
│           └── mean_simt.asc
└── test
    └── test_mean.py

4. Build and Verification

source /usr/local/Ascend/cann/set_env.sh
conda activate cuda2Simt
cd ported-ops/mean
python -m pip install -e . --no-build-isolation
python -m unittest discover -s test -p 'test_mean.py'

5. Known Limitations

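In this environment, the current bisheng --enable-simt toolchain cannot generate stable device code for double / complex128 mean reductions. This delivery keeps the direct SIMT implementations for float32, float16, bfloat16, and complex64, with no fallback to host / ACLNN.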
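
One way a test suite might account for this limitation is to split the dtypes it exercises up front. The helper below is a hypothetical sketch and not part of the delivered package: split_dtypes and both constants are assumed names, dtypes are represented as plain strings, and float64 is used as the name for double. How the extension itself behaves on the unsupported dtypes is not stated here and is not modelled.

```python
# Dtypes with a direct SIMT implementation, per the limitation note above.
SUPPORTED_DTYPES = ("float32", "float16", "bfloat16", "complex64")
# Dtypes the toolchain cannot yet compile stably (double is float64).
UNSUPPORTED_DTYPES = ("float64", "complex128")


def split_dtypes(requested):
    """Partition requested dtype names into (run, skip) lists, keeping order."""
    run, skip = [], []
    for name in requested:
        if name in SUPPORTED_DTYPES:
            run.append(name)
        elif name in UNSUPPORTED_DTYPES:
            skip.append(name)
        else:
            raise ValueError(f"dtype not covered by this note: {name}")
    return run, skip
```

A test could then loop over the first list and report the second as skipped instead of silently dropping it.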