File history: latest commits and last-updated times
[bugfix] Fix import failures when running test cases
Co-authored-by: mazhixin00_00<mazhixin7@huawei.com>
# message auto-generated for no-merge-commit
merge: !199 merge init into dev
[bugfix] Fix import failures when running test cases
Created-by: mazhixin00_00
Commit-by: mazhixin00_00
Merged-by: ascend-robot
Description:

# Which issue(s) this PR fixes or accomplishes
In some environments, running the test cases fails with import errors.

![image.png](https://raw.gitcode.com/user-images/assets/8476587/b7f11b69-dc5f-43f3-839f-237a02aa5125/image.png 'image.png')

# Test Plan
Full test run.

# Test Report
![image.png](https://raw.gitcode.com/user-images/assets/8476587/a99938ad-ade8-42e7-bd08-4a5aa53c78b9/image.png 'image.png')

See merge request: Ascend/MindIE-SD!199
Last updated: 3 months ago
[Feature][compilation] Fix timing measurement and add MulAdd fusion pattern
Co-authored-by: blian6<bin.lian@outlook.com>
Co-authored-by: blian<lianbin@huawei.com>
# message auto-generated for no-merge-commit
merge: !270 merge dev into dev
[Feature][compilation] Add muls_add Triton kernel fusion pattern, benchmark utilities, and framework fixes
Created-by: blian
Commit-by: blian6;blian
Merged-by: ascend-robot
Description:

# Which issue(s) this PR fixes or accomplishes
Fix the benchmarking method in the compilation module and add a Triton-kernel-aware fusion scheme.

# Purpose
This PR contains two parts.

## Part A: muls_add Triton kernel fusion
Following the muls_add fusion approach in vllm-ascend, the algebraic optimization in `mul_add_pattern.py` is replaced with kernel-level fusion:
- **Added** `mindiesd/layers/triton_utils.py`: Triton Ascend helper functions, including the `_TRITON_ON_ASCEND` runtime check
- **Added** `mindiesd/layers/muls_add.py`: `muls_add(x, y, scale)`, an element-wise fused kernel (Triton preferred, with automatic fallback to torch), registered as `torch.ops.mindiesd.muls_add` via `torch.library.custom_op`
- **Refactored** `mindiesd/compilation/patterns/mul_add_pattern.py`: factory function `create(dtype, scale)` plus the `PatternBase` static interface; the pattern changes from `mul(a,c) + mul(b,c)` to `x * scale + y`, and the replacement changes from `mul(add(a,b),c)` to a `muls_add(x, y, scale)` kernel call
- **Modified** `mindiesd/layers/__init__.py`: exports `muls_add`
- **Added** `tests/layers/test_muls_add.py`: 13 kernel unit tests (dtype, shape, scale, boundary values, in-place safety, device preservation)
- **Modified** `tests/compilation/patterns/test_mul_add_pattern.py`: adapted to the two-input + scale pattern; only the correctness assertion is kept (cosine similarity ≥ 2^-7)

## Part B: Benchmark infrastructure and framework fixes
- **Added** `tests/compilation/test_bench_utils.py`: a shared benchmark function (warmup, device sync, multiple iterations, mean of the last 5), replacing the previous inaccurate single `time.perf_counter()` measurement
- **Fixed** 4 pattern test files to use the new benchmark, with the assertion changed to `compiled_time < original_time`:
  - `tests/compilation/patterns/test_adelayernorm_pattern.py`
  - `tests/compilation/patterns/test_gelu_pattern.py`
  - `tests/compilation/patterns/test_rmsnorm_pattern.py`
  - `tests/compilation/patterns/test_rope_pattern.py`
- **Fixed** misuse of `threading.Lock` in `passes/__init__.py` (`threading.Lock()` is now called to obtain a Lock instance)
- Added an `enable_mul_add` switch to `FusionPatterns`
- Added a trailing newline to `passes/__init__.py`
- **Removed** `tests/compilation/test_backend.py` (`SamplePass` is a mathematical identity transform and does not support running on CPU)

# Test Plan
### Kernel-level tests
`tests/layers/test_muls_add.py`: 13 tests covering combinations of 3 dtypes (float32/float16/bfloat16) × 5 shapes × 7 scales, boundary values (scale = 0/1/-1), in-place safety, device/dtype preservation, and consistency across repeated calls.

### Pattern integration tests
`tests/compilation/patterns/test_mul_add_pattern.py`: 4 tests (bfloat16, float16, float32, 32×8192) that validate the full torch.compile + MindieSDBackend pipeline, with cosine similarity ≥ 2^-7.

### Benchmark + registration tests
- `tests/compilation/test_pattern_registration.py`: test of the `enable_mul_add` switch
- `tests/compilation/test_bench_utils.py`: benchmark utility functions
- `tests/compilation/patterns/test_adelayernorm_pattern.py`
- `tests/compilation/patterns/test_gelu_pattern.py`
- `tests/compilation/patterns/test_rmsnorm_pattern.py`
- `tests/compilation/patterns/test_rope_pattern.py`

### Remote verification commands
```bash
python -m pytest tests/layers/test_muls_add.py -v
python -m pytest tests/compilation/patterns/ -v
python -m pytest tests/compilation/test_pattern_registration.py -v
```

# Test Report
| Item | Result |
| --- | --- |
| Environment | Ascend 910B (175.99.1.3), torch 2.8.0, torch_npu 2.8.0, triton_ascend 3.2.0 |
| Kernel unit tests | 13/13 passed, 15 subtests |
| Pattern integration tests | 4/4 passed |
| Benchmark + registration tests | 9/9 passed |
| Triton hardware | 40 vector cores, 20 AI cores |

See merge request: Ascend/MindIE-SD!270
Last updated: 1 month ago
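The muls_add semantics from Part A (the pattern `x * scale + y` and the "Triton first, torch fallback" dispatch) can be illustrated with a minimal pure-Python sketch. This is not the MindIE-SD implementation: plain lists stand in for tensors, `triton_available()` is a hypothetical simplification of the `_TRITON_ON_ASCEND` check, and `triton_impl` is a hypothetical hook standing in for the fused kernel.

```python
import importlib.util
from typing import Callable, List, Optional, Sequence

# Signature of a (hypothetical) fused kernel: (x, y, scale) -> x * scale + y
FusedImpl = Callable[[Sequence[float], Sequence[float], float], List[float]]


def triton_available() -> bool:
    # Hypothetical simplification of the _TRITON_ON_ASCEND runtime check:
    # it only asks whether the `triton` package is importable and does not
    # verify that an Ascend backend is present.
    return importlib.util.find_spec("triton") is not None


def muls_add_reference(x: Sequence[float], y: Sequence[float], scale: float) -> List[float]:
    """Element-wise x * scale + y: the computation the MulAdd pattern matches."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    return [xi * scale + yi for xi, yi in zip(x, y)]


def muls_add(
    x: Sequence[float],
    y: Sequence[float],
    scale: float,
    triton_impl: Optional[FusedImpl] = None,
) -> List[float]:
    """Use the fused kernel when one is supplied and Triton is available;
    otherwise fall back to the unfused reference computation."""
    if triton_impl is not None and triton_available():
        return triton_impl(x, y, scale)
    return muls_add_reference(x, y, scale)
```

For example, `muls_add([1.0, 2.0], [0.5, 0.5], 2.0)` evaluates to `[2.5, 4.5]`. In the PR itself the op is registered through `torch.library.custom_op` as `torch.ops.mindiesd.muls_add`, presumably so the pattern replacement can emit it as a single graph node in place of the separate multiply and add.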
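The benchmark procedure described in Part B (warmup, device sync, several timed iterations, mean of the last 5) can be sketched as follows. This is an illustrative stand-in for the helper in `tests/compilation/test_bench_utils.py`, whose actual signature the PR does not show; the function name `benchmark`, the defaults `warmup=3` and `iters=10`, and the `sync` hook are assumptions.

```python
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    warmup: int = 3,   # assumed default
    iters: int = 10,   # assumed default
    tail: int = 5,     # "mean of the last 5" from the PR description
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock time in seconds of the last `tail` runs of fn.

    `sync` should block until all queued device work has finished; it
    defaults to a no-op, which is only appropriate for CPU-only code.
    """
    if iters < tail:
        raise ValueError("iters must be >= tail")
    sync = sync or (lambda: None)

    # Warmup absorbs one-time costs such as graph compilation and caching.
    for _ in range(warmup):
        fn()
    sync()

    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # asynchronous launches only count once the device is drained
        times.append(time.perf_counter() - start)

    # Average only the last `tail` iterations to skip residual warmup noise.
    last = times[-tail:]
    return sum(last) / len(last)
```

On Ascend one would presumably pass a device sync such as `torch.npu.synchronize` (provided by torch_npu) as `sync`, time the eager and compiled callables separately, and then assert `compiled_time < original_time` as the updated pattern tests do. The sync matters because device kernels launch asynchronously, so a single unsynchronized `time.perf_counter()` pair can end up measuring little more than launch overhead.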
test: align ut expectations with current quant behavior
Co-authored-by: guowenna1<guowenna1@huawei.com>
# message auto-generated for no-merge-commit
merge: !373 merge dev into dev
test: align ut expectations with current quant behavior
Created-by: weixin_44144262
Commit-by: guowenna1
Merged-by: ascend-robot
Description:

# Purpose
Fix some test cases that fail when run on A2.

# Test Plan
Run the full UT suite.

# Test Report
![image.png](https://raw.gitcode.com/user-images/assets/8476587/a42d2545-533f-4a6b-a86d-2bfcfd8f14ec/image.png 'image.png')

See merge request: Ascend/MindIE-SD!373
Last updated: 4 days ago