File | Last commit | Last updated
[Ops][Feature] Add DeepSeek V4 custom operators (#9228)

## Summary
- port DeepSeek V4 custom ops from GDzhu01/vllm-ascend-deepseekv4
- add attention ops: compressor, inplace_partial_rotary_mul, quant_lightning_indexer, quant_lightning_indexer_metadata, sparse_attn_sharedkv, sparse_attn_sharedkv_metadata
- add GMM ops: grouped_matmul_swiglu_quant, grouped_matmul_swiglu_quant_weight_nz_tensor_list, grouped_matmul_swiglu_quant_v2
- add MoE ops: hc_post, hc_pre_inv_rms, hc_pre_sinkhorn, moe_gating_top_k_hash, scatter_nd_update_v2
- register torch bindings and meta kernels for the new public APIs

## Source branches
- GDzhu01/vllm-ascend-deepseekv4:vllm_ds_uncontigous_018_lf for the first 13 ops
- GDzhu01/vllm-ascend-deepseekv4:v4_v0.18.0_0412 for grouped_matmul_swiglu_quant_v2

## Validation
- git diff --check HEAD~1..HEAD
- checked that source-branch test artifacts/logs were not included
- no local NPU execution run; this is expected for a local macOS workspace, and CI should cover build validation
- vLLM version: v0.20.2
- vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3

## Co-authors
Co-authored-by: 1132509010 <1132509010@qq.com>
Co-authored-by: ader47 <1661888967@qq.com>
Co-authored-by: anakin-wx <1084704046@qq.com>
Co-authored-by: anon189Ty <Stari_Falcon@outlook.com>
Co-authored-by: ChangminTao <taocm123@qq.com>
Co-authored-by: chenchris2 <1349418798@qq.com>
Co-authored-by: ChenxiQ <chenxi.qian.cq@outlook.com>
Co-authored-by: coder-fny <985619145@qq.com>
Co-authored-by: fuzhihong699 <fuzhihong4@huawei.com>
Co-authored-by: GDzhu01 <809721801@qq.com>
Co-authored-by: goldVitaminC <297780618@qq.com>
Co-authored-by: HiC4Sh1e <chenjie137@huawei.com>
Co-authored-by: hwhaokun <haokun0405@163.com>
Co-authored-by: kirliavc <jlc@pku.edu.cn>
Co-authored-by: lcfenglinwan <lcfenglin@qq.com>
Co-authored-by: Liexss <924834690@qq.com>
Co-authored-by: linfeng-yuan <1102311262@qq.com>
Co-authored-by: liuyan190974 <shandaliuyan@163.com>
Co-authored-by: LookAround0301 <lixushi@huawei.com>
Co-authored-by: maoxx241 <maomaoyu870@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: monologue815 <monologue815@qq.com>
Co-authored-by: MosCloud <bwzhang1991@163.com>
Co-authored-by: nomewang <nomeyue@outlook.com>
Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com>
Co-authored-by: pinfa <1819563383@qq.com>
Co-authored-by: pjgao <1783198484@qq.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: realliujiaxu <realliujiaxu@163.com>
Co-authored-by: showMeYourCode1997 <934005226@qq.com>
Co-authored-by: SidaoY <1024863041@qq.com>
Co-authored-by: slippersss <slippersss@126.com>
Co-authored-by: Toneymiller <1476209578@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weinachuan <1173732899@qq.com>
Co-authored-by: WithHades <244036962@qq.com>
Co-authored-by: wjunLu <wjunlu217@gmail.com>
Co-authored-by: WOE-Y <876362620@qq.com>
Co-authored-by: wxh571001500 <571001500@qq.com>
Co-authored-by: wxsIcey <1790571317@qq.com>
Co-authored-by: xmpp777 <yangming2@huawei.com>
Co-authored-by: yiz-liu <liu_yizhou@outlook.com>
Co-authored-by: yzylxyypl <yzylxyypl@gmail.com>
Co-authored-by: zcc-zjut <zcczxy2019@163.com>
Co-authored-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Co-authored-by: zhaozx-cn <zhaozx2116@163.com>
Co-authored-by: zhenwenqi_2024 <zhenwenqi_2022@qq.com>
Co-authored-by: ZT-AIA <1028681969@qq.com>
Signed-off-by: maoxx241 <maomaoyu870@gmail.com>

Last updated: 11 days ago

[Feature][Ops] Add A5 custom operator build support (#9271)

### What this PR does / why we need it?
This PR adds the A5 / ascend950 custom operator build and binding support needed by the DeepSeek V4 A5 path.

Changes included:
- Updates the ascend950 CUSTOM_OPS_ARRAY in csrc/build_aclnn.sh to match the A5 ACLNN branch operator list.
- Adds A5-specific custom operator directories for indexer_compress_epilog, indexer_compress_epilog_v2, kv_compress_epilog, kv_quant_sparse_attn_sharedkv, kv_quant_sparse_attn_sharedkv_metadata, load_index_kv_cache, hc_pre, and swiglu_group_quant.
- Wires torch and meta registrations for the new A5 operators, including npu_hc_pre_v2 and grouped_matmul_swiglu_quant_weight_nz.
- Skips the direct vllm_ascend_kernels target for ascend950 and uses VLLM_ENABLE_ATB_AND_DIRECT_KERNELS to guard direct-kernel includes, schemas, implementations, and meta registrations. This keeps ascend950 and 310P import-safe when those direct kernels are not built.
- Maps ascend950 to the newer Ascend950PR_9599 CANN platform name used by the new build framework.
- Updates quant_lightning_indexer for A5 PA cache views: it keeps the existing stride / scale_stride schema, avoids forcing 950 key/key-scale tensors contiguous, and uses the passed strides in the arch35 PA offset path.

A3 conflict-sensitive note:
- The A5 source branch also contains changes under shared A3 operator paths such as compressor, sparse_attn_sharedkv, hc_pre_sinkhorn, hc_pre_inv_rms, hc_post, inplace_partial_rotary_mul, moe_gating_top_k_hash, and fused-MoE Python code.
- This PR intentionally does not overwrite those shared implementations. For shared operator names that remain in the ascend950 build list, it uses the current main-branch implementation and only adds the A5-specific operator directories missing from main.
- quant_lightning_indexer is the intentional shared-path exception because A5 needs dim0-stride-aware PA cache access.
- quant_lightning_indexer_metadata stays on the mainline Ascend950 SoC-version behavior and does not add legacy Ascend910_95 compatibility.
- compressor is intentionally not changed in this PR and should be handled by the operator owners separately.
- vLLM version: v0.20.2
- vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3
---------
Signed-off-by: maoxx241 <maomaoyu870@gmail.com>

Last updated: 10 days ago

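The "dim0-stride-aware PA cache access" point in #9271 can be illustrated with a small sketch. This is not the kernel code: `pa_offset`, `read_block`, and the toy padded layout are hypothetical, chosen only to show why an offset computed from the passed stride differs from one that assumes contiguous blocks.

```python
def pa_offset(block_idx: int, inner_offset: int, stride0: int) -> int:
    """Flat element offset inside a cache view, using the caller-supplied
    dim-0 stride (hypothetical helper, not the real arch35 code)."""
    return block_idx * stride0 + inner_offset


# Toy storage: 3 blocks of 4 key elements, each followed by 2 padding
# elements, so the dim-0 stride (6) is larger than the block size (4).
BLOCK_SIZE = 4
STRIDE0 = 6
storage = [
    10, 11, 12, 13, -1, -1,
    20, 21, 22, 23, -1, -1,
    30, 31, 32, 33, -1, -1,
]


def read_block(block_idx: int, stride0: int) -> list:
    """Read one block starting at the offset implied by `stride0`."""
    start = pa_offset(block_idx, 0, stride0)
    return storage[start:start + BLOCK_SIZE]


# Using the passed stride lands on the right block.
strided = read_block(2, STRIDE0)
# Assuming the view is contiguous (stride == block size) reads stale data.
assumed_contiguous = read_block(2, BLOCK_SIZE)
print(strided, assumed_contiguous)
```

In this toy layout, honoring the stride avoids having to copy the view into a contiguous buffer first, which is the kind of copy the PR says it avoids forcing on 950 key/key-scale tensors.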
Refactor the ops PyTorch adapter, cleanup for csrc/torch_binding.cpp (#6732)

### What this PR does / why we need it?
Refactor the ops PyTorch adapter and clean up csrc/torch_binding.cpp. For more details, see https://github.com/vllm-project/vllm-ascend/issues/6486

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested by installing the new package with the modification.
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/9562912cead1f11e8540fb91306c5cbda66f0007
---------
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: luomin2005 <luomin2005@huawei.com>
Co-authored-by: liziyu <56102866+liziyu179@users.noreply.github.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>

Last updated: 3 months ago

[Ops][BugFix] Reuse common tiling_base for custom ops (#9103)

## Summary
- remove duplicated host-side tiling_base and error_log headers from imported custom ops
- include the common tiling_base headers directly from affected host tiling files
- keep CeilDiv/CeilAlign in the common host tiling utility and expose them through the common error_log compatibility header
- keep kernel-side local error_log headers untouched
- include csrc changes in the 310P light-test tracker and map ascend310p builds to arch22
- update CANN 9.0 Ascend950 SOC naming from ascend910_95/ASCEND910_95 to ascend950/ASCEND950

## Validation
- git diff --check
- git diff --cached --check
- bash -n csrc/build.sh
- bash -n csrc/build_aclnn.sh
- python -m py_compile csrc/cmake/scripts/util/const_var.py csrc/cmake/scripts/util/opdesc_parser.py csrc/scripts/util/const_var.py
- rg -n "ASCEND910_95|ascend910_95|Ascend910_9599" .

Fixes the custom-op build issues seen in PR #9066 CI logs and avoids per-operator tiling_base copies.
- vLLM version: v0.20.1
- vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2

Signed-off-by: maoxx241 <maomaoyu870@gmail.com>

Last updated: 16 days ago

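For readers unfamiliar with the CeilDiv/CeilAlign helpers mentioned in #9103, the sketch below shows the conventional meaning of those names in Python. It assumes the repository's C++ helpers follow the usual definitions (round-up division, and rounding up to a multiple); the commit message does not show their actual signatures, so `ceil_div` and `ceil_align` here are illustrative stand-ins.

```python
def ceil_div(a: int, b: int) -> int:
    """Smallest integer q such that q * b >= a, for a >= 0 and b > 0."""
    if a < 0 or b <= 0:
        raise ValueError("expected a >= 0 and b > 0")
    return (a + b - 1) // b


def ceil_align(a: int, b: int) -> int:
    """Round a up to the nearest multiple of b."""
    return ceil_div(a, b) * b


# Typical tiling use: split 1000 elements into tiles of 256 and size a
# buffer rounded up to a 32-element boundary.
num_tiles = ceil_div(1000, 256)
padded_len = ceil_align(1000, 32)
print(num_tiles, padded_len)
```

Keeping one shared copy of such helpers, as the PR describes, means every host tiling file rounds the same way.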
[Build] Add support for Ascend950 chip (#7151)

### What this PR does / why we need it?
This PR adds support for the Ascend950 chip. This includes:
- Updating build scripts (CMakeLists.txt and setup.py) to recognize the Ascend950 chip and set appropriate compilation flags.
- Disabling a set of custom operators that are not yet supported on the Ascend950 hardware target.
- Performing a codebase-wide refactoring of pipe_barrier() calls to the namespaced AscendC::PipeBarrier<>() for improved code consistency and adherence to the latest API standards.

Ascend950DT e2e passed (Qwen3-32B-MXFP8) and CI passed.
- vLLM version: v0.16.0
- vLLM main: https://github.com/vllm-project/vllm/commit/4034c3d32e30d01639459edd3ab486f56993876d
---------
Signed-off-by: linfeng-yuan <1102311262@qq.com>

Last updated: 2 months ago

[BugFix]moe w4a8 ub fix and swiglu limit fix (#9259)

### What this PR does / why we need it?
This PR fixes a UB overflow and an incorrect limit constraint in the SwiGLU kernel on the Ascend platform, which caused unstable computation and wrong inference results in MoE/Decode scenarios.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Verified on Ascend NPU with MoE and Decode workloads: inference results are correct, there is no performance regression, and existing CI passed.
- vLLM version: v0.20.2
- vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3
---------
Signed-off-by: justice-dance <justice1717@163.com>

Last updated: 8 days ago

[feat] parameterize hardcoded MLA dimensions to support GLM5-W8A8 (#6902)

Derive MLA dimension constants (q_lora_rank, qk_nope_head_dim, etc.) from tensor shapes at runtime instead of hardcoding DeepSeek V3 values. This enables the mla_preprocess fused op to work with both DeepSeek V3 and GLM5 models without Python API changes.
- Add 9 dimension fields to MlaTilingData with DeepSeek V3 defaults
- Add OpParam fields and dynamize all host-side tiling functions
- Derive dimensions from wuk, gamma1, kv_cache_rope tensor shapes
- Replace 310+ hardcoded constants across 4 kernel .hpp files
- Remove unused MMSIZE1/MMSIZE2 constants

### What this PR does / why we need it?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main: https://github.com/vllm-project/vllm/commit/15d76f74e2fdb12a95ea00f0ca283acf6219a2b7
---------
Signed-off-by: liuchenbing <chenliumail@163.com>
Co-authored-by: liuchenbing <chenliumail@163.com>

Last updated: 2 months ago

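The idea in #6902 of reading MLA dimensions from tensor shapes, with DeepSeek V3 values as defaults, might look roughly like the sketch below. The tensor layouts are assumptions made for illustration only: the commit message names wuk, gamma1, and kv_cache_rope as sources but does not state their shapes, and only four of the nine fields it mentions are shown. `MlaDims` and `derive_mla_dims` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MlaDims:
    # Defaults are the DeepSeek V3 model-config values.
    q_lora_rank: int = 1536
    kv_lora_rank: int = 512
    qk_nope_head_dim: int = 128
    qk_rope_head_dim: int = 64


def derive_mla_dims(wuk_shape, gamma1_shape, kv_cache_rope_shape) -> MlaDims:
    """Derive dimensions from shapes instead of hardcoding them.

    ASSUMED layouts (not stated in the commit message):
      wuk:           (num_heads, qk_nope_head_dim, kv_lora_rank)
      gamma1:        (q_lora_rank,)
      kv_cache_rope: (..., qk_rope_head_dim)
    """
    _, qk_nope_head_dim, kv_lora_rank = wuk_shape
    (q_lora_rank,) = gamma1_shape
    qk_rope_head_dim = kv_cache_rope_shape[-1]
    return MlaDims(
        q_lora_rank=q_lora_rank,
        kv_lora_rank=kv_lora_rank,
        qk_nope_head_dim=qk_nope_head_dim,
        qk_rope_head_dim=qk_rope_head_dim,
    )


# Shapes matching the defaults reproduce them exactly.
v3_like = derive_mla_dims((128, 128, 512), (1536,), (64, 128, 1, 64))
# A hypothetical model with different ranks needs no code change.
other = derive_mla_dims((64, 192, 512), (2048,), (64, 128, 1, 64))
print(v3_like, other)
```

The design point is that the Python-facing call stays the same; only the shapes of the tensors passed in change between models.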
[Performance] add op chunk_fwd_o and chunk_gated_delta_rule_fwd_h (#9018)

### What this PR does / why we need it?
Adds custom ops to improve performance: chunk_fwd_o and chunk_gated_delta_rule_fwd_h.
- vLLM version: v0.19.1
- vLLM main: https://github.com/vllm-project/vllm/commit/4d51588e2381018348f1022dfa3a7698899805b7

Signed-off-by: AlanisZomeg <1308342839@qq.com>

Last updated: 17 days ago

[Feature] Update custom op build framework (#8146)

## Summary
- update the custom-op build and packaging framework
- align current custom-op integration under csrc while keeping the current main-branch operator implementations
- improve custom-op runtime environment bootstrap so single-op tests and offline service no longer depend on manually sourcing vendor env scripts
- keep the non-custom-op build path unchanged

## Notes
- this PR focuses on the custom-op build framework update and related runtime loading path changes
- operator implementations continue to follow the current main-branch codebase
- final validation for this PR state relies on the CI results on GitHub

## Validation
- remote A2 targeted verification completed for custom-op build/install and runtime bootstrap
- full CI is used as the final validation gate for this PR state
- vLLM version:
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.19.0

Signed-off-by: maoxx241 <maomaoyu870@gmail.com>

Last updated: 18 days ago

[Ops][BugFix] Register DeepSeek V4 custom ops for 910B (#9339)

### What this PR does / why we need it?
PR #9228 added the DeepSeek V4 custom operators and registered them for 910C/950 builds, but the 910B CUSTOM_OPS_ARRAY was not updated. This PR registers the same newly added operators in the 910B ACLNN build list so 910B builds include the DeepSeek V4 custom ops.
- vLLM version: v0.20.2
- vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3

Signed-off-by: maoxx241 <maomaoyu870@gmail.com>

Last updated: 9 days ago

Quality enhancement: Immediately interrupt execution when memory OOM (#3932)

### What this PR does / why we need it?
Preserve the site of the first failure. Execution should be interrupted as soon as a device memory allocation fails, rather than continuing until an illegal address is accessed.

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
NA
- vLLM version: v0.11.0
- vLLM main: https://github.com/vllm-project/vllm/commit/83f478bb19489b41e9d208b47b4bb5a95ac171ac

Signed-off-by: leo-pony <nengjunma@outlook.com>

Last updated: 6 months ago

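The fail-fast behavior described in #3932 can be sketched generically as follows. This is not the repository's allocator code: `DeviceOutOfMemory`, `ToyAllocator`, and `checked_alloc` are hypothetical names, and the allocator is a stand-in that returns `None` on failure.

```python
class DeviceOutOfMemory(RuntimeError):
    """Raised at the allocation site, before any use of the buffer."""


class ToyAllocator:
    """Stand-in allocator: returns None when the request does not fit."""

    def __init__(self, capacity: int) -> None:
        self.free = capacity

    def alloc(self, nbytes: int):
        if nbytes > self.free:
            return None
        self.free -= nbytes
        return bytearray(nbytes)


def checked_alloc(allocator: ToyAllocator, nbytes: int) -> bytearray:
    """Allocate and stop immediately on failure, so the error points at
    the failed request rather than at a later invalid access."""
    buf = allocator.alloc(nbytes)
    if buf is None:
        raise DeviceOutOfMemory(
            f"allocation of {nbytes} bytes failed ({allocator.free} free)"
        )
    return buf


pool = ToyAllocator(capacity=1024)
first = checked_alloc(pool, 512)
try:
    checked_alloc(pool, 4096)
    failed_fast = False
except DeviceOutOfMemory:
    failed_fast = True
print(len(first), failed_fast)
```

Raising at the point of failure keeps the diagnostic tied to the request that could not be satisfied.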
[Misc] remove weak_ref_tensor C code (#8726)

We now use torch_npu._C._weak_ref_tensor instead of a custom op, so the weak_ref_tensor custom op is no longer needed.
- vLLM version: v0.19.0
- vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Last updated: 1 month ago

[Feature][Model] Switch DeepSeekV4 hc_pre to fused op (#9396)

### What this PR does / why we need it?
This PR switches DeepSeekV4 hc_pre from the composite small-op path to the aclnnHcPre-backed npu_hc_pre_v2 interface.

It also aligns the runtime hc_pre torch binding contract with the CANN recipe binding while leaving the Meta implementation as shape inference only, so torch compile does not trip over runtime-only checks:
- x must be 3D or 4D BF16
- hc_mult / hc must be 4
- d must be 4096 or 7168
- hc_fn must be [24, hc * d]
- hc_scale must be [3]
- hc_base must be [24]
- non-x tensors must be FP32

For Ascend950, npu_hc_pre_v2 follows the CANN recipe's batch filter and falls back to the composite path when bs > 512 and bs is not aligned to 8192. Other SoCs continue to use the fused path.

Reference: https://gitcode.com/cann/cann-recipes-infer/blob/master/ops/ascendc/torch_ops_extension/custom_ops/csrc/npu_hc_pre.cpp
- vLLM version: v0.20.2
- vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3

Signed-off-by: maoxx241 <maomaoyu870@gmail.com>

Last updated: 8 days ago

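The runtime contract and batch filter listed in #9396 can be written out as plain predicates. The sketch below is an illustration, not the binding's code: the function names are hypothetical, shapes and dtypes are passed as plain tuples and strings rather than tensors, and "not aligned to 8192" is read here as "not a multiple of 8192", which is an assumption.

```python
ALLOWED_D = (4096, 7168)


def hc_pre_contract_violations(x_ndim, x_dtype, hc, d, hc_fn_shape,
                               hc_scale_shape, hc_base_shape, other_dtypes):
    """Return a list of violated rules (empty when the inputs conform)."""
    problems = []
    if x_ndim not in (3, 4):
        problems.append("x must be 3D or 4D")
    if x_dtype != "bf16":
        problems.append("x must be BF16")
    if hc != 4:
        problems.append("hc must be 4")
    if d not in ALLOWED_D:
        problems.append("d must be 4096 or 7168")
    if tuple(hc_fn_shape) != (24, hc * d):
        problems.append("hc_fn must be [24, hc * d]")
    if tuple(hc_scale_shape) != (3,):
        problems.append("hc_scale must be [3]")
    if tuple(hc_base_shape) != (24,):
        problems.append("hc_base must be [24]")
    if any(dt != "fp32" for dt in other_dtypes):
        problems.append("non-x tensors must be FP32")
    return problems


def falls_back_to_composite(is_ascend950: bool, bs: int) -> bool:
    """Batch filter: only Ascend950 ever leaves the fused path.
    ASSUMPTION: 'aligned to 8192' means bs % 8192 == 0."""
    return is_ascend950 and bs > 512 and bs % 8192 != 0


ok = hc_pre_contract_violations(
    3, "bf16", 4, 4096, (24, 4 * 4096), (3,), (24,), ["fp32"] * 3)
bad = hc_pre_contract_violations(
    3, "bf16", 4, 1024, (24, 4 * 1024), (3,), (24,), ["fp32"] * 3)
print(ok, bad, falls_back_to_composite(True, 1024))
```

Checks of this kind belong in the runtime binding only; the PR keeps the Meta implementation to shape inference so that torch compile is not affected by them.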
[Feature]Replace Triton-based conv1d update operator with AscendC implementation (#8842)

### What this PR does / why we need it?
Replace the Triton-based conv1d update operator with an AscendC implementation. This fixes Triton operator recompilation in certain scenarios.

### Does this PR introduce _any_ user-facing change?
No. It only replaces the Triton-based conv1d update operator with an AscendC implementation.

### How was this patch tested?
No scenario-specific checks are required; verify once the vLLM service starts.
- vLLM version: v0.19.1
- vLLM main: https://github.com/vllm-project/vllm/commit/d886c26d4d4fef7d079696beb4ece1cfb4b008a8
---------
Signed-off-by: ZhuQi-seu <zhuqi12@huawei.com>

Last updated: 9 days ago

[Kernel] Add moe normal ops (#4810)

### What this PR does / why we need it?
1. Add the implementation of normal Aclnn operators: MoeCombineNormal, MoeDispatchNormal, NotifyDispatch, and DispatchLayout.
- MoeCombineNormal: Implements the combine logic within MoE operations.
- MoeDispatchNormal: Implements the dispatch logic within MoE operations.
- NotifyDispatch: Exchanges topk_idx information among different ranks to calculate the device memory required for the dispatch stage.
- DispatchLayout: Used to calculate information related to the device memory layout for the dispatch stage.

2. Provide PyTorch interfaces for normal operators (get_dispatch_layout, dispatch_prefill, and combine_prefill) to be used for MoE communication during the prefill stage in vLLM.
- get_dispatch_layout: Calculates information related to the device memory layout for the dispatch operator, and is called before dispatch_prefill.
- dispatch_prefill: Initiates the dispatch operation.
- combine_prefill: Initiates the combine operation.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
The functionality has already been validated using the local Qwen model. Test cases will be added after support for multi-NPU use cases in the CI pipeline is finalized.
- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/ad32e3e19ccf0526cb6744a5fed09a138a5fb2f9

Signed-off-by: shiro-zzzz <zhangdianhao@huawei.com>

Last updated: 5 months ago

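To make the role of the layout step in #4810 concrete, the sketch below computes the kind of information a dispatch layout needs from topk_idx: how many tokens go to each expert and to each rank. It is a conceptual illustration in plain Python, not the signature or behavior of get_dispatch_layout. The function name, the contiguous expert-to-rank partition, and counting a token once per destination rank are all assumptions.

```python
def dispatch_layout(topk_idx, num_experts: int, num_ranks: int):
    """Count tokens per rank and per expert from per-token expert choices.

    ASSUMPTIONS: experts are split contiguously and evenly across ranks,
    and a token sent to several experts on one rank is counted once for
    that rank.
    """
    experts_per_rank = num_experts // num_ranks
    tokens_per_expert = [0] * num_experts
    tokens_per_rank = [0] * num_ranks
    for experts in topk_idx:
        ranks = set()
        for e in experts:
            tokens_per_expert[e] += 1
            ranks.add(e // experts_per_rank)
        for r in ranks:
            tokens_per_rank[r] += 1
    return tokens_per_rank, tokens_per_expert


# Three tokens, top-2 routing, 4 experts spread over 2 ranks.
topk_idx = [[0, 1], [2, 3], [1, 2]]
per_rank, per_expert = dispatch_layout(topk_idx, num_experts=4, num_ranks=2)
print(per_rank, per_expert)
```

Counts like these let each rank size its receive buffers before any token data moves, which is why the layout call precedes dispatch_prefill.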
[Misc][Upgrade] Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1 (#9085)

Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1
- vLLM version: v0.20.1
- vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Last updated: 16 days ago