File | Last commit | Last updated
[BugFix]moe w4a8 ub fix and swiglu limit fix (#9259)

### What this PR does / why we need it?
This PR fixes a UB overflow issue and an incorrect limit constraint in the SwiGLU kernel on the Ascend platform, which caused unstable computation and wrong inference results in MoE/Decode scenarios.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Verified on Ascend NPU with MoE and Decode workloads: inference correctness is ensured, there is no performance regression, and existing CI passed.

- vLLM version: v0.20.2
- vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3

---------

Signed-off-by: justice-dance <justice1717@163.com>

Last updated: 9 days ago
[Feature] Update custom op build framework (#8146)

## Summary
- update the custom-op build and packaging framework
- align current custom-op integration under csrc while keeping the current main-branch operator implementations
- improve custom-op runtime environment bootstrap so single-op tests and offline service no longer depend on manually sourcing vendor env scripts
- keep the non-custom-op build path unchanged

## Notes
- this PR focuses on the custom-op build framework update and related runtime loading path changes
- operator implementations continue to follow the current main-branch codebase
- final validation for this PR state relies on the CI results on GitHub

## Validation
- remote A2 targeted verification completed for custom-op build/install and runtime bootstrap
- full CI is used as the final validation gate for this PR state

- vLLM version:
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.19.0

Signed-off-by: maoxx241 <maomaoyu870@gmail.com>

Last updated: 18 days ago