File / Last commit / Last updated
[CI] Remove hardcoded CANN version from wheel build config (#8542) ### What this PR does / why we need it? Each release branch targets a specific vLLM version, which corresponds to a different CANN version. Hardcoding cann_version: 8.5.1 in the wheel build config means all branches share the same fixed CANN version regardless of their actual requirements. By removing this field, the CANN version is determined by the CI environment at build time, allowing each branch to naturally use the correct CANN version. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.19.0 - vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176 Signed-off-by: hfadzxy <starmoon_zhang@163.com> 1 month ago
[CI][Feature] Add an automated main2main bisect tool (#8084) ### What this PR does / why we need it? This PR introduces an automated git bisect tool to quickly identify breaking commits in the upstream vLLM repository that cause regressions in vllm-ascend. - tools/bisect_vllm.sh: Manages the git bisect lifecycle and test execution. - tools/bisect_helper.py: Automates environment detection and generates Markdown reports. - .github/workflows/bisect_vllm.yaml: Integrates the bisect process into CI. - test CI: https://github.com/nv-action/vllm-benchmarks/actions/runs/24133478538 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: - vLLM main: https://github.com/vllm-project/vllm/commit/v0.19.0 --------- Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com> 1 month ago
[Bugfix][CI] Optimize the cleanup mechanism of RemoteOpenAIServer (#9356) ### What this PR does / why we need it? - Extract the existing RemoteEPDServer process-tree cleanup logic into a shared _terminate_process_tree() helper. - Reuse the helper in both RemoteOpenAIServer and RemoteEPDServer. - Return standard exit code 1 for failed suites instead of -1, avoiding shell-side 255 exit codes. - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 Signed-off-by: MrZ20 <2609716663@qq.com> 8 days ago
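The cleanup and exit-code points in the commit above can be illustrated with a minimal sketch. It is not the repository's `_terminate_process_tree()` implementation: `terminate_process_group` and `suite_exit_code` are hypothetical names, and the sketch assumes a POSIX system where the server was started with `start_new_session=True`, so that its children share one process group.

```python
import os
import signal
import subprocess


def terminate_process_group(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Stop a server process and its children (hypothetical helper, POSIX only).

    Assumes `proc` was launched with start_new_session=True, so the whole
    tree lives in a process group led by `proc`.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to clean up
    try:
        pgid = os.getpgid(proc.pid)
        os.killpg(pgid, signal.SIGTERM)  # polite request first
    except ProcessLookupError:
        return proc.wait()  # the process vanished between poll() and kill
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)  # escalate if SIGTERM was ignored
        proc.wait()
    return proc.returncode


def suite_exit_code(failed: bool) -> int:
    """Return 1 for a failed suite instead of -1.

    POSIX keeps only the low 8 bits of an exit status, so -1 would be
    reported to the shell as (-1) & 0xFF == 255.
    """
    return 1 if failed else 0
```

Sending SIGTERM to the group before SIGKILL gives child processes a chance to release resources; the escalation only happens when the timeout expires.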
[CI] Refactor light test cases and update test coverage (#9059)

### What this PR does / why we need it?

Main changes:

- Adds dedicated light E2E test files for:
  - single-card basic light coverage, including Qwen3 dense, embedding, VLM, and W8A8 Eagle3 cases
  - 2-card light coverage, including Qwen3 MoE TP2/EP and Qwen3-VL PP2 multimodal cases
  - 4-card light coverage, including DeepSeek W8A8 TP/PP/EP/EPLB and PD disaggregation cases
- Adds RemotePDServer and DisaggPDProxy helpers for ordinary PD disaggregation E2E tests.
  - Supports launching multiple vLLM serve processes in one test.
  - Assigns ASCEND_RT_VISIBLE_DEVICES based on each server's tp * dp requirement.
  - Launches the disaggregated prefill proxy and validates requests through the proxy endpoint.
- Updates E2E CI config to run the new light suites.
  - Replaces old light suite entries with the new test_light.py cases.
  - Adds a 4-card light CI job for the new 4-card light coverage.
- Increase patch_qwen3_vl_moe_pp_layer_range until the commit of the vllm code includes:
  - https://github.com/vllm-project/vllm/commit/cee6751e548357478a9943cae5786062b7b95127

| Feature | Qwen3<br>-0.6B | Qwen3<br>-8B-W8A8 | Qwen3-Embedding | Qwen3.5<br>-0.8B | Qwen3<br>-30B | Qwen3<br>-VL-30B | DeepSeek<br>-V3.2-W8A8-Pruning | DeepSeek<br>-V3.2-W8A8-Pruning |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Card count | 1 | 1 | 1 | 1 | 2 | 2 | 4 | 4 |
| Dense | ✅ | ✅ |   |   |   |   |   |   |
| MoE |   |   |   |   | ✅ |   | ✅ | ✅ |
| Embedding |   |   | ✅ |   |   |   |   |   |
| Mamba/SSM |   |   |   | ✅ |   |   |   |   |
| Multimodal Reasoning |   |   |   | ✅ |   | ✅ |   |   |
| TP |   |   |   |   | ✅ |   | ✅ |   |
| PP |   |   |   |   |   | ✅ | ✅ |   |
| EP |   |   |   |   | ✅ |   | ✅ |   |
| EPLB |   |   |   |   | ✅ |   |   |   |
| Full Graph |   | ✅ |   | ✅ | ✅ | ✅ | ✅ | ✅ |
| PIECEWISE Graph | ✅ |   | ✅ |   |   |   |   |   |
| PD disaggregation |   |   |   |   |   |   |   | ✅ |
| W8A8 |   | ✅ |   |   |   |   | ✅ | ✅ |
| MTP |   |   |   | ✅ |   |   |   |   |
| Eagle-3 |   | ✅ |   |   |   |   |   |   |
| SFA/DSA |   |   |   |   |   |   |   | ✅ |

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated E2E-Light cases with A2/A3 passed.

- vLLM version: v0.20.1
- vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2

Signed-off-by: MrZ20 <2609716663@qq.com>

11 days ago
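The commit above says ASCEND_RT_VISIBLE_DEVICES is assigned from each server's tp * dp requirement. A minimal sketch of one way to do this follows. The function name `assign_visible_devices` and the contiguous, in-order allocation are assumptions made for illustration, and the actual helper in the test suite may differ.

```python
def assign_visible_devices(servers: list[tuple[int, int]], total_devices: int) -> list[str]:
    """Give each server a contiguous slice of device IDs (hypothetical helper).

    `servers` holds one (tp, dp) pair per server; each server needs tp * dp
    devices. The returned strings are candidate values for
    ASCEND_RT_VISIBLE_DEVICES, one per server.
    """
    assignments = []
    next_device = 0
    for tp, dp in servers:
        needed = tp * dp
        if next_device + needed > total_devices:
            raise ValueError(
                f"need {next_device + needed} devices, only {total_devices} available"
            )
        ids = range(next_device, next_device + needed)
        assignments.append(",".join(str(i) for i in ids))
        next_device += needed
    return assignments


# A prefill server with tp=2, dp=1 and a decode server with tp=1, dp=2 on 4 cards.
print(assign_visible_devices([(2, 1), (1, 2)], total_devices=4))
```

Each returned string would be placed in the environment of the corresponding server process before launch, so that no two servers in the same test share a device.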
[CI][1/N] Add hardware-aware test gating and CI routing (#8557) ## What does this PR do? This PR introduces an npu_test decorator for hardware-based test gating and CI routing, alongside new unit tests for NPUModelRunner. Feedback highlights the need for more robust torch_npu mocking to prevent errors in non-NPU environments and suggests that the @npu_test decorator should verify the specific NPU chip type at runtime. We've long been plagued by excessively long end-to-end testing durations, which is extremely detrimental to community health and developer well-being. We aim to implement selective testing based on developer pull requests (PRs). This will be addressed from two main perspectives: 1. Intelligently selecting test cases to trigger based on developer modifications. This is easily achievable; we simply need to ensure a one-to-one correspondence between the test directory (tests/ut) and the modules in the src directory. Furthermore, for some common, fundamental modules, we will allow them to be unconditionally tested in every PR (this is meaningful because certain modules...). 2. Stateful test cases that developers are aware of. Developers only need to add a @npu_test decorator to specify the required NPU device type and number of chips. The system will automatically route this test case to an appropriate node for testing. ## What's next? 1. Fill in more non-e2e cases that can run on the NPU to have some real verification. 2. Once the cases above are complete enough, we try to reduce the existing e2e cases until they can reach a healthy duration. **This is a massive undertaking, and anyone who is interested can get involved.** --------- Signed-off-by: wangli <wangli858794774@gmail.com> 1 month ago
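To make the gating idea concrete, here is a rough sketch of what a decorator in the spirit of @npu_test could look like. The signature, the `detect` hook, and the `npu_requirements` attribute are all assumptions made for illustration; the commit message does not show the real decorator's parameters or how the CI router reads them.

```python
import functools
import unittest


def npu_test(chip: str, num_npus: int = 1, *, detect=lambda: (None, 0)):
    """Gate a test on NPU chip type and count (illustrative sketch only).

    `detect` is a hypothetical hook returning (chip_name, device_count) for
    the current node; the default reports "no NPU", so tests are skipped.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            found_chip, found_count = detect()
            if found_chip != chip or found_count < num_npus:
                raise unittest.SkipTest(
                    f"requires {num_npus}x {chip}, found {found_count}x {found_chip}"
                )
            return fn(*args, **kwargs)

        # Metadata a CI router could read to pick a suitable node (assumed name).
        wrapper.npu_requirements = {"chip": chip, "num_npus": num_npus}
        return wrapper

    return decorator


@npu_test("A2", num_npus=2, detect=lambda: ("A2", 4))
def test_runs_on_matching_node():
    return "ran"
```

Checking the chip type inside the wrapper, at call time, corresponds to the runtime verification that the review feedback in the commit message asks for.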
[Doc][Bugfix] Fix numbered list translation not displayed in Sphinx (#8629) ### What this PR does / why we need it? Add a prompt rule to prevent spaces between list markers and Chinese text in translated msgstr entries (e.g., use "1.中文" instead of "1. 中文"); the space caused Sphinx to ignore the translation and fall back to English. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.19.0 - vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176 Signed-off-by: hfadzxy <starmoon_zhang@163.com> 1 month ago
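The commit above fixes this through a translation prompt rule. As a complementary illustration, the same "1. 中文" to "1.中文" normalization could be sketched as a post-processing step; `tighten_list_markers` is a hypothetical name, and the regex covers only numbered markers followed by CJK ideographs.

```python
import re

# A numbered list marker at the start of a line, followed by spaces and
# then a CJK ideograph (U+4E00 to U+9FFF). Only the spaces are removed.
_MARKER_SPACE = re.compile(r"^([ \t]*\d+\.)[ \t]+(?=[\u4e00-\u9fff])", re.MULTILINE)


def tighten_list_markers(msgstr: str) -> str:
    """Remove the space between "N." and Chinese text in a translated msgstr."""
    return _MARKER_SPACE.sub(r"\1", msgstr)


print(tighten_list_markers("1. 中文\n2. English stays as is"))
```

The lookahead leaves English items untouched, so only the lines matching the pattern described in the commit are rewritten.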
[Test] Add vllm cases (#8458) ### What this PR does / why we need it? Enable more vLLM native tests on Ascend to enhance the quality of vLLM Ascend. These tests will be run weekly by CI/CD. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? no --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: guxin108 <1252896542@qq.com> Co-authored-by: wangli <wangli858794774@gmail.com> 1 month ago
[Misc][Upgrade] Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1 (#9085) Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1 - vLLM version: v0.20.1 - vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> 16 days ago
[CI] Enable auto upgrade e2e estimated time for auto-partition suites (#6840) ### What this PR does / why we need it? This patch adds a schedule-triggered workflow that automatically updates the e2e estimated time for better load balancing. 1. The workflow runs the full e2e test to get the duration of each test. 2. The script update_estimated_time.py updates the [config.json](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/scripts/config.yaml) according to the latest durations. 3. The workflow automatically submits a pull request that includes the changes to config.json. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.15.0 - vLLM main: https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd --------- Signed-off-by: wangli <wangli858794774@gmail.com> 2 months ago
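Why fresher estimated times help load balancing can be shown with a small sketch. The commit message does not describe how the auto-partition suites split tests, so the greedy "longest test first, onto the least-loaded runner" strategy and the name `partition_by_estimated_time` below are assumptions made for illustration.

```python
import heapq


def partition_by_estimated_time(estimates: dict[str, float], num_runners: int) -> list[list[str]]:
    """Split tests across runners using estimated durations (assumed strategy).

    Tests are placed longest first; each one goes to the runner with the
    smallest accumulated time so far.
    """
    partitions: list[list[str]] = [[] for _ in range(num_runners)]
    # Heap of (accumulated_seconds, runner_index); the lightest runner is on top.
    load = [(0.0, i) for i in range(num_runners)]
    heapq.heapify(load)
    for name, seconds in sorted(estimates.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(load)
        partitions[idx].append(name)
        heapq.heappush(load, (total + seconds, idx))
    return partitions


# Hypothetical test names and durations in seconds.
estimates = {"test_a.py": 300, "test_b.py": 200, "test_c.py": 100, "test_d.py": 100}
print(partition_by_estimated_time(estimates, num_runners=2))
```

If the recorded estimates drift from the real durations, a split like this becomes lopsided, which is the problem a scheduled refresh of the estimates addresses.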
[Test] Add a case and fix upstream workflow (#9229) ### What this PR does / why we need it? This PR adds an upstream case and fixes the workflow; both are needed for testing upstream cases. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the case - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com> 10 days ago
[CI][BugFix] Remove e2e tracker guard on smart e2e (#8777) ### What this PR does / why we need it? This patch fixes an edge case where Smart E2E wasn't triggered as expected when only tests were added to the tests/ut directory. Ideally, Smart E2E shouldn't use the E2E triggering conditions as its entry point. Furthermore, the specific routing logic should be left to Smart E2E to determine itself. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.19.1 - vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176 --------- Signed-off-by: wangli <wangli858794774@gmail.com> 1 month ago