README.md

E2E Test Workflow Guide

This guide explains how to manage and extend the E2E test suite for vllm-ascend. It covers how to add new test cases and how the automatic partitioning mechanism works.

1. Adding a New Test Case

All E2E test cases are defined and managed in the .github/workflows/scripts/config.yaml file.

Steps

  1. Prepare the Test Script: Ensure your test script (.py file) is placed in the appropriate location under the tests/e2e/ directory (e.g., tests/e2e/singlecard/ or tests/e2e/multicard/).

  2. Modify config.yaml: Open .github/workflows/scripts/config.yaml and locate the corresponding test suite (e.g., e2e-singlecard or e2e-multicard-2-cards).

  3. Add Configuration Entry: Add a new entry under the corresponding list. Each entry contains the following fields:

    • name: The relative path to the test file. If you only need to run a specific test function within the file, use :: as a separator, e.g., path/to/test.py::test_func.
    • estimated_time: The estimated time (in seconds) required to run the test. This field is crucial as it is used for automatic load balancing (partitioning).
    • is_skipped (Optional): If set to true, the test will be skipped.

Example

Suppose you want to add a new test named tests/e2e/singlecard/test_new_feature.py with an estimated runtime of 120 seconds:

suites:
  e2e-singlecard:
    # ... other existing tests ...
    - name: tests/e2e/singlecard/test_new_feature.py
      estimated_time: 120

To add a specific test function:

    - name: tests/e2e/singlecard/test_new_feature.py::test_specific_case
      estimated_time: 60
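
The fields described above can be illustrated with a minimal Python sketch. It is not taken from run_suite.py: the SuiteEntry type, the parse_entries helper, and the test_old.py entry are hypothetical names introduced for illustration, and the entries are written as plain dicts (as they might look after config.yaml has been loaded) so the example has no dependencies.

```python
from typing import NamedTuple, Optional


class SuiteEntry(NamedTuple):
    """Hypothetical in-memory form of one config.yaml entry."""
    path: str                 # test file path, the part of `name` before "::"
    function: Optional[str]   # test function after "::", or None for the whole file
    estimated_time: int       # estimated runtime in seconds


def parse_entries(raw_entries):
    """Convert entry dicts into SuiteEntry tuples, dropping skipped tests."""
    entries = []
    for raw in raw_entries:
        # is_skipped is optional; a missing key means the test is not skipped.
        if raw.get("is_skipped", False):
            continue
        path, _, function = raw["name"].partition("::")
        entries.append(SuiteEntry(path, function or None, raw["estimated_time"]))
    return entries


# Assumed shape of the e2e-singlecard list after loading the YAML file.
raw = [
    {"name": "tests/e2e/singlecard/test_new_feature.py", "estimated_time": 120},
    {"name": "tests/e2e/singlecard/test_new_feature.py::test_specific_case",
     "estimated_time": 60},
    # Hypothetical skipped entry, included only to show the is_skipped field.
    {"name": "tests/e2e/singlecard/test_old.py", "estimated_time": 30,
     "is_skipped": True},
]

for entry in parse_entries(raw):
    print(entry)
```

Only the two non-skipped entries are printed; the second one carries test_specific_case as its function.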

2. Automatic Partitioning Mechanism

To speed up CI execution, we support splitting large test suites into multiple parallel Jobs (partitions). The partitioning logic is primarily implemented in the auto_partition function in .github/workflows/scripts/run_suite.py.

Principle

The partitioning algorithm uses a greedy approach to balance load, aiming to keep the total estimated runtime of each partition as close to equal as possible.

  1. Read Configuration: The script reads all non-skipped test cases and their estimated_time from config.yaml.
  2. Sort (Balanced Assignment): Test cases are sorted by estimated_time in descending order. Placing the heaviest tests first gives the greedy assignment the best chance of balancing the partitions.
  3. Assign: The script iterates through the sorted test cases and assigns each one to the partition (bucket) with the smallest current total time.
  4. Re-sort (Fast Feedback): Within each partition, tests are re-sorted by estimated_time in ascending order, so that short tests run first and CI covers as many test cases as possible early in the run.

    TIP: If you need to prioritize a new test case, you can temporarily set its estimated_time to 0 so that it runs at the start of its partition, then update it to the actual value later.
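
The four steps above can be sketched in Python as follows. This is a minimal illustration of the described greedy scheme, not the actual auto_partition code: the function name auto_partition_sketch, the (name, estimated_time) tuple layout, and the tie-breaking rule (lowest partition index wins) are assumptions.

```python
def auto_partition_sketch(files, rank, size):
    """Return the tests for partition `rank` out of `size` partitions.

    `files` is a list of (name, estimated_time) pairs with skipped tests
    already removed (step 1).
    """
    # Step 2: heaviest tests first.
    ordered = sorted(files, key=lambda f: f[1], reverse=True)

    # Step 3: put each test into the bucket with the smallest running total.
    buckets = [[] for _ in range(size)]
    totals = [0] * size
    for test in ordered:
        target = totals.index(min(totals))  # ties go to the lowest index
        buckets[target].append(test)
        totals[target] += test[1]

    # Step 4: within the selected partition, run the shortest tests first.
    return sorted(buckets[rank], key=lambda f: f[1])


# Hypothetical test names and times, for illustration only.
files = [("a", 300), ("b", 200), ("c", 120), ("d", 100), ("e", 60)]
for rank in range(2):
    part = auto_partition_sketch(files, rank, 2)
    print(rank, part, sum(t for _, t in part))
```

With these five tests and two partitions, partition 0 receives d and a (400 seconds in total) and partition 1 receives e, c, and b (380 seconds in total).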

How to Modify Partitioning Logic

If you need to adjust the partitioning strategy, please modify the .github/workflows/scripts/run_suite.py file.

  • Algorithm Location: auto_partition function.
  • Input Parameters:
    • files: List of test files (including estimated_time).
    • rank: Index of the current partition (0 to size-1).
    • size: Total number of partitions.
  • Invocation: CI workflows (e.g., .github/workflows/_e2e_test.yaml) call the script via command-line arguments:
    python3 .github/workflows/scripts/run_suite.py --suite <suite_name> --auto-partition-id <index> --auto-partition-size <total_count>
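
As a rough illustration of how these command-line arguments could map onto the rank and size parameters, here is a small argparse sketch. The flag names come from the invocation above; the defaults, the required setting, and the range check are assumptions and may differ from what run_suite.py actually does.

```python
import argparse


def build_parser():
    """Sketch of a parser accepting the three flags used by the CI workflows."""
    parser = argparse.ArgumentParser(description="Sketch of the run_suite.py CLI")
    parser.add_argument("--suite", required=True,
                        help="suite name as defined in config.yaml")
    # Assumed defaults: a single partition that contains every test.
    parser.add_argument("--auto-partition-id", type=int, default=0,
                        help="index of the current partition (rank)")
    parser.add_argument("--auto-partition-size", type=int, default=1,
                        help="total number of partitions (size)")
    return parser


def parse_cli(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # The partition index must lie in 0 .. size-1.
    if not 0 <= args.auto_partition_id < args.auto_partition_size:
        parser.error("--auto-partition-id must be in [0, --auto-partition-size)")
    return args


args = parse_cli(["--suite", "e2e-singlecard",
                  "--auto-partition-id", "0",
                  "--auto-partition-size", "2"])
# argparse converts dashes in flag names to underscores in attribute names.
print(args.suite, args.auto_partition_id, args.auto_partition_size)
```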
    

Notes

  • Accurate Estimated Time: To achieve the best load balancing, please provide an accurate estimated_time in config.yaml. If a new test is very time-consuming but its estimated time is set too low, the partition it lands in may time out.
  • Number of Partitions: The number of partitions (auto-partition-size) is typically defined in the strategy.matrix of the GitHub Actions workflow definition file (e.g., _e2e_test.yaml).

3. Running Tests Locally

You can use the run_suite.py script to run test suites locally:

# Run the full e2e-singlecard suite
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard

# Simulate partitioned execution (e.g., partition 0 of 2)
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard --auto-partition-id 0 --auto-partition-size 2
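
To simulate every partition of a suite locally, the commands above can be generated in a loop. The sketch below only builds and prints the command lines; partition_commands is a hypothetical helper, and the script path is assumed to be relative to the repository root.

```python
import shlex

SCRIPT = ".github/workflows/scripts/run_suite.py"


def partition_commands(suite, size):
    """Build one run_suite.py command (as an argv list) per partition."""
    return [
        ["python3", SCRIPT, "--suite", suite,
         "--auto-partition-id", str(rank),
         "--auto-partition-size", str(size)]
        for rank in range(size)
    ]


for cmd in partition_commands("e2e-singlecard", 2):
    print(shlex.join(cmd))
```

To execute the partitions one after another, each argv list can be passed to subprocess.run(cmd, check=True) from the repository root.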