File | Last commit | Last updated
[Test] Add e2e test cases for the Qwen-VL model adaptation to Ascend 310p (#6977)

### What this PR does / why we need it?

Add e2e test cases for the Qwen-VL model adaptation to Ascend 310p

- vLLM version: v0.16.0
- vLLM main: https://github.com/vllm-project/vllm/commit/15d76f74e2fdb12a95ea00f0ca283acf6219a2b7

Signed-off-by: gcw_61wqY8cy <wanghengkang1@huawei.com>

Last updated: 2 months ago
[CI][310P] Optimize multicard E2E light (#8489)

## What this PR does / why we need it?

This PR optimizes 310p multicard E2E CI runtime by trimming the E2E-Light scope and selecting cases to move to E2E-Full.

- Remove two 310p multicard test cases:
  - test_qwen3_moe_ep4_fp16
  - test_qwen3_vl_32b_tp1_fp16
- Split remaining 310p multicard coverage between light and full:
  - Keep light on a strict nodeid whitelist (4 cases)
  - Will move the following 2 cases to full CI:
    - test_qwen3_dense_tp2_fp16
    - test_qwen3_moe_tp2_w8a8

This keeps key 310p multicard signal in light while reducing PR feedback time.

## Does this PR introduce any user-facing change?

No. This change only affects CI test selection/scheduling and test definitions.

## How was this patch tested?

- Verified workflow logic and trigger conditions in:
  - .github/workflows/_e2e_test.yaml
- Verified 310p multicard light job now runs nodeid whitelist:
  - test_qwen3_dense_tp4_w8a8
  - test_qwen3_moe_tp4_fp16
  - test_qwen3_5_moe_tp4_fp16
  - test_qwen3_vl_8b_tp2_fp16
- Confirmed removed test functions are deleted from 310p multicard test files.
- Lint check on modified workflow/test files passed (no new lint errors).
- vLLM version: v0.19.0
- vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176

---------

Signed-off-by: csoulnd <daidaicurry@foxmail.com>

Last updated: 1 month ago
[BugFix][310p] Fixing the aclgraph error caused by blocktable (#8948)

### What this PR does / why we need it?

This PR fixes an ACL Graph error on Ascend 310P devices by moving the block table's slot mapping computation to the CPU. On 310P, certain device-side arithmetic operations used in the default slot mapping computation are unsupported or cause errors during graph execution.

Key changes:

- Overrode BlockTable for 310P to use NumPy for slot mapping computation.
- Updated NPUModelRunner to perform this computation on the CPU early in the input preparation phase.
- Avoided unsupported device-side additions for positions and seq_lens on 310P by using CPU buffers.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Verified on Ascend 310P hardware with vLLM v0.19.1.

- vLLM version: v0.19.1
- vLLM main: https://github.com/vllm-project/vllm/commit/d886c26d4d4fef7d079696beb4ece1cfb4b008a8

---------

Signed-off-by: Tflowers-0129 <2906339855@qq.com>

Last updated: 9 days ago
[Lint]Style: Convert test/ to ruff format(Batch #1) (#6738)

### What this PR does / why we need it?

**Scope of Changes**:

| File Path |
| :--- |
| tests/e2e/310p/multicard/test_vl_model_multicard.py |
| tests/e2e/310p/singlecard/test_vl_model_singlecard.py |
| tests/e2e/310p/test_utils.py |
| tests/e2e/conftest.py |
| tests/e2e/model_utils.py |
| tests/e2e/models/conftest.py |
| tests/e2e/models/test_lm_eval_correctness.py |
| tests/e2e/multicard/2-cards/spec_decode/test_spec_decode.py |
| tests/e2e/multicard/2-cards/test_aclgraph_capture_replay.py |
| tests/e2e/multicard/2-cards/test_data_parallel.py |
| tests/e2e/multicard/2-cards/test_disaggregated_encoder.py |
| tests/e2e/multicard/2-cards/test_expert_parallel.py |
| tests/e2e/multicard/2-cards/test_external_launcher.py |
| tests/e2e/multicard/2-cards/test_full_graph_mode.py |
| tests/e2e/multicard/2-cards/test_ilama_lora_tp2.py |
| tests/e2e/multicard/2-cards/test_offline_inference_distributed.py |
| tests/e2e/multicard/2-cards/test_offline_weight_load.py |
| tests/e2e/multicard/2-cards/test_pipeline_parallel.py |
| tests/e2e/multicard/2-cards/test_prefix_caching.py |
| tests/e2e/multicard/2-cards/test_quantization.py |
| tests/e2e/multicard/2-cards/test_qwen3_moe.py |
| tests/e2e/multicard/2-cards/test_qwen3_moe_routing_replay.py |
| tests/e2e/multicard/2-cards/test_qwen3_performance.py |
| tests/e2e/multicard/2-cards/test_shared_expert_dp.py |
| tests/e2e/multicard/2-cards/test_single_request_aclgraph.py |
| tests/e2e/multicard/2-cards/test_sp_pass.py |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/9562912cead1f11e8540fb91306c5cbda66f0007

Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>

Last updated: 2 months ago
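As a rough illustration of the CPU-side slot mapping computation mentioned in #8948, the sketch below shows how a slot index can be derived from a block table with plain NumPy. It is not the code from that PR: the function name `compute_slot_mapping_cpu`, its parameters, and the array layout (one block-table row per request) are all assumptions made for illustration. It relies only on the general paged-KV-cache relation that a token's slot is its physical block number times the block size plus its offset within the block.

```python
import numpy as np


def compute_slot_mapping_cpu(block_table, req_indices, positions, block_size):
    """Hypothetical helper: map each token to a KV-cache slot on the CPU.

    block_table: 2-D array, block_table[r, i] is the physical block id of
                 the i-th logical block of request r (assumed layout).
    req_indices: 1-D array, the request row each token belongs to.
    positions:   1-D array, the token's position within its request.
    block_size:  number of token slots per block.
    """
    block_table = np.asarray(block_table, dtype=np.int64)
    req_indices = np.asarray(req_indices, dtype=np.int64)
    positions = np.asarray(positions, dtype=np.int64)

    # Logical block index and offset inside that block, per token.
    logical_block = positions // block_size
    offset = positions % block_size

    # Look up the physical block for every (request, logical block) pair.
    physical_block = block_table[req_indices, logical_block]

    # Flat slot index into the paged KV cache.
    return physical_block * block_size + offset


if __name__ == "__main__":
    table = [[7, 3], [5, 0]]
    slots = compute_slot_mapping_cpu(table, [0, 0, 1], [1, 5, 2], block_size=4)
    print(slots.tolist())  # [29, 13, 22]
```

Because everything here is ordinary host-side integer arithmetic, the result could be copied to the device as a finished buffer, which is the general idea behind avoiding device-side additions; how the actual PR wires this into BlockTable and NPUModelRunner is not shown in this listing.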