vllm_ascend/tests/e2e/310p · yilunh/vllm_ascend - AtomGit

GitHub [BugFix][310p] Fixing the aclgraph error caused by blocktable (#8948)

File	Last commit	Last updated
data	[Test] Add e2e test cases for the Qwen-VL model adaptation to Ascend 310p (#6977) ### What this PR does / why we need it? Add e2e test cases for the Qwen-VL model adaptation to Ascend 310p - vLLM version: v0.16.0 - vLLM main: https://github.com/vllm-project/vllm/commit/15d76f74e2fdb12a95ea00f0ca283acf6219a2b7 Signed-off-by: gcw_61wqY8cy <wanghengkang1@huawei.com>	2 months ago
multicard	[CI][310P] Optimize multicard E2E light (#8489) ## What this PR does / why we need it? This PR optimizes 310p multicard E2E CI runtime by trimming the E2E-Light scope and selecting cases to move to E2E-Full. - Remove two 310p multicard test cases: - `test_qwen3_moe_ep4_fp16` - `test_qwen3_vl_32b_tp1_fp16` - Split remaining 310p multicard coverage between light and full: - Keep light on a strict nodeid whitelist (4 cases) - Will move the following 2 cases to full CI: - `test_qwen3_dense_tp2_fp16` - `test_qwen3_moe_tp2_w8a8` This keeps key 310p multicard signal in light while reducing PR feedback time. ## Does this PR introduce any user-facing change? No. This change only affects CI test selection/scheduling and test definitions. ## How was this patch tested? - Verified workflow logic and trigger conditions in: - `.github/workflows/_e2e_test.yaml` - Verified 310p multicard light job now runs nodeid whitelist: - `test_qwen3_dense_tp4_w8a8` - `test_qwen3_moe_tp4_fp16` - `test_qwen3_5_moe_tp4_fp16` - `test_qwen3_vl_8b_tp2_fp16` - Confirmed removed test functions are deleted from 310p multicard test files. - Lint check on modified workflow/test files passed (no new lint errors). - vLLM version: v0.19.0 - vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176 --------- Signed-off-by: csoulnd <daidaicurry@foxmail.com>	1 month ago
singlecard	[BugFix][310p] Fixing the aclgraph error caused by blocktable (#8948) ### What this PR does / why we need it? This PR fixes an ACL Graph error on Ascend 310P devices by moving the block table's slot mapping computation to the CPU. On 310P, certain device-side arithmetic operations used in the default slot mapping computation are unsupported or cause errors during graph execution. Key changes: - Overrode `BlockTable` for 310P to use NumPy for slot mapping computation. - Updated `NPUModelRunner` to perform this computation on the CPU early in the input preparation phase. - Avoided unsupported device-side additions for `positions` and `seq_lens` on 310P by using CPU buffers. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Verified on Ascend 310P hardware with vLLM v0.19.1. - vLLM version: v0.19.1 - vLLM main: https://github.com/vllm-project/vllm/commit/d886c26d4d4fef7d079696beb4ece1cfb4b008a8 --------- Signed-off-by: Tflowers-0129 <2906339855@qq.com>	9 days ago
test_utils.py	[Lint]Style: Convert `test/` to ruff format(Batch #1) (#6738) ### What this PR does / why we need it? Scope of changes (file paths): `tests/e2e/310p/multicard/test_vl_model_multicard.py`, `tests/e2e/310p/singlecard/test_vl_model_singlecard.py`, `tests/e2e/310p/test_utils.py`, `tests/e2e/conftest.py`, `tests/e2e/model_utils.py`, `tests/e2e/models/conftest.py`, `tests/e2e/models/test_lm_eval_correctness.py`, `tests/e2e/multicard/2-cards/spec_decode/test_spec_decode.py`, `tests/e2e/multicard/2-cards/test_aclgraph_capture_replay.py`, `tests/e2e/multicard/2-cards/test_data_parallel.py`, `tests/e2e/multicard/2-cards/test_disaggregated_encoder.py`, `tests/e2e/multicard/2-cards/test_expert_parallel.py`, `tests/e2e/multicard/2-cards/test_external_launcher.py`, `tests/e2e/multicard/2-cards/test_full_graph_mode.py`, `tests/e2e/multicard/2-cards/test_ilama_lora_tp2.py`, `tests/e2e/multicard/2-cards/test_offline_inference_distributed.py`, `tests/e2e/multicard/2-cards/test_offline_weight_load.py`, `tests/e2e/multicard/2-cards/test_pipeline_parallel.py`, `tests/e2e/multicard/2-cards/test_prefix_caching.py`, `tests/e2e/multicard/2-cards/test_quantization.py`, `tests/e2e/multicard/2-cards/test_qwen3_moe.py`, `tests/e2e/multicard/2-cards/test_qwen3_moe_routing_replay.py`, `tests/e2e/multicard/2-cards/test_qwen3_performance.py`, `tests/e2e/multicard/2-cards/test_shared_expert_dp.py`, `tests/e2e/multicard/2-cards/test_single_request_aclgraph.py`, `tests/e2e/multicard/2-cards/test_sp_pass.py` ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.15.0 - vLLM main: https://github.com/vllm-project/vllm/commit/9562912cead1f11e8540fb91306c5cbda66f0007 Signed-off-by: MrZ20 <2609716663@qq.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2 months ago
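
The `singlecard` commit message (#8948) describes moving the block table's slot mapping computation to the CPU with NumPy. The listing does not show that code, so the following is only a minimal sketch of the usual paged KV cache slot formula, assuming a 2-D block table indexed by request and logical block. The function name `compute_slot_mapping_cpu` and all of its parameters are hypothetical and are not taken from vllm-ascend.

```python
import numpy as np


def compute_slot_mapping_cpu(block_table, req_indices, positions, block_size):
    """Illustrative CPU-side slot mapping (hypothetical helper, not the PR's code).

    block_table: (num_reqs, max_blocks_per_req) array of physical block ids.
    req_indices: (num_tokens,) array giving the request each token belongs to.
    positions:   (num_tokens,) array giving each token's position in its request.
    block_size:  number of token slots per KV cache block.
    """
    block_table = np.asarray(block_table, dtype=np.int64)
    req_indices = np.asarray(req_indices, dtype=np.int64)
    positions = np.asarray(positions, dtype=np.int64)

    # Logical block index and offset inside that block for every token.
    logical_block = positions // block_size
    block_offset = positions % block_size

    # Look up the physical block id for each token, entirely on the host.
    physical_block = block_table[req_indices, logical_block]

    # Flat slot index into the KV cache.
    return physical_block * block_size + block_offset


# Two requests, block size 4. Request 0 owns blocks [7, 2]; request 1 owns [5, 0].
block_table = np.array([[7, 2], [5, 0]])
req_indices = np.array([0, 0, 0, 1, 1])
positions = np.array([2, 3, 4, 0, 1])

slot_mapping = compute_slot_mapping_cpu(block_table, req_indices, positions, block_size=4)
print(slot_mapping.tolist())  # [30, 31, 8, 20, 21]
```

In a setup like the one the commit describes, the resulting array would be computed on the host and then copied to the device once, so that no integer arithmetic of this kind has to run inside the captured graph. How the actual `BlockTable` override and `NPUModelRunner` handle that copy is not shown in this listing.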