File | Last commit | Last updated
[CI] cleanup single/multi-card test (#5623)

1. speed up e2e light test.
2. create 2-cards and 4-cards folder in multicard
3. move ops to nightly
4. run test in Alphabetical Order

- vLLM version: v0.13.0
- vLLM main: https://github.com/vllm-project/vllm/commit/8be6432bdaf6275664d857b1e5e9bf8ed1ce299e

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Last updated: 4 months ago
[Feat][SP] Suport SP for VL MoE models (#7044)

### What this PR does / why we need it?
2nd PR for https://github.com/vllm-project/vllm-ascend/issues/5712, extend SP to VL MoE models.

### Does this PR introduce _any_ user-facing change?
remove sp_threshold in additional config and reuse sp_min_token_num from vLLM.

### How was this patch tested?
- Model: Qwen3-VL-30B-A3B
- TP4 DP2
- 100 reqs
- max concurrency 1

| Seq length | Mean TTFT (ms) main | Mean TTFT (ms) this PR |
|------------|---------------------|------------------------|
| 4k | 429.40 | 323.3 |
| 16k | 1297.01 | 911.74 |

- vLLM version: v0.16.0
- vLLM main: https://github.com/vllm-project/vllm/commit/4034c3d32e30d01639459edd3ab486f56993876d

---------

Signed-off-by: realliujiaxu <realliujiaxu@163.com>

Last updated: 2 months ago
[Lint]Style: Convert test/ to ruff format(Batch #5) (#6747)

### What this PR does / why we need it?

| File Path |
| :--- |
| tests/e2e/singlecard/compile/backend.py |
| tests/e2e/singlecard/compile/test_graphex_norm_quant_fusion.py |
| tests/e2e/singlecard/compile/test_graphex_qknorm_rope_fusion.py |
| tests/e2e/singlecard/compile/test_norm_quant_fusion.py |
| tests/e2e/singlecard/model_runner_v2/test_basic.py |
| tests/e2e/singlecard/test_aclgraph_accuracy.py |
| tests/e2e/singlecard/test_aclgraph_batch_invariant.py |
| tests/e2e/singlecard/test_aclgraph_mem.py |
| tests/e2e/singlecard/test_async_scheduling.py |
| tests/e2e/singlecard/test_auto_fit_max_mode_len.py |
| tests/e2e/singlecard/test_batch_invariant.py |
| tests/e2e/singlecard/test_camem.py |
| tests/e2e/singlecard/test_completion_with_prompt_embeds.py |
| tests/e2e/singlecard/test_cpu_offloading.py |
| tests/e2e/singlecard/test_guided_decoding.py |
| tests/e2e/singlecard/test_ilama_lora.py |
| tests/e2e/singlecard/test_llama32_lora.py |
| tests/e2e/singlecard/test_models.py |
| tests/e2e/singlecard/test_multistream_overlap_shared_expert.py |
| tests/e2e/singlecard/test_quantization.py |
| tests/e2e/singlecard/test_qwen3_multi_loras.py |
| tests/e2e/singlecard/test_sampler.py |
| tests/e2e/singlecard/test_vlm.py |
| tests/e2e/singlecard/test_xlite.py |
| tests/e2e/singlecard/utils.py |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/9562912cead1f11e8540fb91306c5cbda66f0007

---------

Signed-off-by: MrZ20 <2609716663@qq.com>

Last updated: 3 months ago
[Misc] Upgrade torch-npu to 2.10.0 (#9128)

### What this PR does / why we need it?
[Misc] Upgrade torch-npu to 2.10.0

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with new added/existing test.

- vLLM version: v0.20.1
- vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>

Last updated: 15 days ago
[Main2Main] Upgrade vLLM to 0226 (#6813)

### What this PR does / why we need it?
Breaking:
1. https://github.com/vllm-project/vllm/pull/33452
2. https://github.com/vllm-project/vllm/pull/33451
3. https://github.com/vllm-project/vllm/pull/32567
4. https://github.com/vllm-project/vllm/pull/32344

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: MrZ20 <2609716663@qq.com>

Last updated: 2 months ago