File | Last commit | Last updated
[Doc] add Mixtral-8x7B-Instruct-v0.1 model docs and config (#8537)

### What this PR does / why we need it?

This PR improves the scheduler profiling behavior for Mixtral workloads by refining the chunk handling logic. Previously, the profiling process could produce inaccurate scheduling results under certain conditions. This change makes the behavior more stable and consistent.

- vLLM version: v0.19.0
- vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176

---------

Signed-off-by: lihaofei-2026 <haofei@isrc.iscas.ac.cn>

9 days ago
[Lint]Style: Convert test/ to ruff format(Batch #1) (#6738)

### What this PR does / why we need it?

**Scope of Changes**:

| File Path |
| :--- |
| tests/e2e/310p/multicard/test_vl_model_multicard.py |
| tests/e2e/310p/singlecard/test_vl_model_singlecard.py |
| tests/e2e/310p/test_utils.py |
| tests/e2e/conftest.py |
| tests/e2e/model_utils.py |
| tests/e2e/models/conftest.py |
| tests/e2e/models/test_lm_eval_correctness.py |
| tests/e2e/multicard/2-cards/spec_decode/test_spec_decode.py |
| tests/e2e/multicard/2-cards/test_aclgraph_capture_replay.py |
| tests/e2e/multicard/2-cards/test_data_parallel.py |
| tests/e2e/multicard/2-cards/test_disaggregated_encoder.py |
| tests/e2e/multicard/2-cards/test_expert_parallel.py |
| tests/e2e/multicard/2-cards/test_external_launcher.py |
| tests/e2e/multicard/2-cards/test_full_graph_mode.py |
| tests/e2e/multicard/2-cards/test_ilama_lora_tp2.py |
| tests/e2e/multicard/2-cards/test_offline_inference_distributed.py |
| tests/e2e/multicard/2-cards/test_offline_weight_load.py |
| tests/e2e/multicard/2-cards/test_pipeline_parallel.py |
| tests/e2e/multicard/2-cards/test_prefix_caching.py |
| tests/e2e/multicard/2-cards/test_quantization.py |
| tests/e2e/multicard/2-cards/test_qwen3_moe.py |
| tests/e2e/multicard/2-cards/test_qwen3_moe_routing_replay.py |
| tests/e2e/multicard/2-cards/test_qwen3_performance.py |
| tests/e2e/multicard/2-cards/test_shared_expert_dp.py |
| tests/e2e/multicard/2-cards/test_single_request_aclgraph.py |
| tests/e2e/multicard/2-cards/test_sp_pass.py |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/9562912cead1f11e8540fb91306c5cbda66f0007

Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>

2 months ago
[Test] Add ASR model accuracy test (#8362)

### What this PR does / why we need it?

Adds an end-to-end accuracy evaluation framework for ASR (Automatic Speech Recognition) models on Ascend NPU, along with a configuration for Qwen3-ASR-1.7B.

- **New test file** tests/e2e/models/test_asr_eval_correctness.py: A pytest-based ASR accuracy evaluator that:
  - Starts a vllm serve instance via RemoteOpenAIServer
  - Transcribes audio samples using the /v1/audio/transcriptions API
  - Normalises transcriptions with Whisper's EnglishTextNormalizer
  - Computes WER with jiwer and validates against ground-truth thresholds (10% relative tolerance)
  - Generates a Markdown accuracy report using the shared report_template.md
- **CI integration** .github/workflows/_e2e_nightly_single_node_models.yaml: Wired the new ASR test into the nightly single-node pipeline.
- **Unified YAML format**: Standardised hardware, serve, tasks, limit, batch_size fields across all existing accuracy test configs for consistency.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.19.0
- vLLM main: https://github.com/vllm-project/vllm/commit/5af684c31912232e5c89484c2e8259e0fac6c55b

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

30 days ago
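The WER check described in this commit can be illustrated with a minimal sketch. It is not the code from test_asr_eval_correctness.py: the function names are hypothetical, a plain-Python word-level edit distance stands in for jiwer, lowercasing plus whitespace splitting stands in for Whisper's EnglishTextNormalizer, and the symmetric form of the 10% relative tolerance is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Stand-in normalisation (assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def within_relative_tolerance(measured: float, expected: float, rtol: float = 0.10) -> bool:
    """Assumed check: measured WER lies within rtol of the ground-truth value."""
    return abs(measured - expected) <= rtol * abs(expected)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER = {wer:.4f}")
    print(within_relative_tolerance(0.105, 0.10))
```

A one-sided check (fail only when the measured WER is higher than the threshold allows) would be an equally plausible reading of the commit message.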
[Test] Add reward model accuracy evaluation test (#8388)

### What this PR does / why we need it?

Reward model (RM) inference correctness on Ascend NPU has no automated accuracy gate. A silent regression in score distribution could go undetected across releases.

- Add tests/e2e/models/test_rm_eval_correctness.py: end-to-end accuracy test that:
  - Loads a dataset (default: GSM8K) via ModelScope
  - Runs the reward model with VllmRunner.reward()
  - Verifies that the share of samples where the correct solution scores higher than the perturbed (wrong) solution is at least the declared ground-truth accuracy (±5% RTOL)
  - Reports environment metadata (vllm/CANN/torch versions) for traceability
- Add tests/e2e/models/configs/Qwen2.5-Math-RM-72B.yaml: model config for Qwen2.5-Math-RM-72B (TP=8, Atlas A2, gsm8k accuracy threshold = 0.80)
- Wire the new test into .github/workflows/_e2e_nightly_single_node_models.yaml

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.19.0
- vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

30 days ago
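The accuracy gate described in this commit can be sketched roughly as follows. This is an illustration only, not the code from test_rm_eval_correctness.py: the function names and the example scores are hypothetical, and treating the 5% RTOL as slack below the declared threshold is an assumption.

```python
from typing import Sequence


def pairwise_accuracy(correct_scores: Sequence[float], perturbed_scores: Sequence[float]) -> float:
    """Fraction of pairs where the correct solution outscores the perturbed one."""
    if not correct_scores or len(correct_scores) != len(perturbed_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    wins = sum(c > p for c, p in zip(correct_scores, perturbed_scores))
    return wins / len(correct_scores)


def passes_gate(measured: float, declared: float, rtol: float = 0.05) -> bool:
    """Assumed gate: measured accuracy may fall at most rtol below the declared value."""
    return measured >= declared * (1 - rtol)


if __name__ == "__main__":
    # Hypothetical reward scores for four problems.
    correct = [0.9, 0.8, 0.3, 0.7]
    perturbed = [0.1, 0.2, 0.5, 0.4]
    accuracy = pairwise_accuracy(correct, perturbed)
    print(f"accuracy = {accuracy:.2f}, passes = {passes_gate(accuracy, 0.80)}")
```

With the gsm8k threshold of 0.80 from the config, this reading of the tolerance would accept any measured accuracy of about 0.76 or higher.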