File | Last commit | Last updated
[MM][Doc] Update online serving tutorials for Qwen2-Audio (#3606)

### What this PR does / why we need it?
Update online serving tutorials for Qwen2-Audio. Part of https://github.com/vllm-project/vllm-ascend/issues/3508.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: shen-shanshan <467638484@qq.com>

Last updated: 6 months ago
[Test] Add initial multi modal cases of Qwen2.5-VL-7B-Instruct for disaggregated encoder (#5301)

### What this PR does / why we need it?
This PR adds disaggregated encoder tests for Qwen2.5-VL-7B-Instruct.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test and by running CI.
- vLLM version: release/v0.12.0

---------

Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>

Last updated: 3 months ago
[Misc][Upgrade] Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1 (#9085)

Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1.
- vLLM version: v0.20.1
- vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Last updated: 16 days ago
[Doc][Misc] Improve readability and fix typos in documentation (#9204)

### What this PR does / why we need it?
This PR improves the readability of the documentation by fixing typos and correcting command extensions.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Documentation changes only.
- vLLM version: v0.20.2
- vLLM main: https://github.com/vllm-project/vllm/commit/ce29c26b31d432b1b4bc028c46bb2c3b07a667d8

Signed-off-by: sunshine202600 <sunshine202600@163.com>

Last updated: 11 days ago
[CI]Fixed the spell check function in typos.toml (#6753)

### What this PR does / why we need it?
The incorrect regular expression syntax `.*[UE4M3|ue4m3].*` actually ignores all words containing any of the following characters: `u, e, 4, m, 3, |`

```yaml
extend-ignore-identifiers-re = [".*Unc.*", ".*_thw", ".*UE8M0.*", ".*[UE4M3|ue4m3].*", ".*eles.*", ".*fo.*", ".*ba.*", ".*ot.*", ".*[Tt]h[rR].*"]
```

===fix===>

```yaml
extend-ignore-identifiers-re = [".*Unc.*", ".*_thw", ".*UE8M0.*", ".*(UE4M3|ue4m3]).*", ".*eles.*", ".*fo.*", ".*ba.*", ".*ot.*", ".*[Tt]h[rR].*"]
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/9562912cead1f11e8540fb91306c5cbda66f0007

Signed-off-by: MrZ20 <2609716663@qq.com>

Last updated: 3 months ago
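The difference between the character class and the alternation group described in #6753 can be checked with a small sketch. It uses Python's `re` module as a stand-in for the regex engine used by the spell checker, which is an assumption; the class-versus-group semantics shown here are standard. The helper name `is_ignored` and the sample identifiers are hypothetical, and the fixed pattern is assumed to be `(UE4M3|ue4m3)`, without the trailing `]` that appears inside the group in the quoted config.

```python
import re

# Pattern as originally written: [...] is a character class, so this matches
# any identifier containing at least one of U, E, 4, M, 3, |, u, e, m.
BUGGY = re.compile(r".*[UE4M3|ue4m3].*")

# Assumed intent of the fix: (...) is an alternation group, so this matches
# only identifiers containing the substring "UE4M3" or "ue4m3".
FIXED = re.compile(r".*(UE4M3|ue4m3).*")


def is_ignored(pattern: re.Pattern, identifier: str) -> bool:
    """Return True if the whole identifier matches the ignore pattern."""
    return pattern.fullmatch(identifier) is not None


if __name__ == "__main__":
    # "recieve" is a misspelling that the buggy pattern hides from the checker.
    for name in ["recieve", "float8_ue4m3", "UE4M3_scale", "kwargs"]:
        print(name, is_ignored(BUGGY, name), is_ignored(FIXED, name))
```

With the class form, nearly every English word is skipped because it only needs to contain one of the listed characters; with the group form, only identifiers that actually mention the format name are skipped.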
[Refactor][Misc] Use lazy formatting for log (#8756)

### What this PR does / why we need it?
This PR replaces eager log message formatting with lazy logging formatting across the repository.
- Converts logger.*(f"...") and logging.*(f"...") calls to lazy %-style logging arguments.
- Replaces deprecated logger.warn(...) usage with logger.warning(...).
- Adds logger.isEnabledFor(logging.DEBUG) guards for debug logs whose arguments include function or method calls.
- Enables Ruff G004 enforcement by removing it from ignore and registering vllm.logger.logger as a logger object.

This avoids unnecessary string formatting and expensive argument evaluation when the corresponding log level is disabled.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.19.0
- vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176

---------

Signed-off-by: MrZ20 <2609716663@qq.com>

Last updated: 1 month ago
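The effect of the lazy-logging conversion in #8756 can be illustrated with the standard library alone. This is a minimal sketch, not code from the repository: the `Expensive` class, the `summarize` function, and the logger name `lazy_demo` are hypothetical and exist only to count how often formatting work is performed while DEBUG is disabled.

```python
import logging


class Expensive:
    """Counts how many times it is rendered to a string."""

    def __init__(self) -> None:
        self.calls = 0

    def __str__(self) -> str:
        self.calls += 1
        return "expensive"


logger = logging.getLogger("lazy_demo")
logger.setLevel(logging.INFO)          # DEBUG records are disabled
logger.addHandler(logging.NullHandler())
logger.propagate = False

# Eager: the f-string is built before logger.debug() is even called,
# so __str__ runs although the record is discarded.
eager = Expensive()
logger.debug(f"value={eager}")

# Lazy: the argument is only formatted if a record is actually emitted.
lazy = Expensive()
logger.debug("value=%s", lazy)

# Guard: needed when the argument itself is a function call, because the
# call would otherwise run before logger.debug() can check the level.
summary_calls = 0


def summarize() -> str:
    global summary_calls
    summary_calls += 1
    return "summary"


if logger.isEnabledFor(logging.DEBUG):
    logger.debug("summary=%s", summarize())

if __name__ == "__main__":
    print(eager.calls, lazy.calls, summary_calls)
```

The %-style form defers only the string formatting; it does not defer evaluation of the arguments, which is why the PR adds `isEnabledFor` guards around debug calls whose arguments are function or method calls.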
[Quantization][Feature] Support compressed tensors moe w4a8 dynamic weight (#5889)

### What this PR does / why we need it?
While using the LLM Compressor quantization tool from the vLLM community to generate quantized weights, the vLLM Ascend engine needs to be adapted to support the compressed tensors quantization format.
1. Support MoE model W4A8 dynamic weight.

- vLLM version: v0.13.0
- vLLM main: https://github.com/vllm-project/vllm/commit/bde38c11df0ea066a740efe9b77fff5418be45df

---------

Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: menogrey <1299267905@qq.com>
Co-authored-by: menogrey <1299267905@qq.com>

Last updated: 3 months ago
[ModelLoader][Feature] Add rfork support for fast model loading (#7392)

### What this PR does / why we need it?
Support a new load format: RFORK. For implementation details of this feature, please refer to #7441.

### Does this PR introduce _any_ user-facing change?
Adds a new option for load-format: rfork, e.g.

```bash
vllm serve /workspace/models/Qwen3-8B --load-format rfork
```

### How was this patch tested?
- vLLM version: v0.17.0
- vLLM main: https://github.com/vllm-project/vllm/commit/4034c3d32e30d01639459edd3ab486f56993876d

Signed-off-by: Marck <1412354149@qq.com>

Last updated: 2 months ago
[Feature][Debug] Upgrade device-side debug print with launchHostFunc (#8079)

### What this PR does / why we need it?
This PR upgrades Ascend debug printing from the old acl_graph_print path to a new launchHostFunc-based device_print implementation. The previous design had two major issues:
- it could hit a deadlock path like `main thread -> device sync -> callback -> GIL -> main thread`
- it failed under Dynamo / torch.compile

To address that, this PR adds a custom-op-based device print path that can be preserved through compile and graph execution, and removes the older graph-only helper.
- add device_print(str) and device_print_tensor(Tensor) custom ops in csrc/torch_binding.cpp
- add Meta registrations for the new print ops in csrc/torch_binding_meta.cpp
- mark the print ops as side-effectful in vllm_ascend/utils.py so FX/Inductor does not drop or reorder them
- remove acl_graph_print and its old stream subscription / cleanup path from vllm_ascend/utils.py
- keep device_print as the single Python debugging helper, with single-argument semantics
- add examples/device_print_demo.py to validate eager, torch.compile(backend="aot_eager"), and torch.npu.graph replay
- prefer CANN ACL headers over torch_npu's bundled ACL headers so aclrtLaunchHostFunc

### Does this PR introduce _any_ user-facing change?
Yes, for developers debugging Ascend execution:
- acl_graph_print is removed
- device_print(...) becomes the supported debug helper
- the new helper is intended to work in eager mode, torch.compile, and NPU graph replay

### How was this patch tested?
- added manual validation through examples/device_print_demo.py
- vLLM version:
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.19.0

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

Last updated: 1 month ago
[Lint]Style: Convert example to ruff format (#5863)

### What this PR does / why we need it?
This PR fixes linting issues in example/ to align with the project's Ruff configuration.
- vLLM version: v0.13.0
- vLLM main: https://github.com/vllm-project/vllm/commit/bde38c11df0ea066a740efe9b77fff5418be45df

Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>

Last updated: 4 months ago
[Refactor]Refactor of vllm_ascend/distributed module (#5719)

### What this PR does / why we need it?
Based on the RFC: https://github.com/vllm-project/vllm-ascend/issues/5604
This PR refactors vllm_ascend/distributed, moving all kv_transfer-related code into a dedicated folder, as has already been done in vLLM.

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: https://github.com/vllm-project/vllm/commit/2f4e6548efec402b913ffddc8726230d9311948d

---------

Signed-off-by: lty <linhebiwen@gmail.com>

Last updated: 4 months ago
[Doc][Misc] Add metrics usage documentation and example (#6962)

## What this PR does / why we need it?
This PR addresses issue #5027 where users find that output.metrics returns None when using the vLLM offline inference API.

**Root Cause**: vLLM disables log stats by default (disable_log_stats=True), which causes output.metrics to be None.

**Changes**:
1. Added a NOTE comment in examples/offline_inference_npu.py explaining how to enable metrics
2. Created a new example examples/offline_inference_metrics.py demonstrating how to access request-level metrics (first_token_time, finished_time, etc.) by setting disable_log_stats=False

## Does this PR introduce _any_ user-facing change?
Yes - adds documentation and example code to help users understand how to access output metrics.

## How was this patch tested?
- Documentation/example change only
- Verified example code follows the same patterns as existing examples

Closes #5027
- vLLM version: v0.16.0
- vLLM main: https://github.com/vllm-project/vllm/commit/15d76f74e2fdb12a95ea00f0ca283acf6219a2b7

Signed-off-by: NJX-njx <3771829673@qq.com>

Last updated: 2 months ago
[Misc]update vllm to v0.19.1 (#8448)

### What this PR does / why we need it?
1. Update transformers to v5.5.3.
   1.1 The lm-eval package needs to be upgraded to v0.4.11; otherwise, there will be an interface incompatibility.
   1.2 Transformers 5 drops add_bos_token/add_eos_token when a tokenizer_file is present, while TokenizersBackend defaults add_bos_token=False, so DeepSeek string prompts no longer get BOS injected automatically and TP/EP or golden outputs diverge. See [tokenization_utils_base.py#L1783-L1785](https://github.com/huggingface/transformers/blob/ded2b747bde5e9933c140c29ca3615d759f5744d/src/transformers/tokenization_utils_base.py#L1783-L1785) and [tokenization_utils_tokenizers.py#L417-L419](https://github.com/huggingface/transformers/blob/ded2b747bde5e9933c140c29ca3615d759f5744d/src/transformers/tokenization_utils_tokenizers.py#L417-L419). This PR updates the corresponding golden values for the affected DeepSeek test cases.
2. Fix the WAITING_FOR_FSM error by https://github.com/vllm-project/vllm/pull/38048

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.19.0
- vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176

---------

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: Meihan-chen <zr010426ztt@outlook.com>

Last updated: 1 month ago
[Doc] Fix documentation formatting and improve code examples (#8660)

### What this PR does / why we need it?
This PR fixes various documentation issues and improves code examples throughout the project.
- vLLM version: v0.19.0
- vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176

---------

Signed-off-by: MrZ20 <2609716663@qq.com>

Last updated: 1 month ago
[Feat] [310p] Support w8a8sc quantization method (#7075)

### What this PR does / why we need it?
New quantization method: introduces support for the W8A8SC static linear quantization scheme specifically for 310P hardware, enabling more efficient model compression. Refactors save_sharded_state_310.py to avoid a multi-process issue.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
W8A8SC quant E2E test.
- vLLM version: v0.16.0
- vLLM main: https://github.com/vllm-project/vllm/commit/4034c3d32e30d01639459edd3ab486f56993876d

---------

Signed-off-by: pu-zhe <zpuaa@outlook.com>

Last updated: 2 months ago