vllm_ascend/docs/source/developer_guide · yilunh/vllm_ascend - AtomGit

GitHub: [CI]Remove quantization e2e test case (#9160)

File	Last commit	Last updated
Design_Documents	[BugFix][Doc] Avoid A2 CPU binding overlap from hidden NPUs and doc updates (#8792) ### What this PR does / why we need it? This PR fixes A2 CPU binding pool construction when a worker process only sees part of the logical NPU topology but its cpuset overlaps CPUs affiliated with non-visible NPUs. It also updates the CPU binding community docs following the v0.18.0 release. - CPU Binding Logic Improvement: Updated the CPU binding planner to consider all logical NPUs, including non-visible ones, when calculating CPU distribution to prevent potential overlaps in partial-visibility A2 worker environments. - Binding Pool Filtering: Ensured that the final CPU binding pool is strictly limited to visible/running NPUs, while using non-visible NPUs only as a reference to avoid conflicting assignments. - Test Coverage: Added new unit tests to verify that non-running NPUs are correctly skipped during pool construction while still respecting cpuset overlaps. This PR is built on https://github.com/vllm-project/vllm-ascend/pull/8645 and fixes some critical logic defects. Fixes issue #8600. Co-authored with @Rozwel-dx. ### Does this PR introduce _any_ user-facing change? No public API change. For Ascend A2 deployments that use CPU binding with partial NPU visibility, CPU assignment can change to avoid overlap with CPUs associated with non-visible logical NPUs. The final assignment remains limited to the visible/running NPUs for the worker. ### How was this patch tested? E2E test on A2 --------- Signed-off-by: chenchuw886 <chenchuw@huawei.com> Signed-off-by: chenchuw886 <chenchuwei@huawei.com> Co-authored-by: Rozwel-dx <Rozwel-dx@users.noreply.github.com>	29 days ago
contribution	[CI]Remove quantization e2e test case (#9160) ### What this PR does / why we need it? 1. Remove quantization e2e test case. To reduce the e2e running time, the e2e test cases related to quantization are deleted. The CPU UT and NPU UT of the quantization module are used to maintain the quantization feature. 2. Add an llm-compressor quantization nightly case. Added a test case to the nightly test that verifies the accuracy of weights in the llm-compressor format. - vLLM version: v0.20.1 - vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2 --------- Signed-off-by: wangkunpeng <1289706727@qq.com>	14 days ago
evaluation	[Doc] Fix documentation formatting and improve code examples (#8660) ### What this PR does / why we need it? This PR fixes various documentation issues and improves code examples throughout the project. - vLLM version: v0.19.0 - vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176 --------- Signed-off-by: MrZ20 <2609716663@qq.com>	1 month ago
performance_and_debug	[BugFix] msprobe data collection support aclgraph (#8574) ### What this PR does / why we need it? This PR fixes and clarifies `msprobe` dump behavior for Ascend graph mode, with three goals: 1. Avoid dumping dummy-run data - In `model_runner_v1.py`, `dummy_run` now finalizes debugger state with `dump=False`, so warmup/dummy paths do not write dump data to disk. 2. Keep eager/graph debugger invocation compatible - `_finalize_dump_data` now forwards kwargs to `self.debugger.step(**kwargs)`. - This keeps compatibility with both: - `PrecisionDebugger.step()` (eager path) - `AclGraphDumper.step(dump=...)` (graph path) 3. Docs alignment for graph-mode config support - Updated `msprobe_guide.md` support table to reflect graph-mode constraints: - `task`: graph mode supports `statistics` - `step`: graph mode marked unsupported (`×`) - unified table markers with `√/×` - Removed extra explanatory paragraph and moved constraints into the table itself for clarity. --------- Signed-off-by: Tjh-UKN <2559659915@qq.com> Co-authored-by: Yizhou <136800916+yiz-liu@users.noreply.github.com>	16 days ago
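
The keyword-forwarding behavior described in the performance_and_debug commit message can be illustrated with a minimal Python sketch. The classes and the `finalize_dump_data` helper below are hypothetical stand-ins, not the real msprobe `PrecisionDebugger` or the vllm-ascend `AclGraphDumper`; only the two call shapes, `step()` and `step(dump=...)`, are taken from the commit message.

```python
class EagerDebugger:
    """Hypothetical stand-in for a debugger whose step() takes no arguments."""

    def __init__(self):
        self.calls = []

    def step(self):
        self.calls.append("step()")


class GraphDumper:
    """Hypothetical stand-in for a dumper whose step() accepts dump=..."""

    def __init__(self):
        self.calls = []

    def step(self, dump=True):
        self.calls.append(f"step(dump={dump})")


def finalize_dump_data(debugger, **kwargs):
    # Forward only the keyword arguments the caller supplied, so the same
    # helper works for both step() and step(dump=...).
    debugger.step(**kwargs)


eager = EagerDebugger()
graph = GraphDumper()

finalize_dump_data(eager)              # eager path: no keyword arguments
finalize_dump_data(graph, dump=False)  # warmup/dummy run: skip writing data
finalize_dump_data(graph, dump=True)   # regular graph-mode step
```

Forwarding with `**kwargs` leaves the choice of arguments to the caller, so the helper itself does not need to know which debugger type it holds. Passing `dump=False` to the eager stand-in would raise a `TypeError`, so in this sketch the caller is assumed to pass it only on the graph path.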