[BugFix][310p] Fixing the aclgraph error caused by blocktable (#8948)

### What this PR does / why we need it?
This PR fixes an ACL Graph error on Ascend 310P devices by moving the block table's slot mapping computation to the CPU. On 310P, some of the device-side arithmetic operations used in the default slot mapping computation are unsupported or fail during graph execution.

Key changes:
- Overrode BlockTable for 310P to compute the slot mapping with NumPy.
- Updated NPUModelRunner to perform this computation on the CPU early in the input preparation phase.
- Avoided the unsupported device-side additions for positions and seq_lens on 310P by using CPU buffers.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Verified on Ascend 310P hardware with vLLM v0.19.1.

- vLLM version: v0.19.1
- vLLM main: https://github.com/vllm-project/vllm/commit/d886c26d4d4fef7d079696beb4ece1cfb4b008a8

---------

Signed-off-by: Tflowers-0129 <2906339855@qq.com>

Last updated: 9 days ago
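For readers unfamiliar with slot mapping, here is a minimal NumPy sketch of the kind of CPU-side computation this PR describes. It assumes the usual paged KV-cache layout, where a token's slot is its physical block ID times block_size plus its offset within that block. The function name compute_slot_mapping_np and its arguments are hypothetical. This is not the code from #8948.

```python
import numpy as np


def compute_slot_mapping_np(block_table, req_indices, positions, block_size):
    """Hypothetical CPU-side slot mapping (assumed paged-KV layout).

    block_table: (num_reqs, max_blocks_per_req) physical block IDs
    req_indices: (num_tokens,) request index of each scheduled token
    positions:   (num_tokens,) position of each token within its request
    """
    max_blocks = block_table.shape[1]
    # Index into the flattened table: row = request, column = logical block.
    flat_idx = req_indices * max_blocks + positions // block_size
    block_numbers = block_table.reshape(-1)[flat_idx]
    # Offset of each token inside its block.
    offsets = positions % block_size
    # Flat KV-cache slot = start of the physical block + in-block offset.
    return block_numbers * block_size + offsets
```

With block_size = 4 and block_table = [[7, 2], [5, 0]], request 0's positions 0 to 5 map to slots 28, 29, 30, 31, 8, 9, and request 1's positions 0 and 1 map to 20 and 21. Because everything here is integer arithmetic on host arrays, nothing in this step depends on device-side kernels.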
[CI] Main2main upgrade to 0324 (#7787)

### What this PR does / why we need it?
Upgrades main2main to vLLM 0324 and fixes the following breaks:

1. PR [#37487](https://github.com/vllm-project/vllm/pull/37487) [V0 Deprecation] Refactor kv cache from list to element (c59a132f9): self.kv_cache changed from list[tensor] (one per virtual engine) to a single tensor.
2. PR [#37874](https://github.com/vllm-project/vllm/pull/37874) [KV Offload] Refactor CPU offloading: pluggable CachePolicy, remove Backend abstraction, restructure into cpu/ package (e3c6c10ca): LRUOffloadingManager and CPUBackend have been refactored into CPUOffloadingManager.
3. PR [#32951](https://github.com/vllm-project/vllm/pull/32951) [Async][Spec Decoding] Zero-bubble async scheduling + spec decoding (fafe76b4a):
   a) changes self.positions and self.seq_lens from CpuGpuBuffer to plain GPU tensors;
   b) changes the output parameters of _get_cumsum_and_arange. In addition, _prepare_input_ids now takes num_reqs.
5. PR [#35007](https://github.com/vllm-project/vllm/pull/35007) [Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning (dc6908ac6): removes vllm_is_batch_invariant() and the VLLM_BATCH_INVARIANT constant, replacing them with vllm.envs.

Known issues:
1. The 310P Qwen3.5 test fails because of a Qwen3.5 patch failure; see issue #7976. @YangShuai52 is working on a fix.

### Does this PR introduce _any_ user-facing change?
1. Zero-bubble async scheduling with spec decoding requires the NPU _compute_slot_mapping_kernel and support for the corresponding delayed validation of accepted draft tokens (see PR #7640). This PR therefore disables the async scheduler when spec decoding is enabled. This lets Main2Main proceed in parallel with the Spec Decode + Async Scheduler work until the next release.

Co-Authored-By: zhaomingyu <zhaomingyu13@h-partners.com>, wangbj127 <wangbj1207@126.com>, SidaoY <1024863041@qq.com>, 22dimensions <waitingwind@foxmail.com>

- vLLM main: https://github.com/vllm-project/vllm/commit/35141a7eeda941a60ad5a4956670c60fd5a77029

---------

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: Your Name <you@example.com>
Signed-off-by: wangbj127 <wangbj1207@126.com>
Co-authored-by: 22dimensions <waitingwind@foxmail.com>
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: wangbj127 <wangbj1207@126.com>

Last updated: 1 month ago
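Item 3b notes that the outputs of _get_cumsum_and_arange changed, but it does not describe the new signature. For context, here is a minimal NumPy sketch of what a helper like this computes: cumulative token counts per request, plus a per-request arange that restarts at 0 for each request. The sketch is modeled on the earlier upstream behavior as an assumption. The name get_cumsum_and_arange is hypothetical, and the sketch does not reflect the post-#32951 outputs.

```python
import numpy as np


def get_cumsum_and_arange(num_tokens):
    """Hypothetical helper (assumed pre-#32951 behavior).

    num_tokens: (num_reqs,) number of scheduled tokens per request.
    Returns (cu_num_tokens, arange), where arange restarts at 0 per request.
    """
    cu_num_tokens = np.cumsum(num_tokens)
    total = int(cu_num_tokens[-1])
    # Start offset of each request in the flat token batch, one entry per token.
    offsets = np.repeat(cu_num_tokens - num_tokens, num_tokens)
    # Global token index minus its request's start offset gives the local index.
    arange = np.arange(total) - offsets
    return cu_num_tokens, arange
```

For num_tokens = [2, 5, 3], this returns cu_num_tokens = [2, 7, 10] and arange = [0, 1, 0, 1, 2, 3, 4, 0, 1, 2]. A model runner would typically add that arange to each request's number of already-computed tokens to obtain per-token positions. Those positions are the inputs that the 310P fix keeps on the CPU.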