File | Last commit | Last updated
[Misc][Upgrade] Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1 (#9085)

Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1

- vLLM version: v0.20.1
- vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Last updated: 16 days ago
[CI] Main2main 0515 (#9176)

### What this PR does / why we need it?

Upstream PR [vllm-project/vllm#39568](https://github.com/vllm-project/vllm/pull/39568) is a complete rewrite of the routed-experts capture/transport pipeline. It supersedes both:

- The original 0.20.2 design: the `RoutedExpertsCapturer.get_instance()` singleton, `save_captured_experts(indices=...)`, and shared-memory + `fcntl.flock` cross-process transport.
- The intermediate PR #39917 design: module-level `get_global_experts_capturer()`, `init_routed_experts_capturer_with_shared_cache()`, `issue_routing_d2h_copy()`, and `extract_routed_experts_for_current_batch()`. This API existed in main for only a few days and was never in a stable release; it has been **fully removed**.

After the upgrade to vLLM 0515, vllm-ascend faces two API surfaces that are incompatible at the source level:

| Aspect | 0.20.2 | main |
|---|---|---|
| Capturer access | `RoutedExpertsCapturer.get_instance()` (singleton) | `runner.routed_experts_capturer` (per-runner instance, no global) |
| Per-step `clear_buffer` | via singleton | via runner attribute |
| Per-step D2H + ship | `capturer.save_captured_experts(indices=cpu_slot_mapping)` (sync, shm write) | runner-managed pinned `routed_experts_cpu` D2H + `RoutedExpertsLists` on `ModelRunnerOutput.routed_experts` |
| Output channel | shm/flock to scheduler | `ModelRunnerOutput.routed_experts: RoutedExpertsLists` (NamedTuple, msgpack + zmq IPC) |
| `slot_mapping` source | `slot_mapping.cpu().numpy()` saved to `self.cpu_slot_mapping` | private device snapshot `routed_experts_slot_mapping_device`, then pinned `routed_experts_slot_mapping_cpu` |
| Layer hook injection | `select_experts` calls singleton from inside `apply()` | `module.router.set_capture_fn(...)` from `_bind_routed_experts_capturer` |

## Strategy Overview

1. **Keep the 0.20.2 path intact.** It already works end-to-end. All 0.20.2-specific call sites stay byte-identical.
2. **Add a parallel main path** gated by `vllm_version_is("0.20.2") == False`. Reuse upstream `GPUModelRunner.init_routed_experts_capturer()` (inherited) for buffer allocation; override only `_bind_routed_experts_capturer` because Ascend's `select_experts` does not go through upstream `BaseRouter`.
3. **Async scheduling: piggyback on upstream `AsyncGPUModelRunnerOutput`.** vllm-ascend already constructs that wrapper directly, so adding the `routed_experts=` kwarg is enough: the wrapper handles `to_cpu_nonblocking()` on its copy stream and `tolists()` finalization in `get_output()` for free.
4. **No new compat module, no monkey patches.** Branching is inline at each call site; total surface is one new method (`_bind_routed_experts_capturer`) plus three branched call sites in `model_runner_v1.py` and one in `fused_moe.py`.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.20.2
- vLLM main: https://github.com/vllm-project/vllm/commit/ce29c26b31d432b1b4bc028c46bb2c3b07a667d8

---------

Signed-off-by: wangli <wangli858794774@gmail.com>

Last updated: 12 days ago
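The inline, version-gated branching described in the strategy overview can be illustrated with a toy model. This is a minimal sketch, not the vllm-ascend implementation: `LegacyCapturer`, `RunnerCapturer`, `ToyRunner`, and `clear_capture_buffer` are hypothetical stand-ins, and the `vllm_version_is` defined here takes an extra `installed` argument so the example runs without vLLM installed. Only the shape of the branch (singleton access on 0.20.2, per-runner attribute otherwise) is taken from the PR text.

```python
class LegacyCapturer:
    """Hypothetical stand-in for the 0.20.2 singleton capturer."""

    _instance = None

    def __init__(self):
        self.cleared = 0

    @classmethod
    def get_instance(cls):
        # Lazily create the single process-wide instance.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def clear_buffer(self):
        self.cleared += 1


class RunnerCapturer:
    """Hypothetical stand-in for the per-runner capturer on main."""

    def __init__(self):
        self.cleared = 0

    def clear_buffer(self):
        self.cleared += 1


def vllm_version_is(target, installed):
    # Assumption: the real helper takes only `target` and inspects the
    # installed vLLM package; `installed` is passed in here for illustration.
    return installed == target


class ToyRunner:
    """Hypothetical runner with one version-gated call site."""

    def __init__(self, installed_version):
        self.installed_version = installed_version
        # On the main path the capturer is an attribute of the runner.
        self.routed_experts_capturer = RunnerCapturer()

    def clear_capture_buffer(self):
        # Inline branch at the call site: no compat module, no monkey patch.
        if vllm_version_is("0.20.2", self.installed_version):
            LegacyCapturer.get_instance().clear_buffer()
            return "singleton"
        self.routed_experts_capturer.clear_buffer()
        return "runner"
```

A runner created as `ToyRunner("0.20.2")` clears the shared singleton, while any other version string routes the call to the runner's own capturer and leaves the singleton untouched.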
[CI] replace mirror with CDN (#9345)

### What this PR does / why we need it?

Replace the mirror repo with a CDN.

### Does this PR introduce _any_ user-facing change?

Uses https://repo.huaweicloud.com/ascend/repos/pypi

### How was this patch tested?

Accelerates package download.

- vLLM version: v0.20.2
- vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3

Signed-off-by: tfhddd <2272751277@qq.com>

Last updated: 10 days ago