vllm_ascend/.github/workflows/dockerfiles · yilunh/vllm_ascend - AtomGit

GitHub [CI] replace mirror with CDN (#9345)

File	Last commit	Last updated
Dockerfile.buildwheel.310p	[Misc][Upgrade] Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1 (#9085) Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1 - vLLM version: v0.20.1 - vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	16 days ago
Dockerfile.buildwheel.a2	[Misc][Upgrade] Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1 (#9085) Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1 - vLLM version: v0.20.1 - vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	16 days ago
Dockerfile.buildwheel.a3	[Misc][Upgrade] Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1 (#9085) Upgrade CANN to 9.0.0 and triton-ascend to 3.2.1 - vLLM version: v0.20.1 - vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	16 days ago
Dockerfile.lint	[CI]Main2main 0515 (#9176) ### What this PR does / why we need it? Upstream PR [vllm-project/vllm#39568](https://github.com/vllm-project/vllm/pull/39568) is a complete rewrite of the routed-experts capture/transport pipeline. It supersedes both: - The original 0.20.2 design — `RoutedExpertsCapturer.get_instance()` singleton, `save_captured_experts(indices=...)`, shared-memory + `fcntl.flock` cross-process transport. - The intermediate PR #39917 design — module-level `get_global_experts_capturer()`, `init_routed_experts_capturer_with_shared_cache()`, `issue_routing_d2h_copy()`, `extract_routed_experts_for_current_batch()`. This API existed in main for only a few days and was never in a stable release; it has been fully removed. After the upgrade to vLLM 0515, vllm-ascend faces two API surfaces that are incompatible at the source level: \| Aspect \| 0.20.2 \| main \| \|---\|---\|---\| \| Capturer access \| `RoutedExpertsCapturer.get_instance()` (singleton) \| `runner.routed_experts_capturer` (per-runner instance, no global) \| \| Per-step `clear_buffer` \| via singleton \| via runner attribute \| \| Per-step D2H + ship \| `capturer.save_captured_experts(indices=cpu_slot_mapping)` (sync, shm write) \| runner-managed pinned `routed_experts_cpu` D2H + `RoutedExpertsLists` on `ModelRunnerOutput.routed_experts` \| \| Output channel \| shm/flock to scheduler \| `ModelRunnerOutput.routed_experts: RoutedExpertsLists` (NamedTuple, msgpack + zmq IPC) \| \| `slot_mapping` source \| `slot_mapping.cpu().numpy()` saved to `self.cpu_slot_mapping` \| private device snapshot `routed_experts_slot_mapping_device`, then pinned `routed_experts_slot_mapping_cpu` \| \| Layer hook injection \| `select_experts` calls singleton from inside `apply()` \| `module.router.set_capture_fn(...)` from `_bind_routed_experts_capturer` \| ## Strategy Overview 1. Keep the 0.20.2 path intact. It already works end-to-end. All 0.20.2-specific call sites stay byte-identical. 2. 
Add a parallel main path gated by `vllm_version_is("0.20.2") == False`. Reuse upstream `GPUModelRunner.init_routed_experts_capturer()` (inherited) for buffer allocation; override only `_bind_routed_experts_capturer` because Ascend's `select_experts` does not go through upstream `BaseRouter`. 3. Async scheduling: piggyback on upstream `AsyncGPUModelRunnerOutput`. vllm-ascend already constructs that wrapper directly, so adding the `routed_experts=` kwarg is enough; the wrapper handles `to_cpu_nonblocking()` on its copy stream and `tolists()` finalization in `get_output()` for free. 4. No new compat module, no monkey patches. Branching is inline at each call site; total surface is one new method (`_bind_routed_experts_capturer`) plus three branched call sites in `model_runner_v1.py` and one in `fused_moe.py`. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/ce29c26b31d432b1b4bc028c46bb2c3b07a667d8 --------- Signed-off-by: wangli <wangli858794774@gmail.com>	12 days ago
Dockerfile.nightly.a2	[CI] replace mirror with CDN (#9345) ### What this PR does / why we need it? Replace CDN mirror repo. ### Does this PR introduce _any_ user-facing change? uses https://repo.huaweicloud.com/ascend/repos/pypi ### How was this patch tested? Accelerate package download - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 Signed-off-by: tfhddd <2272751277@qq.com>	10 days ago
Dockerfile.nightly.a3	[CI] replace mirror with CDN (#9345) ### What this PR does / why we need it? Replace CDN mirror repo. ### Does this PR introduce _any_ user-facing change? uses https://repo.huaweicloud.com/ascend/repos/pypi ### How was this patch tested? Accelerate package download - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 Signed-off-by: tfhddd <2272751277@qq.com>	10 days ago
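
The strategy in the Dockerfile.lint commit message (#9176) is inline, version-gated branching: the 0.20.2 singleton path stays untouched, and a per-runner capture function is bound only on main. The following is a minimal toy sketch of that pattern, not the actual vllm-ascend code. `LegacyCapturer`, `Router`, `Runner`, the `capture` method, and the two-argument form of `vllm_version_is` are hypothetical stand-ins for illustration; only the names `get_instance`, `set_capture_fn`, `_bind_routed_experts_capturer`, `routed_experts_capturer`, and `select_experts` come from the commit message, and their real signatures may differ.

```python
from typing import Callable, List, Optional, Tuple

Record = Tuple[int, List[int]]


def vllm_version_is(installed: str, target: str) -> bool:
    """Hypothetical stand-in: the version is passed in explicitly for the demo."""
    return installed == target


class LegacyCapturer:
    """Toy model of the 0.20.2-style process-wide singleton capturer."""

    _instance: Optional["LegacyCapturer"] = None

    def __init__(self) -> None:
        self.buffer: List[Record] = []

    @classmethod
    def get_instance(cls) -> "LegacyCapturer":
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def capture(self, layer_id: int, topk_ids: List[int]) -> None:
        self.buffer.append((layer_id, list(topk_ids)))


class Router:
    """Toy router exposing a capture hook, as on the main branch."""

    def __init__(self) -> None:
        self.capture_fn: Optional[Callable[[int, List[int]], None]] = None

    def set_capture_fn(self, fn: Callable[[int, List[int]], None]) -> None:
        self.capture_fn = fn


class Runner:
    def __init__(self, installed_version: str) -> None:
        self.installed_version = installed_version
        # Per-runner buffer (assumed here to be a plain list).
        self.routed_experts_capturer: List[Record] = []
        self.router = Router()
        # Main path only: bind the hook; the 0.20.2 path is left as-is.
        if not vllm_version_is(installed_version, "0.20.2"):
            self._bind_routed_experts_capturer()

    def _bind_routed_experts_capturer(self) -> None:
        def capture(layer_id: int, topk_ids: List[int]) -> None:
            self.routed_experts_capturer.append((layer_id, list(topk_ids)))

        self.router.set_capture_fn(capture)

    def select_experts(self, layer_id: int, topk_ids: List[int]) -> List[int]:
        # Inline branch at the call site; no separate compat module.
        if vllm_version_is(self.installed_version, "0.20.2"):
            LegacyCapturer.get_instance().capture(layer_id, topk_ids)
        elif self.router.capture_fn is not None:
            self.router.capture_fn(layer_id, topk_ids)
        return topk_ids
```

In this sketch a runner built with `"0.20.2"` records into the shared singleton, while a runner built with any other version string records into its own `routed_experts_capturer` list, so the two paths never share state.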