File / Last commit message / Last updated
[Feature] support eager mode in model runner v2 (#5210) ### What this PR does / why we need it? #5051 only implemented a basic framework for model runner v2, and end-to-end functionality still had some bugs; this PR aims to enable basic functionality. Model runner v2 plans: https://github.com/vllm-project/vllm-ascend/issues/5208 - vLLM version: release/v0.13.0 - vLLM main: https://github.com/vllm-project/vllm/commit/ad32e3e19ccf0526cb6744a5fed09a138a5fb2f9 --------- Signed-off-by: Ronald1995 <ronaldautomobile@163.com> · 4 months ago
[CI] Main2main 0514 (#9155) ### What this PR does / why we need it? 1. Fix https://github.com/vllm-project/vllm/issues/33322: override gpu_modelrunner.sync_and_gather_intermediate_tensors so that, in the PP+SP+TP scenario, scattering the residual is skipped on Ascend. 2. https://github.com/vllm-project/vllm/issues/35520: adapt to the interface-level changes to ModelRunner v2 for hybrid attention. TODO: add Mamba support to ModelRunner on Ascend; any pull request is welcome. 3. https://github.com/vllm-project/vllm/issues/40711 4. https://github.com/vllm-project/vllm/pull/42121 5. https://github.com/vllm-project/vllm/pull/41706 6. https://github.com/vllm-project/vllm/issues/39917: disable async_schedule when enable_return_routed_experts=True. 7. https://github.com/vllm-project/vllm/pull/41046 8. https://github.com/vllm-project/vllm/pull/41055 9. https://github.com/vllm-project/vllm/pull/41035 10. https://github.com/vllm-project/vllm/pull/42434 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.20.1 - vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2 --------- Signed-off-by: wangli <wangli858794774@gmail.com> · 15 days ago