vllm_ascend/tests/e2e/singlecard/model_runner_v2 · yilunh/vllm_ascend - AtomGit

File	Last commit	Last updated
__init__.py	[Feature] support eager mode in model runner v2 (#5210) ### What this PR does / why we need it? #5051 only implemented a basic framework for model runner v2, and there are still some bugs in e2e functionality; this PR aims to enable basic functionality. model runner v2 plans: https://github.com/vllm-project/vllm-ascend/issues/5208 - vLLM version: release/v0.13.0 - vLLM main: https://github.com/vllm-project/vllm/commit/ad32e3e19ccf0526cb6744a5fed09a138a5fb2f9 --------- Signed-off-by: Ronald1995 <ronaldautomobile@163.com>	4 months ago
test_basic.py	[CI] Main2main 0514 (#9155) ### What this PR does / why we need it? 1. Fix https://github.com/vllm-project/vllm/issues/33322: override `gpu_modelrunner.sync_and_gather_intermediate_tensors` so that, in the `pp+sp+tp` scenario, scattering the residual is skipped on Ascend. 2. https://github.com/vllm-project/vllm/issues/35520: adapt to the interface-level changes to `ModelRunner v2` for hybrid attention. TODO: add Mamba support to the Ascend ModelRunner; any pull request is welcome. 3. https://github.com/vllm-project/vllm/issues/40711 4. https://github.com/vllm-project/vllm/pull/42121 5. https://github.com/vllm-project/vllm/pull/41706 6. https://github.com/vllm-project/vllm/issues/39917: disable `async_schedule` when `enable_return_routed_experts=True`. 7. https://github.com/vllm-project/vllm/pull/41046 8. https://github.com/vllm-project/vllm/pull/41055 9. https://github.com/vllm-project/vllm/pull/41035 10. https://github.com/vllm-project/vllm/pull/42434 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.20.1 - vLLM main: https://github.com/vllm-project/vllm/commit/c7aa186d67b6f051680831418e957c67f34ba7a2 --------- Signed-off-by: wangli <wangli858794774@gmail.com>	15 days ago