vllm_ascend: a vLLM community backend plugin project for Ascend NPUs

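Runs Transformer-like, MoE, embedding, and multimodal large language models seamlessly on Ascend NPUs, improving the experience of model fine-tuning, evaluation, reinforcement learning, and deployment. It is the vLLM community's recommended way to support the Ascend backend and follows the hardware-pluggable principle. [This introduction was generated by AI]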

Branches: 2 | Tags: 0
File | Last commit | Last updated
[Doc][skill] Rework main2main skill and add deterministic automation scripts (#9232) ### What this PR does / why we need it? This PR reworks the main2main skill into a structured, script-driven pipeline for keeping vLLM-Ascend aligned with upstream vLLM main. It follows the direction in RFC #7074: make main2main caller-agnostic, script-deterministic, incremental, CI-gated, and bounded. The skill now separates high-level agent guidance from deterministic helper scripts and detailed reference material. New structure: ```text main2main/ ├── SKILL.md │ └── Compact entrypoint: guardrails, workflow overview, and pre-completion checklist. ├── scripts/ │ ├── detect_commits.py │ │ └── Initialize workspace and detect base/target vLLM commits. │ ├── plan_steps.py │ │ └── Deterministic step planner for bounded upstream commit ranges. │ ├── step-planner.yaml │ │ └── Classification, weight, and budget configuration for step planning. │ ├── check_and_commit.py │ │ └── Validate commit paths and create signed commits only after CI passes. │ └── run_main2main_ci.py │ └── Run CI and extract structured log summaries. └── reference/ ├── adapt-guide.md │ └── Detailed method for the Adapt phase. ├── diagnosis-guide.md │ └── Detailed method for the Fix-CI loop. ├── final-summary.md │ └── Final reviewer-facing summary template. └── error-pattern-examples.md └── Concrete examples for common CI error fixes. ``` Key workflow changes: - Splits upstream vLLM commit drift into bounded steps before adaptation. - Updates the pinned vLLM commit reference per step. - Requires CI verification for every step, including no-op adapt steps. - Uses structured CI log summaries instead of reading raw logs into agent context. - Commits only after CI passes or only environment flakes remain. - Stops cleanly on bounded fix-loop failure by saving patch and failure summary instead of silently skipping failed steps. 
- This reduces manual main2main maintenance by making repeatable operations deterministic and keeping the agent focused on the hard parts: understanding upstream API changes, adapting vLLM-Ascend code, and diagnosing CI failures. Reference: [#7074](https://github.com/vllm-project/vllm-ascend/issues/7074) ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 --------- Signed-off-by: Meihan-chen <zr010426ztt@outlook.com> | 9 days ago
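The idea of splitting upstream commit drift into bounded steps can be sketched as follows. This is only an illustration under stated assumptions, not the contents of `plan_steps.py`: the `Commit` type, the integer `weight`, the `budget` parameter, and the greedy packing rule are all hypothetical stand-ins for the classification, weight, and budget configuration that the PR says lives in `step-planner.yaml`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Commit:
    sha: str
    weight: int  # hypothetical cost assigned by some classifier


def plan_steps(commits: List[Commit], budget: int) -> List[List[Commit]]:
    """Greedily pack ordered commits into contiguous steps.

    Each step's total weight stays within `budget`, except that a single
    commit heavier than the budget still gets a step of its own.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    steps: List[List[Commit]] = []
    current: List[Commit] = []
    used = 0
    for commit in commits:
        # Close the current step before it would exceed the budget.
        if current and used + commit.weight > budget:
            steps.append(current)
            current, used = [], 0
        current.append(commit)
        used += commit.weight
    if current:
        steps.append(current)
    return steps


example = [Commit(f"c{i}", w) for i, w in enumerate([1, 2, 3, 5, 1])]
planned = plan_steps(example, budget=4)
print([[c.sha for c in step] for step in planned])
```

Because the input order is preserved and the rule has no randomness, the same commit range always yields the same steps, which is the property the PR describes as script-deterministic.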
[Doc][Misc] Refactor skill documentation and add Claude support instructions (#6817) ### What this PR does / why we need it? This PR refactors the documentation for vLLM Ascend skills. - It renames and moves the vllm-ascend-model-adapter skill's README to serve as a new top-level README for the .agents directory. - It adds instructions on how to use the Ascend skills with Claude, including a new README in the .claude directory. - It updates .gitignore to exclude skills copied for Claude's use. - It adds the main2main skill. This improves the documentation structure, making it more organized and providing clear instructions for developers using these skills with different tools. ### Does this PR introduce _any_ user-facing change? No, this PR contains only documentation and repository configuration changes. It does not affect any user-facing code functionality. ### How was this patch tested? These changes are documentation-only and do not require specific testing. The correctness of the instructions is being verified through this review. - vLLM version: v0.15.0 - vLLM main: https://github.com/vllm-project/vllm/commit/83b47f67b1dfad505606070ae4d9f83e50ad4ebd --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> | 2 months ago
[CI] update gemini styleguide (#6463) Make Gemini always add a PR title and commit message summary. - vLLM version: v0.14.1 - vLLM main: https://github.com/vllm-project/vllm/commit/dc917cceb877dfd13f98c538c4c96158047d98bd Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> | 3 months ago
[CI] add weekly case (#9380) ### What this PR does / why we need it? The weekly test case runs on a fixed schedule. This PR adds the weekly case. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the test. - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 --------- Signed-off-by: chen-commits <1636718796@qq.com> Signed-off-by: chen <1636718796@qq.com> | 8 days ago
[Doc][Misc] Improve readability and fix typos in documentation (#8266) ### What this PR does / why we need it? This PR improves the readability of the documentation by fixing typos, correcting command extensions, and fixing broken links in the Chinese README. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Documentation changes only. - vLLM version: - vLLM main: https://github.com/vllm-project/vllm/commit/v0.19.0 --------- Signed-off-by: sunshine202600 <sunshine202600@163.com> | 1 month ago
[Misc] Cleanup useless file and code (#5877) Remove unused files and code. - vLLM version: v0.13.0 - vLLM main: https://github.com/vllm-project/vllm/commit/bde38c11df0ea066a740efe9b77fff5418be45df Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> | 4 months ago
[Feature][Model] Switch DeepSeekV4 hc_pre to fused op (#9396) ### What this PR does / why we need it? This PR switches DeepSeekV4 hc_pre from the composite small-op path to the aclnnHcPre-backed npu_hc_pre_v2 interface. It also aligns the runtime hc_pre torch binding contract with the CANN recipe binding while leaving the Meta implementation as shape inference only, so torch compile does not trip over runtime-only checks: - x must be 3D or 4D BF16 - hc_mult / hc must be 4 - d must be 4096 or 7168 - hc_fn must be [24, hc * d] - hc_scale must be [3] - hc_base must be [24] - non-x tensors must be FP32 For Ascend950, npu_hc_pre_v2 follows the CANN recipe's batch filter and falls back to the composite path when bs > 512 and bs is not aligned to 8192. Other SoCs continue to use the fused path. Reference: https://gitcode.com/cann/cann-recipes-infer/blob/master/ops/ascendc/torch_ops_extension/custom_ops/csrc/npu_hc_pre.cpp - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 Signed-off-by: maoxx241 <maomaoyu870@gmail.com> | 8 days ago
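The runtime contract listed above can be expressed as a small validation routine. The sketch below is an illustration only, not the torch binding from the PR: it works on plain shape tuples and dtype strings instead of tensors, the function names are hypothetical, and reading "not aligned to 8192" as "not a multiple of 8192" is an assumption.

```python
from typing import List, Sequence, Tuple


def hc_pre_contract_violations(
    x_ndim: int,
    x_dtype: str,
    hc: int,
    d: int,
    hc_fn_shape: Tuple[int, ...],
    hc_scale_shape: Tuple[int, ...],
    hc_base_shape: Tuple[int, ...],
    aux_dtypes: Sequence[str],
) -> List[str]:
    """Return human-readable violations of the listed hc_pre constraints."""
    errors: List[str] = []
    if x_ndim not in (3, 4):
        errors.append("x must be 3D or 4D")
    if x_dtype != "bf16":
        errors.append("x must be BF16")
    if hc != 4:
        errors.append("hc must be 4")
    if d not in (4096, 7168):
        errors.append("d must be 4096 or 7168")
    if tuple(hc_fn_shape) != (24, hc * d):
        errors.append("hc_fn must be [24, hc * d]")
    if tuple(hc_scale_shape) != (3,):
        errors.append("hc_scale must be [3]")
    if tuple(hc_base_shape) != (24,):
        errors.append("hc_base must be [24]")
    if any(dtype != "fp32" for dtype in aux_dtypes):
        errors.append("non-x tensors must be FP32")
    return errors


def use_composite_fallback(soc: str, bs: int) -> bool:
    """Ascend950-only batch filter; 'aligned' is assumed to mean a multiple."""
    return soc == "ascend950" and bs > 512 and bs % 8192 != 0


ok = hc_pre_contract_violations(
    3, "bf16", 4, 4096, (24, 4 * 4096), (3,), (24,), ["fp32", "fp32", "fp32"]
)
print(ok, use_composite_fallback("ascend950", 1024))
```

Keeping checks like these out of the Meta implementation, as the PR describes, means shape inference under torch compile never evaluates runtime-only conditions.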
[Doc][Feature] Add issue-workflow-guidelines.md (#8968) ### What this PR does / why we need it? This guideline improves onboarding for new contributors and reduces ambiguity for maintainers when triaging issues. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? The content was checked locally; maintainers can review it via the GitHub preview and should also check the result of the readthedocs CI workflow. - vLLM version: v0.18.0 - vLLM main: https://github.com/vllm-project/vllm/commit/35141a7eeda941a60ad5a4956670c60fd5a77029 --------- Signed-off-by: Tian-Fantasea <tt553093031@gmail.com> Signed-off-by: Tian-Fantasea <Tian-Fantasea@noreply.gitcode.com> Signed-off-by: Tian <tt553093031@gmail.com> Co-authored-by: Tian-Fantasea <Tian-Fantasea@noreply.gitcode.com> | 8 days ago
[Doc][Misc] Improve readability and fix typos in documentation (#9204) ### What this PR does / why we need it? This PR improves the readability of the documentation by fixing typos and correcting command extensions. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Documentation changes only. - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/ce29c26b31d432b1b4bc028c46bb2c3b07a667d8 Signed-off-by: sunshine202600 <sunshine202600@163.com> | 11 days ago
[BugFix][CI][310p] Fix CI error for 310p caused by DSV4 (#9402) ### What this PR does / why we need it? Fix 310P online CI issue caused by an extra argument added to blocktable.py in DeepSeek v4. ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? CI - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 --------- Signed-off-by: Tflowers-0129 <2906339855@qq.com> | 8 days ago
[CI] Solve the problems of slow download speed and UV (#9304) ### What this PR does / why we need it? 1. Replace triton-ascend source. 2. Add uv. ### Does this PR introduce _any_ user-facing change? Speeds up PR execution. ### How was this patch tested? Check the installation time of vllm-ascend in the PR. - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 | 10 days ago
fix: skip elastic_load when sources are empty for this device When a seed instance has d2d_peer_ips=[], the source list contains entries with device_id but empty sources. This caused the netloader to attempt elastic_load (which failed) after initialize_model, leading to double model initialization that peaked at ~64 GiB and caused NPU OOM on 61.27 GiB cards. Now valid_sources filtering also requires a non-empty sources list, so seed instances go directly to DefaultModelLoader without the double init penalty. ElasticServer startup is unaffected, so seed can still serve weights to later client instances. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> | 4 hours ago
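The filtering rule described in this commit can be illustrated with a short sketch. It is a simplified model under assumptions, not the netloader code: the entry layout (dicts with `device_id` and `sources` keys) and the function names `select_valid_sources` and `choose_loader` are hypothetical.

```python
from typing import Any, Dict, List


def select_valid_sources(
    entries: List[Dict[str, Any]], device_id: int
) -> List[Dict[str, Any]]:
    """Keep entries for this device that actually list at least one source."""
    return [
        entry
        for entry in entries
        if entry.get("device_id") == device_id and entry.get("sources")
    ]


def choose_loader(entries: List[Dict[str, Any]], device_id: int) -> str:
    """Pick the elastic path only when there is something to load from."""
    return "elastic" if select_valid_sources(entries, device_id) else "default"


# A seed instance: the entry matches the device but has no peers to pull from.
seed_entries = [{"device_id": 0, "sources": []}]
# A client instance: a peer is available for device 0.
client_entries = [{"device_id": 0, "sources": ["10.0.0.1:9000"]}]

print(choose_loader(seed_entries, 0), choose_loader(client_entries, 0))
```

With the extra non-empty check, a seed instance never enters the elastic path, so the model is initialized once instead of twice.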
[Lint]Add lint hooks for clang-format, shellcheck, forbidden imports, and boolean context manager checks (#7511) ### What this PR does / why we need it? This PR introduces several upstream vllm-aligned lint hooks into vllm-ascend and makes them part of the actual pre-commit flow. Main changes in this PR: - add check-boolean-context-manager to catch boolean expressions in with statements - add check-forbidden-imports to forbid direct re imports and disallowed direct triton imports - enable shell script linting through tools/shellcheck.sh - add root .clang-format aligned with upstream vllm, enable clang-format in pre-commit, temporarily **exclude all csrc/**** from clang-format to avoid bringing a large native code reformat into this PR This PR focuses on landing the smaller and immediately useful lint alignment first, without mixing in the larger requirements-management migration. ### Does this PR introduce _any_ user-facing change? No. This PR only updates repository lint configuration, static checks, and internal import/style enforcement. It does not change runtime behavior or public interfaces. ### How was this patch tested? Tested locally in the project virtual environment. 
Commands used: ```bash bash format.sh ``` Verified checks passed: ``` bash ruff check...............................................................Passed ruff format..............................................................Passed codespell................................................................Passed typos....................................................................Passed clang-format.............................................................Passed Lint GitHub Actions workflow files.......................................Passed Lint shell scripts.......................................................Passed Lint PNG exports from excalidraw.........................................Passed Check for spaces in all filenames........................................Passed Enforce __init__.py in Python packages...................................Passed Check for forbidden imports..............................................Passed Check for boolean ops in with-statements.................................Passed Suggestion...............................................................Passed - hook id: suggestion - duration: 0s To bypass pre-commit hooks, add --no-verify to git commit. ``` **note:** clang-format is enabled but currently excludes all csrc/** - vLLM version: v0.17.0 - vLLM main: https://github.com/vllm-project/vllm/commit/8b6325758cce5f9c36d38f2462edbd368b97a07c --------- Signed-off-by: MrZ20 <2609716663@qq.com> | 2 months ago
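A check for boolean expressions in `with` statements can be built on Python's standard `ast` module. The sketch below is a minimal illustration of that idea, not the actual check-boolean-context-manager hook; the function name `find_boolean_with_items` is hypothetical.

```python
import ast
from typing import List


def find_boolean_with_items(source: str) -> List[int]:
    """Return line numbers of with-statements whose context expression
    is a boolean operation such as `a() and b()`."""
    tree = ast.parse(source)
    flagged = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.With, ast.AsyncWith)):
            for item in node.items:
                # `with a() and b():` enters only one of the two managers,
                # which is almost never what the author intended.
                if isinstance(item.context_expr, ast.BoolOp):
                    flagged.add(node.lineno)
    return sorted(flagged)


sample = (
    "with a() and b():\n"
    "    pass\n"
    "with c(), d():\n"
    "    pass\n"
)
print(find_boolean_with_items(sample))
```

The comma form `with c(), d():` enters both context managers and is not flagged, while `with a() and b():` evaluates the boolean expression first and enters only its result.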
[Feature] Update custom op build framework (#8146) ## Summary - update the custom-op build and packaging framework - align current custom-op integration under csrc while keeping the current main-branch operator implementations - improve custom-op runtime environment bootstrap so single-op tests and offline service no longer depend on manually sourcing vendor env scripts - keep the non-custom-op build path unchanged ## Notes - this PR focuses on the custom-op build framework update and related runtime loading path changes - operator implementations continue to follow the current main-branch codebase - final validation for this PR state relies on the CI results on GitHub ## Validation - remote A2 targeted verification completed for custom-op build/install and runtime bootstrap - full CI is used as the final validation gate for this PR state - vLLM version: - vLLM main: https://github.com/vllm-project/vllm/commit/v0.19.0 Signed-off-by: maoxx241 <maomaoyu870@gmail.com> | 18 days ago
[Performance] add op chunk_fwd_o and chunk_gated_delta_rule_fwd_h (#9018) ### What this PR does / why we need it? Adds custom ops to improve performance: chunk_fwd_o and chunk_gated_delta_rule_fwd_h. - vLLM version: v0.19.1 - vLLM main: https://github.com/vllm-project/vllm/commit/4d51588e2381018348f1022dfa3a7698899805b7 Signed-off-by: AlanisZomeg <1308342839@qq.com> | 17 days ago
[Lint]Style: reformat markdown files via markdownlint (#5884) ### What this PR does / why we need it? Reformats markdown files via markdownlint. - vLLM version: v0.13.0 - vLLM main: https://github.com/vllm-project/vllm/commit/bde38c11df0ea066a740efe9b77fff5418be45df --------- Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain> Signed-off-by: MrZ20 <2609716663@qq.com> Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain> | 4 months ago
[Feature][Doc] Add AI QoS module, tuning tool, and user guide (#8706) ### What this PR does / why we need it? This PR adds **AI QoS** support for operator-facing tuning on Ascend: a **Python tool** to apply/undo and print UB switch–style configuration, **unit tests**, and an **English** user guide with platform and software constraints. - **csrc/ai_qos**: Exposes set_qos / get_qos, set_bw / get_bw, and fuse/global config helpers via **pybind11**; integrated into the build (**CMake** / **setup.py** as applicable in this tree). - **tools/ai_qos.py**: apply to snapshot baseline and program QoS state; unset to restore and remove state; supports auto/manual traffic priorities and prints command for UB switch configuration. - **tests/ut/test_ai_qos_tool.py**: Mocks torch.npu and vllm_ascend.ai_qos; covers device list, first-apply baseline reuse, and unset/restore. - **Docs** (`docs/source/user_guide/feature_guide/AI QoS Introduction_en.md`): Background, Auto/Manual usage, how to disable; **Usage constraints** including: - **AIV H2D / AIV D2D** host QoS: not effective with the current driver stack; delivery planned via module upgrade after driver support lands. - **Software**: **Ascend HDK 26.0.0+**, **LingQu**-based **UB switch** version as listed in the doc table. ### Does this PR introduce _any_ user-facing change? **Yes.** Operators get a new optional pre-inference step (`python tools/ai_qos.py / unset`) and a published English guide with version and **constraint** information. ### How was this patch tested? - pytest -sv tests/ut/test_ai_qos_tool.py (or full `pytest -sv tests/ut` as required by the project) - vLLM version: v0.19.0 - vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176 --------- Signed-off-by: gtl <gaotianlong6@h-partners.com> Co-authored-by: gtl <gaotianlong6@h-partners.com>22 天前
[Doc] Add sphinx build for vllm-ascend (#55) ### What this PR does / why we need it? This patch enables the doc build for vllm-ascend - Add sphinx build for vllm-ascend - Enable readthedocs for vllm-ascend - Fix CI: - exclude vllm-empty/tests/mistral_tool_use to skip `You need to agree to share your contact information to access this model` which introduce in https://github.com/vllm-project/vllm/commit/314cfade02b28d50349c4df1a7ea0bbdaef589f1 - Install test req to fix https://github.com/vllm-project/vllm-ascend/actions/runs/13304112758/job/37151690770: ``` vllm-empty/tests/mistral_tool_use/conftest.py:4: in <module> import pytest_asyncio E ModuleNotFoundError: No module named 'pytest_asyncio' ``` - exclude docs PR ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. test locally: ```bash # Install dependencies. pip install -r requirements-docs.txt # Build the docs and preview make clean; make html; python -m http.server -d build/html/ ``` Launch browser and open http://localhost:8000/. 2. CI passed with preview: https://vllm-ascend--55.org.readthedocs.build/en/55/ Signed-off-by: Yikun Jiang <yikunkero@gmail.com>1 年前
[Doc] Align PR title prefix guidance (#8834) ### What this PR does / why we need it? This PR aligns the documented bug-fix PR title prefix with the current CI validator. The docs now use the CI-recognized [BugFix] casing instead of [Bugfix]. Fixes #8840. ### Does this PR introduce _any_ user-facing change? No. This is a documentation-only update. ### How was this patch tested? - git diff --check - `pre-commit run markdownlint --hook-stage manual --files AGENTS.md docs/source/developer_guide/contribution/index.md` - `pre-commit run typos --files AGENTS.md docs/source/developer_guide/contribution/index.md docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/index.po` - vLLM version: v0.19.1 - vLLM main: https://github.com/vllm-project/vllm/commit/d886c26d4d4fef7d079696beb4ece1cfb4b008a8 Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com> Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com> | 23 days ago
[Feature][Ops] Add A5 custom operator build support (#9271) ### What this PR does / why we need it? This PR adds the A5 / ascend950 custom operator build and binding support needed by the DeepSeek V4 A5 path. Changes included: - Updates the ascend950 CUSTOM_OPS_ARRAY in csrc/build_aclnn.sh to match the A5 ACLNN branch operator list. - Adds A5-specific custom operator directories for indexer_compress_epilog, indexer_compress_epilog_v2, kv_compress_epilog, kv_quant_sparse_attn_sharedkv, kv_quant_sparse_attn_sharedkv_metadata, load_index_kv_cache, hc_pre, and swiglu_group_quant. - Wires torch and meta registrations for the new A5 operators, including npu_hc_pre_v2 and grouped_matmul_swiglu_quant_weight_nz. - Skips the direct vllm_ascend_kernels target for ascend950 and uses VLLM_ENABLE_ATB_AND_DIRECT_KERNELS to guard direct-kernel includes, schemas, implementations, and meta registrations. This keeps ascend950 and 310P import-safe when those direct kernels are not built. - Maps ascend950 to the newer Ascend950PR_9599 CANN platform name used by the new build framework. - Updates quant_lightning_indexer for A5 PA cache views: it keeps the existing stride / scale_stride schema, avoids forcing 950 key/key-scale tensors contiguous, and uses the passed strides in the arch35 PA offset path. A3 conflict-sensitive note: - The A5 source branch also contains changes under shared A3 operator paths such as compressor, sparse_attn_sharedkv, hc_pre_sinkhorn, hc_pre_inv_rms, hc_post, inplace_partial_rotary_mul, moe_gating_top_k_hash, and fused-MoE Python code. - This PR intentionally does not overwrite those shared implementations. For shared operator names that remain in the ascend950 build list, it uses the current main-branch implementation and only adds the A5-specific operator directories missing from main. - quant_lightning_indexer is the intentional shared-path exception because A5 needs dim0-stride-aware PA cache access. 
- quant_lightning_indexer_metadata stays on the mainline Ascend950 SoC-version behavior and does not add legacy Ascend910_95 compatibility. - compressor is intentionally not changed in this PR and should be handled by the operator owners separately. - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 --------- Signed-off-by: maoxx241 <maomaoyu870@gmail.com> | 10 days ago
[1/2/N] Enable pymarkdown and python __init__ for lint system (#2011) ### What this PR does / why we need it? 1. Enable the pymarkdown check 2. Enable the Python __init__.py check for vllm and vllm-ascend 3. Clean up code ### How was this patch tested? - vLLM version: v0.9.2 - vLLM main: https://github.com/vllm-project/vllm/commit/29c6fbe58cfa705c26ed1b38f262d5ade0b4f9ba --------- Signed-off-by: wangli <wangli858794774@gmail.com> | 9 months ago
[Doc] Update doc url link (#5781) Drop the dev suffix from the doc URL. Rename the URL to https://docs.vllm.ai/projects/ascend - vLLM version: v0.13.0 - vLLM main: https://github.com/vllm-project/vllm/commit/2f4e6548efec402b913ffddc8726230d9311948d Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> | 4 months ago
[Core] Init vllm-ascend (#3) ### What this PR does / why we need it? vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on the Ascend NPU. This plugin is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in the [RFC]: Hardware pluggable, providing a hardware-pluggable interface that decouples the integration of the Ascend NPU with vLLM. This patch also includes changes to make CI work and to use caching to speed up the e2e test, including: 1. Change the push (post-merge CI) and pull_request (PR CI) trigger branch to main 2. Make mypy work by ignoring base_communicator and clearing unused deps 3. Several improvements for vllm_ascend_test: - use caches (pip, ms, hf) to speed up the e2e test (25mins --> 5mins) - switch the git clone command to action/checkout to speed up checkout - Enable sv for pytest for better info dump - Remove network host to resolve `docker: conflicting options: cannot attach both user-defined and non-user-defined network-modes`, which is a problem on docker 1.45 but not on 1.39. 4. Adapt MLA decode optimizations: https://github.com/vllm-project/vllm/commit/cabaf4eff3c7df30d785769d5a0a1fa1a1c48a8a ### Does this PR introduce _any_ user-facing change? Yes, init the PR. ### How was this patch tested? - This is the first PR to make Ascend NPU work on vLLM. All code is tested on Ascend with the vLLM V0 Engine. - CI passed --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: wangshuai09 <391746016@qq.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: wangli <wangli858794774@gmail.com> | 1 year ago
Initial commit | 1 year ago
[Doc] Fix CANN 9.0.0 release-notes URL in README.md / README.zh.md (#9298) ### What this PR does / why we need it? README.md (L57) and README.zh.md (L51) state CANN == 9.0.0 but link to the CANN 8.3 RC2 release-notes page (canncommercial/83RC2/...). In practice the vendor site silently 302-redirects that URL to …/canncommercial/900/index/index.html, which renders as a blank page, so users following the README link land on no content at all. docs/source/installation.md already uses the correct path (canncommercial/900/releasenote/...), which returns HTTP 200 and shows the real CANN 9.0.0 release notes. This PR aligns the two README files with installation.md so the link actually works. Fixes #9296 ### Does this PR introduce _any_ user-facing change? Documentation link only. ### How was this patch tested? - Manual visual diff. - Old URL (…/canncommercial/83RC2/releasenote/releasenote_0000.html) reproduced: it redirects to …/canncommercial/900/index/index.html which renders blank. - New URL (…/canncommercial/900/releasenote/releasenote_0000.html) returns HTTP 200 and shows the "CANN 9.0.0版本说明" (CANN 9.0.0 release notes) page. - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 Signed-off-by: MC_cubes <mccube2000@outlook.com> | 10 days ago
ut: add ci guard for ut coverage (#2317) ### What this PR does / why we need it? Adds a CI guard for UT coverage: if the UT coverage of a PR's patch is below 80%, CI fails. ### Does this PR introduce _any_ user-facing change? Not involved. ### How was this patch tested? Not involved. - vLLM version: v0.10.0 - vLLM main: https://github.com/vllm-project/vllm/commit/458e74eb907f96069e6d8a4f3c9f457001fef2ea --------- Signed-off-by: Ronald1995 <ronaldautomobile@163.com> | 9 months ago
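The guard's rule (fail when patch coverage is below 80%) reduces to a small threshold check. The sketch below only illustrates that arithmetic; it is not the CI configuration from the PR, and the function name, its line-count inputs, and the choice to pass a patch with no coverable lines are assumptions.

```python
def patch_coverage_ok(
    covered_lines: int, total_lines: int, threshold: float = 80.0
) -> bool:
    """Return True when patch coverage meets the threshold (in percent)."""
    if covered_lines < 0 or total_lines < 0 or covered_lines > total_lines:
        raise ValueError("invalid line counts")
    if total_lines == 0:
        # Assumption: a patch with no coverable lines cannot fail the guard.
        return True
    return covered_lines / total_lines * 100.0 >= threshold


# 79 of 100 changed lines covered: below the 80% bar, so the guard fails.
print(patch_coverage_ok(79, 100), patch_coverage_ok(80, 100))
```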
[Misc][Main2Main] Upgrade vLLM to v0.20.1 and 0506 (#8983) ### What this PR does / why we need it? Fixes NPUInputBatch not have thinking_budget_state_holder = None, caused by [[Reasoning][Feature] Support for speculative decoding with thinking budget](https://github.com/vllm-project/vllm/pull/34668) Fixes 'AscendMultiHeadLatentAttention' not have skip_topk, caused by [[Feature]: IndexCache support for DSA models](https://github.com/vllm-project/vllm/pull/37735) Fixes MLA prefill backends selection, caused by [[Attention] Abstract the MLA prefill backends and eliminate cuDNN](https://github.com/vllm-project/vllm/pull/32623) Fixes ModelRunner V2 eagle refactor, caused by [[Model Runner V2] Skip attention metadata rebuild before draft prefill ](https://github.com/vllm-project/vllm/pull/40410), [[Model Runner V2] Rebuild attn metadata between draft decode steps ](https://github.com/vllm-project/vllm/pull/41162), [[Model Runner V2] Add logprob_token_ids support](https://github.com/vllm-project/vllm/pull/40559), [[Model Runner V2] Fix rejection sampling acceptance rate gap vs MRV1](https://github.com/vllm-project/vllm/pull/40651) ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with new added/existing test. - vLLM version: v0.19.1 - vLLM main: https://github.com/vllm-project/vllm/commit/4d51588e2381018348f1022dfa3a7698899805b7 --------- Signed-off-by: wxsIcey <1790571317@qq.com>19 天前
[CI/UT][PD Disaggreate] Initialize PD Disaggreate UT (#889) Initialize PD Disaggreate UT --------- Signed-off-by: MengqingCao <cmq0113@163.com>11 个月前
[Ops][Feature] Add vLLM Ascend Support for Qwen2.5-Math-RM-72B (#7886) ### What this PR does / why we need it? This PR adds vLLM Ascend platform support for the Qwen2.5-Math-RM-72B reward model. It includes test configurations and deployment documentation. Fixes #6700 --------- Signed-off-by: yuhongming-2026 <hongming@isrc.iscas.ac.cn>10 天前
[Misc] requirements: fix pyporject typo and drop duplicated arctic-inference pin in requirements-dev.txt (#9320) ### What this PR does / why we need it? Two trivial cleanups (see #9318). No runtime change. ```diff --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1,3 @@ --extra-index-url https://triton-ascend.osinfra.cn/pypi/simple -# Should be mirrored in pyporject.toml +# Should be mirrored in pyproject.toml cmake>=3.26 ``` ```diff --- a/requirements-dev.txt +++ b/requirements-dev.txt -arctic-inference==0.1.1 ``` arctic-inference==0.1.1 is still installed via the transitive `-r requirements.txt`. Fixes #9318. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - grep -n arctic-inference requirements.txt requirements-dev.txt → 1 match (only in requirements.txt). - pip install --dry-run -r requirements-dev.txt still resolves arctic-inference==0.1.1. - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 Signed-off-by: MC_cubes <mccube2000@outlook.com>10 天前
[Test] Remove VLLM_USE_V1 in example and tests (#1733) V1 is enabled by default, no need to set it by hand now. This PR remove the useless setting in example and tests - vLLM version: v0.9.2 - vLLM main: https://github.com/vllm-project/vllm/commit/9ad0a4588ba4e9c979cda0d178dec4fcdb89fd0c Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>10 个月前
[CI] Solve the problems of slow download speed and UV (#9304) ### What this PR does / why we need it? 1. Replace triton-ascend source. 2. Add uv. ### Does this PR introduce _any_ user-facing change? Speed ​​up PR execution. ### How was this patch tested? Check the installation time of vllm-ascend in the pr. - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b310 天前
[Feature] Support DeepseekV4 (#9270) ## Overview Add DeepSeek V4 model support to vllm-ascend, introducing the full model architecture, attention backend, custom operators, KV cache management, tool-call parser, and distributed inference capabilities specific to DeepSeek V4. ## Key Components 1. Model Architecture (models/) - deepseek_v4.py — Full model implementation with MLA (Multi-head Latent Attention), MoE (Mixture of Experts) routing, OLora output projection (wo_a + wo_b), Hadamard transform, DSA attention integration, and MTP speculative decoding support - deepseek_v4_mtp.py — Multi-Token Prediction (MTP) drafter for DeepSeek V4 spec-decode - layer/attention/layer.py — Attention layer binding DSA backend to DeepSeek V4 attention spec, with SWA (Sliding Window Attention) cache and prefix-caching integration 2. Attention Backend (attention/) - dsa_v1.py — Ascend DSA (DeepSeek Sparse Attention) backend implementation with: - Prefill/decode metadata builder with compress ratio-aware KV cache scheduling - AscendDSAImpl forward path: rotary embeddings, OLora TP, sparse attention dispatch, compressor/indexer integration - Spec-decode metadata handling and graph-mode padding support - abstract.py — DSAAttentionImpl abstract interface 3. Custom Operators (ops/) - dsa.py — DSA forward operator dispatch (torch.ops.vllm.dsa_forward) - rope_dsv4.py — DeepSeek V4 rotary embedding (ComplexExpRotaryEmbedding) with YARN scaling and multi-group rope support - mhc.py — Multi-head compression operator - triton/mul_add.py, triton/rms_norm.py — Triton-accelerated fused mul-add and RMS norm kernels 4. KV Cache Management (core/) - single_type_kv_cache_manager.py — CompressAttentionManager with compress-ratio-aware block allocation, cache hit detection, and eviction logic for compressed KV tensors 5. 
MoE Enhancements (ops/fused_moe/) - Updated expert selector, fused MoE forward, comm methods, and prepare/finalize for DeepSeek V4's routing topology (logical/physical expert mapping, expert parallelism) 6. Distributed KV Transfer (distributed/kv_transfer/) - mooncake_hybrid_connector.py — Mooncake hybrid KV cache connector for cross-node KV transfer, supporting HMA (Heterogeneous Memory Access), multi-group KV cache spec, and P2P transfer with retry logic 7. Tool-Call Parser (patch/platform/) - patch_deepseek_v4_tool_call_parser.py — Streaming tool-call parser for DeepSeek V4's DSML format (<|DSML|tool_calls>/<|DSML|invoke>/<|DSML|parameter>), with type coercion, param repair, and partial buffer handling - Tests included: test_patch_deepseek_v4_tool_call_parser.py 8. Patches (patch/) - patch_core.py — Multi-group KV cache initialization for V4's compressed cache groups - patch_kv_cache_coordinator.py — AscendHybridKVCacheCoordinator with compress ratio-aware block scheduling, PCP (Pipeline Context Parallelism) support - patch_kv_cache_interface.py — Extended KV cache spec for compressed attention (compress_ratio, alignment) - patch_kv_cache_utils.py — Compressed block hash logic and scheduler KV cache config - patch_speculative_config.py — Spec-decode configuration for DSA models - patch_deepseek_compressor.py — Compressor state cache and indexer cache registration 9. Worker & Runner Changes (worker/) - model_runner_v1.py — Compressed position preprocessing, multi-group metadata building, DSA attention group wiring, PCP long-sequence handling, graph-padding support - block_table.py — Compressed-KV-aware block table management 10. Quantization Support (quantization/) - modelslim_config.py — ModelSlim quantization descriptor handling for V4 weight mappings (shared_head, weight_packed) - Updated W4A8, W4A16, W8A8, W8A8 MXFP8 quantization methods for V4 compatibility 11. 
Testing - test_dsv4_compressed_positions.py — Unit tests for compressed position precomputation - test_patch_deepseek_v4_tool_call_parser.py — Streaming tool-call parser tests (chunked arguments, metadata, wrapper params) 12. Entry Point Registration - setup.py — Registered ascend_0day_model plugin entry point for V4 model auto-discovery - models/__init__.py — Register "DeepseekV4ForCausalLM" and "DeepseekV4MTPForCausalLM" in vLLM's model registry ## Co-authors Co-authored-by: liuyan190974 <shandaliuyan@163.com> Co-authored-by: 1132509010 <1132509010@qq.com> Co-authored-by: ader47 <1661888967@qq.com> Co-authored-by: anakin-wx <1084704046@qq.com> Co-authored-by: anon189Ty <Stari_Falcon@outlook.com> Co-authored-by: ChangminTao <taocm123@qq.com> Co-authored-by: chenchris2 <1349418798@qq.com> Co-authored-by: ChenxiQ <chenxi.qian.cq@outlook.com> Co-authored-by: coder-fny <985619145@qq.com> Co-authored-by: Angazenn <supperccell@163.com> Co-authored-by: fuzhihong699 <fuzhihong4@huawei.com> Co-authored-by: goldVitaminC <297780618@qq.com> Co-authored-by: HiC4Sh1e <chenjie137@huawei.com> Co-authored-by: hwhaokun <haokun0405@163.com> Co-authored-by: kirliavc <jlc@pku.edu.cn> Co-authored-by: lcfenglinwan <lcfenglin@qq.com> Co-authored-by: Liexss <924834690@qq.com> Co-authored-by: maoxx241 <maomaoyu870@gmail.com> Co-authored-by: linfeng-yuan <1102311262@qq.com> Co-authored-by: liuyan190974 <shandaliuyan@163.com> Co-authored-by: LookAround0301 <lixushi@huawei.com> Co-authored-by: maoxx241 <maomaoyu870@gmail.com> Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: monologue815 <monologue815@qq.com> Co-authored-by: MosCloud <bwzhang1991@163.com> Co-authored-by: nomewang <nomeyue@outlook.com> Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com> Co-authored-by: pinfa <1819563383@qq.com> Co-authored-by: pjgao <1783198484@qq.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com> Co-authored-by: 
realliujiaxu <realliujiaxu@163.com> Co-authored-by: showMeYourCode1997 <934005226@qq.com> Co-authored-by: SidaoY <1024863041@qq.com> Co-authored-by: slippersss <slippersss@126.com> Co-authored-by: Toneymiller <1476209578@qq.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: weijinqian_v1 <weijinqian@huawei.com> Co-authored-by: weinachuan <1173732899@qq.com> Co-authored-by: WithHades <244036962@qq.com> Co-authored-by: wjunLu <wjunlu217@gmail.com> Co-authored-by: WOE-Y <876362620@qq.com> Co-authored-by: wxh571001500 <571001500@qq.com> Co-authored-by: wxsIcey <1790571317@qq.com> Co-authored-by: xmpp777 <yangming2@huawei.com> Co-authored-by: yiz-liu <liu_yizhou@outlook.com> Co-authored-by: yzylxyypl <yzylxyypl@gmail.com> Co-authored-by: zcc-zjut <zcczxy2019@163.com> Co-authored-by: zhangsicheng5 <zhangsicheng5@huawei.com> Co-authored-by: zhaozx-cn <zhaozx2116@163.com> Co-authored-by: zhenwenqi_2024 <zhenwenqi_2022@qq.com> Co-authored-by: ZT-AIA <1028681969@qq.com> - vLLM version: v0.20.2 - vLLM main: https://github.com/vllm-project/vllm/commit/0d4d334eaa583b9c09aa4eb7538c22db99fd84b3 --------- Signed-off-by: GDzhu01 <809721801@qq.com> Signed-off-by: QiuChunshuo <chunshuoq@gmail.com> Co-authored-by: QiuChunshuo <chunshuoq@gmail.com>9 天前
[Feature][Doc] Add AI QoS module, tuning tool, and user guide (#8706) ### What this PR does / why we need it? This PR adds **AI QoS** support for operator-facing tuning on Ascend: a **Python tool** to apply/undo and print UB switch–style configuration, **unit tests**, and an **English** user guide with platform and software constraints. - **csrc/ai_qos**: Exposes set_qos / get_qos, set_bw / get_bw, and fuse/global config helpers via **pybind11**; integrated into the build (**CMake** / **setup.py** as applicable in this tree). - **tools/ai_qos.py**: apply to snapshot baseline and program QoS state; unset to restore and remove state; supports auto/manual traffic priorities and prints command for UB switch configuration. - **tests/ut/test_ai_qos_tool.py**: Mocks torch.npu and vllm_ascend.ai_qos; covers device list, first-apply baseline reuse, and unset/restore. - **Docs** (`docs/source/user_guide/feature_guide/AI QoS Introduction_en.md`): Background, Auto/Manual usage, how to disable; **Usage constraints** including: - **AIV H2D / AIV D2D** host QoS: not effective with the current driver stack; delivery planned via module upgrade after driver support lands. - **Software**: **Ascend HDK 26.0.0+**, **LingQu**-based **UB switch** version as listed in the doc table. ### Does this PR introduce _any_ user-facing change? **Yes.** Operators get a new optional pre-inference step (`python tools/ai_qos.py / unset`) and a published English guide with version and **constraint** information. ### How was this patch tested? - pytest -sv tests/ut/test_ai_qos_tool.py (or full `pytest -sv tests/ut` as required by the project) - vLLM version: v0.19.0 - vLLM main: https://github.com/vllm-project/vllm/commit/6f786f2c506cb07f4566771fdc62e640e2c4a176 --------- Signed-off-by: gtl <gaotianlong6@h-partners.com> Co-authored-by: gtl <gaotianlong6@h-partners.com>22 天前

vllm-ascend

vLLM Ascend Plugin

| About Ascend | Documentation | #sig-ascend | Users Forum | Community Meeting |

English | 中文


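Latest News 🔥

  • [2026/05] We released the new official version v0.18.0! Please follow the official guide to start deploying vLLM Ascend Plugin on Ascend.
  • [2026/02] We released the new official version v0.13.0! Please follow the official guide to start deploying vLLM Ascend Plugin on Ascend.
  • [2025/12] We released the new official version v0.11.0! Please follow the official guide to start deploying vLLM Ascend Plugin on Ascend.
  • [2025/09] We released the new official version v0.9.1! Please follow the official guide to start deploying large-scale Expert Parallelism (EP) on Ascend.
  • [2025/08] We co-hosted the vLLM Beijing Meetup with vLLM and Tencent! The talk materials are available here.
  • [2025/06] The User Stories page is now live! It features LLaMA-Factory/verl/TRL/GPUStack and shows how vLLM Ascend helps Ascend users improve their experience in model fine-tuning, evaluation, reinforcement learning (RL), and deployment.
  • [2025/06] The Contributors page is now live! Every contribution deserves to be recorded; thanks to all contributors.
  • [2025/05] We released the first official version v0.7.3! We collaborated with the vLLM community on a blog post sharing our practice: Introducing vLLM Hardware Plugin, Best Practice from Ascend NPU.
  • [2025/03] We hosted the vLLM Beijing Meetup with the vLLM team! The talk materials are available here.
  • [2025/02] The vLLM community officially created the vllm-project/vllm-ascend repository so that vLLM can run seamlessly on Ascend NPU.
  • [2024/12] We are working with the vLLM community to support [RFC]: Hardware pluggable.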

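Overview

vLLM Ascend Plugin (vllm-ascend) is a community-maintained backend plugin that lets vLLM run seamlessly on Ascend NPU.

It is the recommended way to support the Ascend backend within the vLLM community. It follows the principles outlined in [RFC]: Hardware pluggable, providing Ascend NPU support for vLLM in a decoupled manner.

With the vLLM Ascend plugin, popular large language models, including Transformer-like, Mixture-of-Experts (MoE), embedding, and multimodal models, can run seamlessly on Ascend NPU.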

Prerequisites

  • Hardware: Atlas 800I A2 Inference series, Atlas A2 Training series, Atlas 800I A3 Inference series, Atlas A3 Training series, Atlas 300I Duo (experimental support)
  • OS: Linux
  • Software:
    • Python >= 3.10, < 3.12
    • CANN == 9.0.0 (see here for the matching Ascend HDK version)
    • PyTorch == 2.10.0, torch-npu == 2.10.0
    • vLLM (same version as vllm-ascend)
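The software constraints above can be checked before installation with a few lines of standard-library Python. This is a minimal sketch, not a tool shipped with the project: the version numbers are copied from the list above, while the function names (`python_supported`, `matches_pin`, `check_packages`) are hypothetical. CANN is not a Python package, so it is not covered here.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Constraints copied from the prerequisites list; the checker is illustrative only.
PYTHON_MIN = (3, 10)  # inclusive lower bound
PYTHON_MAX = (3, 12)  # exclusive upper bound
PINNED = {"torch": "2.10.0", "torch-npu": "2.10.0"}


def python_supported(info=sys.version_info) -> bool:
    """True when the interpreter satisfies Python >= 3.10, < 3.12."""
    return PYTHON_MIN <= tuple(info[:2]) < PYTHON_MAX


def matches_pin(installed: str, pinned: str) -> bool:
    """Compare release numbers, ignoring a local build tag such as '+cpu'."""
    return installed.split("+", 1)[0] == pinned


def check_packages(pins=PINNED) -> dict[str, str]:
    """Return a per-package status string for every pinned package."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        if matches_pin(installed, pinned):
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {pinned}"
    return report
```

Calling `python_supported()` and `check_packages()` in the target environment gives a quick summary of what still needs to be installed or changed.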

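Getting Started

We recommend starting with one of the following versions:

| Version | Release type | Doc |
|---|---|---|
| v0.19.1rc1 | Latest release candidate | See QuickStart and Installation for more details |
| v0.18.0 | Latest official/stable version | See QuickStart and Installation for more details |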
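Once a matching vLLM and vllm-ascend pair is installed, a first smoke test can use vLLM's standard offline-inference API (`LLM`, `SamplingParams`, `LLM.generate`). The sketch below is not taken from this README: the model name, prompts, and sampling values are placeholders chosen for illustration, and the helper names are hypothetical. It also assumes the plugin is picked up automatically, so no Ascend-specific code appears.

```python
def build_prompts() -> list[str]:
    """Placeholder prompts for a smoke test."""
    return ["Hello, my name is", "The future of AI is"]


def run_offline_inference(model: str = "Qwen/Qwen2.5-0.5B-Instruct") -> list[str]:
    """Generate one completion per prompt; the default model name is an assumption."""
    # Imported lazily so this file can be loaded on machines without vLLM installed.
    from vllm import LLM, SamplingParams

    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)
    llm = LLM(model=model)
    outputs = llm.generate(build_prompts(), sampling_params)
    # Each result carries one or more completions; keep the first one.
    return [out.outputs[0].text for out in outputs]
```

On a machine with Ascend NPUs and the packages installed, calling `run_offline_inference()` should return one generated string per prompt. Follow the official QuickStart for the supported procedure.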

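Contributing

See the CONTRIBUTING document for more details on setting up the development environment, functional testing, and PR submission conventions.

We welcome and value contributions and collaboration of any kind:

  • Please let us know about any bugs you encounter by filing an Issue.
  • Please use the Users Forum for usage questions and help.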

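Branch Policy

vllm-ascend has a main branch and development branches.

  • main: the main branch, which tracks the vLLM main branch and is continuously quality-guarded by Ascend CI.
  • releases/vX.Y.Z: development branches, created alongside some new vLLM releases. For example, releases/v0.13.0 is the vllm-ascend development branch for vLLM v0.13.0.

The branches and their maintenance status are listed below:

| Branch | Status | Note |
|---|---|---|
| main | Maintained | CI commitment for the vLLM main branch and the latest vLLM release (v0.18.0) |
| v0.7.1-dev | Unmaintained | No longer maintained |
| v0.7.3-dev | Unmaintained | Only bug fixes are allowed; no new releases will be published |
| v0.9.1-dev | Unmaintained | Only bug fixes are allowed; no new releases will be published |
| v0.11.0-dev | Unmaintained | Only bug fixes are allowed; no new releases will be published |
| releases/v0.13.0 | Maintained | CI commitment for vLLM v0.13.0 |
| releases/v0.18.0 | Maintained | CI commitment for vLLM v0.18.0 |
| rfc/feature-name | Maintained | Feature branches created for collaboration |

See the Versioning Policy for more details.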
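The naming scheme in the branch list above is regular enough to check mechanically, for example in a script that decides which CI configuration applies to a branch. The following is a minimal sketch based only on the branch names shown above; the category labels and the function name `classify_branch` are hypothetical and not part of the project's tooling.

```python
import re

# Patterns follow the branch names listed above.
_RELEASE = re.compile(r"^releases/v\d+\.\d+\.\d+$")
_LEGACY_DEV = re.compile(r"^v\d+\.\d+\.\d+-dev$")


def classify_branch(name: str) -> str:
    """Map a branch name to an illustrative category label."""
    if name == "main":
        return "main"
    if _RELEASE.match(name):
        return "release"
    if _LEGACY_DEV.match(name):
        return "legacy-dev"
    # rfc/<feature-name>: require a non-empty feature name after the prefix.
    if name.startswith("rfc/") and len(name) > len("rfc/"):
        return "feature"
    return "unknown"
```

For instance, `classify_branch("releases/v0.13.0")` yields `"release"`, while a misspelled prefix such as `release/v0.13.0` falls through to `"unknown"`.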

Community Meeting

License

Apache License 2.0, as found in the LICENSE file.

Languages

C++ 56.25%
Python 37.34%
CMake 2.87%
Shell 2.69%
C 0.74%