🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.


Hugging Face Transformers Library


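State-of-the-art pretrained models for inference and training

Transformers is a model-definition framework for state-of-the-art machine learning across text, computer vision, audio, video, and multimodal models, supporting both inference and training.

It centralizes the model definition so that the whole ecosystem agrees on it. transformers is the pivot across frameworks: if a model definition is supported, it will be compatible with most training frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, PyTorch-Lightning, and others), inference engines (vLLM, SGLang, TGI, and others), and adjacent modeling libraries (llama.cpp, mlx, and others) that build on the transformers model definition.

We pledge to help support new state-of-the-art models and democratize their usage by keeping model definitions simple, customizable, and efficient.

There are over 1M Transformers model checkpoints on the Hugging Face Hub that you can use.

Explore the Hub today to find a suitable model, and use Transformers to get your project started right away.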

Installation

Transformers works with Python 3.10+ and PyTorch 2.4+.

Create and activate a virtual environment with venv or uv, a fast Rust-based Python package and project manager.

# venv
python -m venv .my-env
source .my-env/bin/activate
# uv
uv venv .my-env
source .my-env/bin/activate

Install Transformers in your virtual environment.

# pip
pip install "transformers[torch]"

# uv
uv pip install "transformers[torch]"
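
As a quick sanity check after installing, the version floor stated above (Python 3.10+, PyTorch 2.4+) can be verified from Python. This is a minimal sketch that uses only the standard library. The helper names `parse_version`, `meets_minimum`, `installed_version`, and `check_environment` are illustrative and not part of Transformers, and the simple parser reads only the leading numeric components of a version string.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Return (major, minor, patch) from the start of a version string.

    Missing components are padded with zeros, and suffixes such as
    '+cu121' or 'a0' are ignored.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum`, e.g. '2.4.0+cu121' vs '2.4'."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Compare the running interpreter and the installed torch with the stated minimums."""
    torch_version = installed_version("torch")
    return {
        "python_ok": sys.version_info[:2] >= (3, 10),
        "torch_version": torch_version,
        "torch_ok": torch_version is not None and meets_minimum(torch_version, "2.4"),
    }


if __name__ == "__main__":
    print(check_environment())
```

If `torch_ok` is false while `torch_version` is `None`, PyTorch is simply not installed in the active environment, which usually means the virtual environment was not activated before running pip or uv.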

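If you want the latest changes in the library or plan to contribute, install Transformers from source. Note that the latest version may not be stable. If you run into an error, feel free to open an issue.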

git clone https://github.com/huggingface/transformers.git
cd transformers

# pip
pip install '.[torch]'

# uv
uv pip install '.[torch]'

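Quickstart

Get started with Transformers right away with the Pipeline API. Pipeline is a high-level inference class that supports text, audio, vision, and multimodal tasks. It handles preprocessing the input and returns the appropriate output.

Instantiate a pipeline and specify the model to use for text generation. The model is downloaded and cached so you can easily reuse it. Finally, pass some text to prompt the model.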

from transformers import pipeline

pipeline = pipeline(task="text-generation", model="Qwen/Qwen2.5-1.5B")
pipeline("the secret to baking a really good cake is ")
[{'generated_text': 'the secret to baking a really good cake is 1) to use the right ingredients and 2) to follow the recipe exactly. the recipe for the cake is as follows: 1 cup of sugar, 1 cup of flour, 1 cup of milk, 1 cup of butter, 1 cup of eggs, 1 cup of chocolate chips. if you want to make 2 cakes, how much sugar do you need? To make 2 cakes, you will need 2 cups of sugar.'}]

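To chat with a model, the usage pattern is the same. The only difference is that you need to build a chat history between you and the system, which becomes the input to Pipeline.

Tip

You can also chat with a model directly from the command line, as long as transformers serve is running.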

transformers chat Qwen/Qwen2.5-0.5B-Instruct

import torch
from transformers import pipeline

chat = [
    {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
    {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"}
]

pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", dtype=torch.bfloat16, device_map="auto")
response = pipeline(chat, max_new_tokens=512)
print(response[0]["generated_text"][-1]["content"])
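
The final `print` line indexes into a nested result. As a sketch of that indexing, assuming (as the example above implies) that a chat call returns a list with one dict whose `generated_text` holds the whole conversation with the model's reply appended last, the hypothetical helper `last_reply` below extracts that reply. The `mock_response` value is hand-written for illustration and is not real model output.

```python
def last_reply(response):
    """Return the content of the final message in the first generated conversation."""
    conversation = response[0]["generated_text"]  # list of {"role", "content"} dicts
    final_message = conversation[-1]
    # Assumption: the generated reply is tagged with the "assistant" role.
    if final_message["role"] != "assistant":
        raise ValueError("the last message is not an assistant reply")
    return final_message["content"]


# Hand-written stand-in with the assumed shape of a chat pipeline result.
mock_response = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a sassy, wise-cracking robot."},
            {"role": "user", "content": "Any fun things to do in New York?"},
            {"role": "assistant", "content": "Placeholder reply."},
        ]
    }
]

print(last_reply(mock_response))
```

To continue the conversation under the same assumption, the returned `generated_text` list can be extended with a new user message and passed back to the pipeline as the next chat history.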

Expand the examples below to see how Pipeline works for different modalities and tasks.

Automatic speech recognition

from transformers import pipeline

pipeline = pipeline(task="automatic-speech-recognition", model="openai/whisper-large-v3")
pipeline("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}

Image classification

from transformers import pipeline

pipeline = pipeline(task="image-classification", model="facebook/dinov2-small-imagenet1k-1-layer")
pipeline("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
[{'label': 'macaw', 'score': 0.997848391532898},
 {'label': 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
  'score': 0.0016551691805943847},
 {'label': 'lorikeet', 'score': 0.00018523589824326336},
 {'label': 'African grey, African gray, Psittacus erithacus',
  'score': 7.85409429227002e-05},
 {'label': 'quail', 'score': 5.502637941390276e-05}]
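
The classifier returns a list of label/score dicts. Below is a small sketch of post-processing such a list, using the values printed above as sample data. `top_prediction` and `above_threshold` are illustrative helper names, not Transformers APIs.

```python
# Sample data copied from the image-classification output shown above.
predictions = [
    {"label": "macaw", "score": 0.997848391532898},
    {"label": "sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita",
     "score": 0.0016551691805943847},
    {"label": "lorikeet", "score": 0.00018523589824326336},
    {"label": "African grey, African gray, Psittacus erithacus",
     "score": 7.85409429227002e-05},
    {"label": "quail", "score": 5.502637941390276e-05},
]


def top_prediction(preds):
    """Return the entry with the highest score."""
    return max(preds, key=lambda p: p["score"])


def above_threshold(preds, threshold):
    """Return the labels whose score is at least `threshold`."""
    return [p["label"] for p in preds if p["score"] >= threshold]


best = top_prediction(predictions)
print(f"{best['label']}: {best['score']:.1%}")  # macaw: 99.8%
print(above_threshold(predictions, 0.001))
```

A threshold is a design choice for the application: with the sample scores above, 0.001 keeps the two most likely labels and drops the rest.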

Visual question answering

from transformers import pipeline

pipeline = pipeline(task="visual-question-answering", model="Salesforce/blip-vqa-base")
pipeline(
    image="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg",
    question="What is in the image?",
)
[{'answer': 'statue of liberty'}]

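Why should I use Transformers?

  1. Easy-to-use state-of-the-art models:

    • High performance on natural language understanding and generation, computer vision, audio, video, and multimodal tasks.
    • Low barrier to entry for researchers, engineers, and developers.
    • Few user-facing abstractions, with just three core classes to learn.
    • A unified API for using all of the pretrained models.
  2. Lower compute costs and a smaller carbon footprint:

    • Share trained models instead of training from scratch.
    • Reduce compute time and production costs.
    • Hundreds of model architectures with over 1M pretrained checkpoints across all modalities.
  3. Choose the right framework for every part of a model's lifetime:

    • Train state-of-the-art models in 3 lines of code.
    • Move a single model between PyTorch/JAX/TF2.0 frameworks at will.
    • Pick the right framework for training, evaluation, and production.
  4. Easily customize a model or an example to your needs:

    • Examples are provided for each architecture to reproduce the results published by its original authors.
    • Model internals are exposed as consistently as possible.
    • Model files can be used independently of the library for quick experiments.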

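When shouldn't I use Transformers?

  • This library is not a modular toolbox of building blocks for neural networks. The code in the model files is deliberately not refactored with additional abstractions, so that researchers can iterate quickly on each model without diving into extra abstractions or files.
  • The training API is optimized for the PyTorch models provided by Transformers. For generic machine learning loops, use another library such as Accelerate.
  • The example scripts are only examples. They may not work out of the box for your specific use case, and you may need to adapt the code to make it run.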

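100 projects using Transformers

Transformers is more than a toolkit for using pretrained models: it is a community of projects built around it and the Hugging Face Hub. We want Transformers to enable developers, researchers, students, professors, engineers, and anyone else to build their dream projects.

To celebrate Transformers reaching 100,000 stars, we wanted to put the spotlight on the community with the awesome-transformers page, which lists 100 incredible projects built with Transformers.

If you own or use a project that you believe should be part of the list, please open a PR to add it!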

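Example models

You can test most of our models directly on their Hub model pages.

Expand each modality below to see a few example models for various use cases.

Audio
Computer vision
Multimodal
  • Audio or text to text with Voxtral or Audio Flamingo
  • Document question answering with LayoutLMv3
  • Image or text to text with Qwen-VL
  • Image captioning with BLIP-2
  • OCR-based document understanding with GOT-OCR2
  • Table question answering with TAPAS
  • Unified multimodal understanding and generation with Emu3
  • Vision to text with Llava-OneVision
  • Visual question answering with Llava
  • Visual referring expression segmentation with Kosmos-2
Natural language processing (NLP)
  • Masked word completion with ModernBERT
  • Named entity recognition with Gemma
  • Question answering with Mixtral
  • Summarization with BART
  • Translation with T5
  • Text generation with Llama
  • Text classification with Qwen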

Citation

We now have a paper you can cite for the 🤗 Transformers library:

@inproceedings{wolf-etal-2020-transformers,
    title = "Transformers: State-of-the-Art Natural Language Processing",
    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = oct,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-demos.6/",
    pages = "38--45"
}