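A lightweight instruct model fine-tuned from IBM Granite-4.0-1B-Base. It supports 12 languages and handles summarization, classification, question answering, RAG, code generation, and tool calling, which makes it a fit for edge deployment and for fine-tuning in low-resource domains. (This summary was AI-generated.)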
license: apache-2.0
library_name: transformers
tags:
- language
- unsloth
- granite-4.0
base_model:
- ibm-granite/granite-4.0-1b
Unsloth Dynamic 2.0 achieves superior accuracy and outperforms other leading quantization methods.
Granite-4.0-1B
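Model Summary: Granite-4.0-1B is a lightweight instruct model fine-tuned from Granite-4.0-1B-Base. Its training data combines permissively licensed open-source instruction datasets with internally collected synthetic datasets. The model was developed using several techniques, including supervised fine-tuning, reinforcement learning, and model merging.
- Developers: Granite Team, IBM
- HF Collection: Granite 4.0 Nano Language Models HF Collection
- GitHub Repository: ibm-granite/granite-4.0-nano-language-models
- Website: Granite Docs
- Release Date: October 28, 2025
- License: Apache 2.0
Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond this list.
Intended Use: Granite 4.0 Nano instruct models have strong instruction-following capabilities, bringing advanced AI features within reach of on-device deployments and research use cases. Their compact size also makes them well suited to fine-tuning on specialized domains without requiring massive compute resources.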
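Capabilities
- Summarization
- Text classification
- Text extraction
- Question answering
- Retrieval-Augmented Generation (RAG)
- Code-related tasks
- Function-calling tasks
- Multilingual dialog use cases
- Fill-In-the-Middle (FIM) code completion
Generation: The following is a simple example of how to use the Granite-4.0-1B model.
Install the following libraries: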
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
Then copy the snippet from the section that matches your use case.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"
model_path = "ibm-granite/granite-4.0-1b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
chat = [
{ "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens,
max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])
Expected output:
<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>Almaden Research Center, San Jose, California<|end_of_text|>
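The decoded string above contains the whole transcript, including the prompt and the role markers. The snippet below is a minimal sketch of one way to pull out just the assistant's reply. It assumes the `<|start_of_role|>`, `<|end_of_role|>`, and `<|end_of_text|>` markers appear exactly as in the expected output; `extract_assistant_reply` is a hypothetical helper, not part of `transformers`.

```python
import re

# Role and end-of-text markers copied from the expected output above.
ASSISTANT_RE = re.compile(
    r"<\|start_of_role\|>assistant<\|end_of_role\|>(.*?)(?:<\|end_of_text\|>|$)",
    re.DOTALL,
)


def extract_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded transcript."""
    matches = ASSISTANT_RE.findall(decoded)
    if not matches:
        raise ValueError("no assistant turn found in decoded text")
    return matches[-1].strip()


# Transcript copied from the expected output of the generation example.
sample = (
    "<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory "
    "located in the United States. You should only output its name and location."
    "<|end_of_text|>\n"
    "<|start_of_role|>assistant<|end_of_role|>Almaden Research Center, "
    "San Jose, California<|end_of_text|>"
)

print(extract_assistant_reply(sample))
```

An alternative that avoids string parsing is to slice the generated token IDs past the prompt length and decode with `skip_special_tokens=True`; which approach fits better depends on whether the raw transcript is needed elsewhere.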
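Tool-calling: Granite-4.0-1B has enhanced tool-calling capabilities, enabling integration with external functions and APIs. To define a list of tools, follow OpenAI's function definition schema.
The following example shows how to use the tool-calling ability of the Granite-4.0-1B model: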
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"
model_path = "ibm-granite/granite-4.0-1b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather for a specified city.",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "Name of the city"
}
},
"required": ["city"]
}
}
}
]
# change input text as desired
chat = [
{ "role": "user", "content": "What's the weather like in Boston right now?" },
]
chat = tokenizer.apply_chat_template(chat,
tokenize=False,
tools=tools,
add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens,
max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])
Expected output:
<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather for a specified city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "Name of the city"}}, "required": ["city"]}}}
</tools>
For each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>What's the weather like in Boston right now?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|><tool_call>
{"name": "get_current_weather", "arguments": {"city": "Boston"}}
</tool_call><|end_of_text|>
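The model only emits a tool call; the calling application still has to parse it and run the function. The sketch below shows one possible way to do that, assuming the `<tool_call>` format shown in the expected output. `parse_tool_calls`, `run_tool_calls`, `TOOL_REGISTRY`, and the stubbed `get_current_weather` are hypothetical names introduced for illustration. Parsing is restricted to the last assistant turn because the system prompt itself contains a `<tool_call>` template that is not valid JSON.

```python
import json
import re

# Markers copied from the expected output above.
ASSISTANT_MARKER = "<|start_of_role|>assistant<|end_of_role|>"
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def get_current_weather(city: str) -> str:
    """Hypothetical stand-in for a real weather lookup."""
    return f"(stub) weather report for {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def parse_tool_calls(decoded: str) -> list:
    """Parse the JSON tool calls found in the last assistant turn."""
    # Drop everything before the last assistant marker so the
    # <tool_call> template in the system prompt is never parsed.
    assistant_text = decoded.rsplit(ASSISTANT_MARKER, 1)[-1]
    return [json.loads(body) for body in TOOL_CALL_RE.findall(assistant_text)]


def run_tool_calls(calls: list) -> list:
    """Dispatch each parsed call to its registered function."""
    results = []
    for call in calls:
        func = TOOL_REGISTRY.get(call["name"])
        if func is None:
            results.append(f"unknown tool: {call['name']}")
        else:
            results.append(func(**call["arguments"]))
    return results


# Last two turns copied from the expected output of the tool-calling example.
sample = (
    "<|start_of_role|>user<|end_of_role|>What's the weather like in Boston "
    "right now?<|end_of_text|>\n"
    "<|start_of_role|>assistant<|end_of_role|><tool_call>\n"
    '{"name": "get_current_weather", "arguments": {"city": "Boston"}}\n'
    "</tool_call><|end_of_text|>"
)

calls = parse_tool_calls(sample)
print(calls)
print(run_tool_calls(calls))
```

A production dispatcher would also validate the arguments against the tool's JSON schema before calling the function; that step is omitted here to keep the sketch short.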
Evaluation Results:
| Benchmarks | Metric | 350M Dense | H 350M Dense | 1B Dense | H 1B Dense |
|---|---|---|---|---|---|
| General Tasks | | | | | |
| MMLU | 5-shot | 35.01 | 36.21 | 59.39 | 59.74 |
| MMLU-Pro | 5-shot, CoT | 12.13 | 14.38 | 34.02 | 32.86 |
| BBH | 3-shot, CoT | 33.07 | 33.28 | 60.37 | 59.68 |
| AGI EVAL | 0-shot, CoT | 26.22 | 29.61 | 49.22 | 52.44 |
| GPQA | 0-shot, CoT | 24.11 | 26.12 | 29.91 | 29.69 |
| Alignment Tasks | | | | | |
| IFEval | Instruct, Strict | 61.63 | 67.63 | 80.82 | 82.37 |
| IFEval | Prompt, Strict | 49.17 | 55.64 | 73.94 | 74.68 |
| IFEval | Average | 55.4 | 61.63 | 77.38 | 78.53 |
| Math Tasks | | | | | |
| GSM8K | 8-shot | 30.71 | 39.27 | 76.35 | 69.83 |
| GSM Symbolic | 8-shot | 26.76 | 33.7 | 72.3 | 65.72 |
| Minerva Math | 0-shot, CoT | 13.04 | 5.76 | 45.28 | 49.4 |
| DeepMind Math | 0-shot, CoT | 8.45 | 6.2 | 34 | 34.98 |
| Code Tasks | | | | | |
| HumanEval | pass@1 | 39 | 38 | 74 | 73 |
| HumanEval+ | pass@1 | 37 | 35 | 69 | 68 |
| MBPP | pass@1 | 48 | 49 | 65 | 69 |
| MBPP+ | pass@1 | 38 | 44 | 57 | 60 |
| CRUXEval-O | pass@1 | 23.75 | 25.5 | 33.13 | 36 |
| BigCodeBench | pass@1 | 11.14 | 11.23 | 30.18 | 29.12 |
| Tool Calling Tasks | | | | | |
| BFCL v3 | | 39.32 | 43.32 | 54.82 | 50.21 |
| Multilingual Tasks | | | | | |
| MULTIPLE | pass@1 | 15.99 | 14.31 | 32.24 | 36.11 |
| MMMLU | 5-shot | 28.23 | 27.95 | 45 | 49.43 |
| INCLUDE | 5-shot | 27.74 | 27.09 | 42.12 | 43.35 |
| MGSM | 8-shot | 14.72 | 16.16 | 37.84 | 27.52 |
| Safety | | | | | |
| SALAD-Bench | | 97.12 | 96.55 | 93.44 | 96.4 |
| AttaQ | | 82.53 | 81.76 | 85.26 | 82.85 |

Languages covered by each multilingual benchmark:

| Benchmarks | # Langs | Languages |
|---|---|---|
| MMMLU | 11 | ar, de, en, es, fr, ja, ko, pt, zh, bn, hi |
| INCLUDE | 14 | hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh |
| MGSM | 5 | en, es, fr, ja, zh |
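The IFEval "Average" row in the evaluation table appears to be the arithmetic mean of the two strict scores (instruction-level and prompt-level). The card does not state this, so treat it as an assumption; the short check below only tests whether the reported numbers are consistent with it, up to rounding.

```python
# Scores copied from the three IFEval rows of the evaluation table.
ifeval = {
    "350M Dense": {"instruct": 61.63, "prompt": 49.17, "average": 55.4},
    "H 350M Dense": {"instruct": 67.63, "prompt": 55.64, "average": 61.63},
    "1B Dense": {"instruct": 80.82, "prompt": 73.94, "average": 77.38},
    "H 1B Dense": {"instruct": 82.37, "prompt": 74.68, "average": 78.53},
}


def mean_strict(row: dict) -> float:
    """Mean of the instruction-level and prompt-level strict scores."""
    return (row["instruct"] + row["prompt"]) / 2


# Largest gap between the recomputed mean and the reported average.
max_gap = max(abs(mean_strict(row) - row["average"]) for row in ifeval.values())

for name, row in ifeval.items():
    print(f"{name}: recomputed {mean_strict(row):.3f}, reported {row['average']}")
print(f"largest gap: {max_gap:.3f}")
```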
Model Architecture:
The Granite-4.0-1B baseline is built on a decoder-only dense transformer architecture. Its core components are GQA, an MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
| Model | 350M Dense | H 350M Dense | 1B Dense | H 1B Dense |
|---|---|---|---|---|
| Embedding size | 1024 | 768 | 2048 | 1536 |
| Number of layers | 28 attention | 4 attention / 28 Mamba2 | 40 attention | 4 attention / 36 Mamba2 |
| Attention head size | 64 | 64 | 128 | 128 |
| Number of attention heads | 16 | 12 | 16 | 12 |
| Number of KV heads | 4 | 4 | 4 | 4 |
| Mamba2 state size | - | 128 | - | 128 |
| Number of Mamba2 heads | - | 48 | - | 48 |
| MLP / shared expert hidden size | 2048 | 2048 | 4096 | 4096 |
| Number of experts | - | - | - | - |
| Number of active experts | - | - | - | - |
| Expert hidden size | - | - | - | - |
| MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU |
| Sequence length | 32K | 32K | 128K | 128K |
| Position embedding | RoPE | NoPE | RoPE | NoPE |
| Number of parameters | 350M | 340M | 1.6B | 1.5B |
| Number of active parameters | 350M | 340M | 1.6B | 1.5B |
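As a rough sanity check, the parameter counts of the two dense models can be estimated from the other rows of the architecture table. The sketch below assumes bias-free linear layers, two RMSNorm weight vectors per layer plus a final norm, and a vocabulary of 100,352 tokens. None of these is stated in this card, so the result is an approximation; `DenseConfig` and `estimate_params` are hypothetical names.

```python
from dataclasses import dataclass

# ASSUMPTION: vocabulary size is not given in this card.
ASSUMED_VOCAB_SIZE = 100_352


@dataclass
class DenseConfig:
    embed_dim: int
    num_layers: int
    head_dim: int
    num_heads: int
    num_kv_heads: int
    mlp_hidden: int
    vocab_size: int = ASSUMED_VOCAB_SIZE


def estimate_params(cfg: DenseConfig) -> int:
    """Approximate parameter count of a decoder-only dense transformer."""
    q_out = cfg.num_heads * cfg.head_dim
    kv_out = cfg.num_kv_heads * cfg.head_dim  # GQA: fewer K/V heads than Q heads
    attention = (
        cfg.embed_dim * q_out         # query projection
        + 2 * cfg.embed_dim * kv_out  # key and value projections
        + q_out * cfg.embed_dim       # output projection
    )
    mlp = 3 * cfg.embed_dim * cfg.mlp_hidden  # SwiGLU: gate, up, and down
    norms = 2 * cfg.embed_dim                 # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    # Input and output embeddings are shared, so they are counted once.
    embeddings = cfg.vocab_size * cfg.embed_dim
    final_norm = cfg.embed_dim
    return cfg.num_layers * per_layer + embeddings + final_norm


# Values copied from the "350M Dense" and "1B Dense" columns.
dense_350m = DenseConfig(1024, 28, 64, 16, 4, 2048)
dense_1b = DenseConfig(2048, 40, 128, 16, 4, 4096)

print(f"350M Dense estimate: {estimate_params(dense_350m) / 1e6:.0f}M")
print(f"1B Dense estimate:   {estimate_params(dense_1b) / 1e9:.2f}B")
```

Under these assumptions the estimates land close to the 350M and 1.6B figures in the table. The hybrid (H) models are left out because their Mamba2 layers need a different per-layer formula.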
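Training Data: Overall, our SFT data comes from three key sources: (1) publicly available datasets with permissive licenses, (2) internal synthetic data targeting specific capabilities, and (3) a select set of human-curated data.
Infrastructure: We trained the Granite 4.0 Nano Language Models on an NVIDIA GB200 NVL72 cluster hosted by CoreWeave. Intra-rack communication runs over the 72-GPU NVLink domain, while a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models across thousands of GPUs.
Ethical Considerations and Limitations: Granite 4.0 Nano instruct models are fine-tuned primarily on instruction-response pairs that are mostly in English, along with multilingual data covering several languages. Although the model can handle multilingual dialog use cases, its performance on them may fall short of its performance on English tasks. In such cases, adding a small number of examples (few-shot prompting) can help the model generate more accurate outputs. While safety was taken into account when aligning the model, it may still produce inaccurate, biased, or unsafe responses to some user prompts. We therefore urge the community to use this model with proper safety testing and tuning tailored to their specific tasks.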
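The few-shot suggestion above can be sketched as a chat whose earlier turns are worked examples. The helper below is a minimal illustration, assuming the same `role`/`content` message format used in the generation example. `build_few_shot_chat` and the German sentiment examples are hypothetical and not taken from the Granite documentation.

```python
def build_few_shot_chat(examples, query):
    """Build a chat list with demonstration turns followed by the real query.

    examples: list of (user_text, assistant_text) pairs used as demonstrations.
    query: the user message the model should actually answer.
    """
    chat = []
    for user_text, assistant_text in examples:
        chat.append({"role": "user", "content": user_text})
        chat.append({"role": "assistant", "content": assistant_text})
    chat.append({"role": "user", "content": query})
    return chat


# Hypothetical German sentiment-classification demonstrations.
examples = [
    ("Klassifiziere die Stimmung: 'Das Essen war ausgezeichnet.'", "positiv"),
    ("Klassifiziere die Stimmung: 'Der Service war sehr langsam.'", "negativ"),
]

chat = build_few_shot_chat(
    examples, "Klassifiziere die Stimmung: 'Ich komme gerne wieder.'"
)

for message in chat:
    print(f"{message['role']}: {message['content']}")
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of the single-turn `chat` from the generation example.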
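Resources
- ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
- 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources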