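A lightweight instruction model fine-tuned from IBM Granite-4.0-350M-Base. It supports 12 languages and handles summarization, classification, question answering, RAG, code generation, and tool calling, which makes it suitable for edge deployment and domain-specific fine-tuning. [This summary was AI-generated.]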

File	Last commit message	Last updated
.gitattributes	Upload folder using huggingface_hub	6 months ago
README.md	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-BF16.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-IQ4_NL.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-IQ4_XS.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-Q3_K_M.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-Q3_K_S.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-Q4_0.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-Q4_1.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-Q4_K_M.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-Q4_K_S.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-Q5_K_M.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-Q5_K_S.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-Q6_K.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-Q8_0.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-UD-IQ3_XXS.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-UD-Q3_K_XL.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-UD-Q4_K_XL.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-UD-Q5_K_XL.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-UD-Q6_K_XL.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
granite-4.0-350m-UD-Q8_K_XL.gguf (LFS)	Upload folder using huggingface_hub	6 months ago
imatrix_unsloth.gguf_file (LFS)	Upload folder using huggingface_hub	6 months ago
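
When scripting against this file list (for example, to pick one quantization to download), it can help to split each GGUF file name into its model name and quantization tag. The sketch below is a minimal illustration. The pattern is inferred only from the file names listed above and is not an official naming specification, and reading the `UD-` prefix as "Unsloth Dynamic" is an assumption. `GgufName` and `parse_gguf_name` are hypothetical helper names.

```python
import re
from typing import NamedTuple


class GgufName(NamedTuple):
    model: str     # e.g. "granite-4.0-350m"
    dynamic: bool  # True when the quant tag carries the "UD-" prefix (assumed: Unsloth Dynamic)
    quant: str     # e.g. "Q4_K_M", "IQ4_XS", "BF16"


# Pattern inferred from the listing above; not an official naming spec.
_PATTERN = re.compile(
    r"^(?P<model>.+?)-(?P<ud>UD-)?(?P<quant>I?Q\d\w*|BF16|F16|F32)\.gguf$"
)


def parse_gguf_name(filename: str) -> GgufName:
    """Split a GGUF file name into model name, UD flag, and quantization tag."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a recognized GGUF weight file name: {filename!r}")
    return GgufName(match["model"], match["ud"] is not None, match["quant"])


if __name__ == "__main__":
    for name in ("granite-4.0-350m-Q4_K_M.gguf", "granite-4.0-350m-UD-Q4_K_XL.gguf"):
        print(parse_gguf_name(name))
```

Files that are not weight files, such as `imatrix_unsloth.gguf_file`, do not match the pattern and raise `ValueError`, so a caller can filter them out explicitly.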

license: apache-2.0
library_name: transformers
tags:
- language
- unsloth
- granite-4.0
base_model:
- ibm-granite/granite-4.0-350m

Unsloth Dynamic 2.0 achieves superior accuracy and outperforms other leading quantization methods.

Granite-4.0-350M

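Model Summary: Granite-4.0-350M is a lightweight instruct model fine-tuned from Granite-4.0-350M-Base using a combination of permissively licensed open-source instruction datasets and internally collected synthetic datasets. The model was developed with several techniques, including supervised fine-tuning, reinforcement learning, and model merging.

Developers: Granite Team, IBM
HF Collection: Granite 4.0 Nano Language Models HF Collection
GitHub Repository: ibm-granite/granite-4.0-nano-language-models
Website: Granite Docs
Release Date: October 28, 2025
License: Apache 2.0

Supported Languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond this list.

Intended Use: The Granite 4.0 Nano instruct models have strong instruction-following capabilities, which brings advanced AI features within reach of on-device deployments and research use cases. Their compact size also makes them well suited to fine-tuning on specialized domains without requiring massive compute resources.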

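Capabilities

Summarization
Text classification
Text extraction
Question answering
Retrieval-augmented generation (RAG)
Code-related tasks
Function-calling tasks
Multilingual dialog use cases
Fill-in-the-middle (FIM) code completion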

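Generation: The following is a simple example of how to use the Granite-4.0-350M model.

Install the following libraries:

pip install torch torchvision torchaudio
pip install accelerate
pip install transformers

Then copy the snippet from the section that is relevant to your use case.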

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.0-350M"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
chat = [
    { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens, 
                        max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])

Expected output:

<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>Almaden Research Center, San Jose, California<|end_of_text|>
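
The decoded string above contains the full transcript, including the role markers. If only the assistant's reply is needed, one option is to parse the transcript by those markers. The following is a minimal sketch that assumes the marker strings appear exactly as in the expected output above. `split_turns` and `last_assistant_reply` are hypothetical helper names, not part of `transformers`.

```python
import re
from typing import List, Optional, Tuple

# Marker strings copied from the expected output above.
_TURN = re.compile(
    r"<\|start_of_role\|>(?P<role>\w+)<\|end_of_role\|>"
    r"(?P<content>.*?)(?:<\|end_of_text\|>|\Z)",
    re.DOTALL,
)


def split_turns(decoded: str) -> List[Tuple[str, str]]:
    """Return (role, content) pairs in the order they appear in the transcript."""
    return [(m["role"], m["content"].strip()) for m in _TURN.finditer(decoded)]


def last_assistant_reply(decoded: str) -> Optional[str]:
    """Return the content of the final assistant turn, or None if there is none."""
    for role, content in reversed(split_turns(decoded)):
        if role == "assistant":
            return content
    return None


if __name__ == "__main__":
    transcript = (
        "<|start_of_role|>user<|end_of_role|>Name one IBM lab.<|end_of_text|>\n"
        "<|start_of_role|>assistant<|end_of_role|>Almaden Research Center, "
        "San Jose, California<|end_of_text|>"
    )
    print(last_assistant_reply(transcript))
```

The pattern also accepts a final turn with no closing `<|end_of_text|>`, which can happen when generation stops at `max_new_tokens`. An alternative that avoids string parsing is to slice the generated token IDs past the prompt length and decode them with `skip_special_tokens=True`.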

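Tool calling:
Granite-4.0-350M comes with enhanced tool-calling capabilities, enabling seamless integration with external functions and APIs. To define a list of tools, follow OpenAI's function definition schema.

The following is an example of how to use the tool-calling ability of the Granite-4.0-350M model: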

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.0-350M"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# change input text as desired
chat = [
    { "role": "user", "content": "What's the weather like in Boston right now?" },
]
chat = tokenizer.apply_chat_template(chat,
                                     tokenize=False,
                                     tools=tools,
                                     add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens, 
                        max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])

Expected output:

<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather for a specified city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "Name of the city"}}, "required": ["city"]}}}
</tools>

For each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>What's the weather like in Boston right now?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|><tool_call>
{"name": "get_current_weather", "arguments": {"city": "Boston"}}
</tool_call><|end_of_text|>
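
The model only emits a description of the call; running the function is left to the caller. The sketch below shows one way to pull the JSON out of the `<tool_call>` tags in the decoded output and dispatch it to a local Python function. It assumes the output format shown above. `extract_tool_calls`, `run_tool_calls`, and the stub `get_current_weather` are hypothetical names introduced for illustration, and the stub returns placeholder data instead of querying a real weather service.

```python
import json
import re
from typing import Any, Callable, Dict, List

ASSISTANT_MARK = "<|start_of_role|>assistant<|end_of_role|>"
_TOOL_CALL = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(decoded: str) -> List[Dict[str, Any]]:
    """Parse tool calls from the last assistant turn of a decoded transcript."""
    # The system prompt also contains a <tool_call> template, so restrict the
    # search to the text after the final assistant marker.
    _, _, assistant_text = decoded.rpartition(ASSISTANT_MARK)
    calls = []
    for raw in _TOOL_CALL.findall(assistant_text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


def run_tool_calls(
    calls: List[Dict[str, Any]], registry: Dict[str, Callable[..., Any]]
) -> List[Any]:
    """Invoke each requested tool from the registry, in order."""
    results = []
    for call in calls:
        fn = registry.get(call["name"])
        if fn is None:
            results.append({"error": f"unknown tool: {call['name']}"})
            continue
        results.append(fn(**call.get("arguments", {})))
    return results


def get_current_weather(city: str) -> Dict[str, str]:
    # Hypothetical stand-in; a real implementation would query a weather API.
    return {"city": city, "conditions": "placeholder"}


if __name__ == "__main__":
    decoded = (
        ASSISTANT_MARK
        + '<tool_call>\n{"name": "get_current_weather", '
        + '"arguments": {"city": "Boston"}}\n</tool_call><|end_of_text|>'
    )
    calls = extract_tool_calls(decoded)
    print(run_tool_calls(calls, {"get_current_weather": get_current_weather}))
```

Looking tools up in an explicit registry, instead of resolving names dynamically, keeps the model from invoking anything the caller did not deliberately expose.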

Evaluation results:

Benchmark	Metric	350M Dense	H 350M Dense	1B Dense	H 1B Dense
General Tasks
MMLU	5-shot	35.01	36.21	59.39	59.74
MMLU-Pro	5-shot, CoT	12.13	14.38	34.02	32.86
BBH	3-shot, CoT	33.07	33.28	60.37	59.68
AGI EVAL	0-shot, CoT	26.22	29.61	49.22	52.44
GPQA	0-shot, CoT	24.11	26.12	29.91	29.69
Alignment Tasks
IFEval	Instruct, Strict	61.63	67.63	80.82	82.37
IFEval	Prompt, Strict	49.17	55.64	73.94	74.68
IFEval	Average	55.4	61.63	77.38	78.53
Math Tasks
GSM8K	8-shot	30.71	39.27	76.35	69.83
GSM Symbolic	8-shot	26.76	33.7	72.3	65.72
Minerva Math	0-shot, CoT	13.04	5.76	45.28	49.4
DeepMind Math	0-shot, CoT	8.45	6.2	34	34.98
Code Tasks
HumanEval	pass@1	39	38	74	73
HumanEval+	pass@1	37	35	69	68
MBPP	pass@1	48	49	65	69
MBPP+	pass@1	38	44	57	60
CRUXEval-O	pass@1	23.75	25.5	33.13	36
BigCodeBench	pass@1	11.14	11.23	30.18	29.12
Tool Calling Tasks
BFCL v3		39.32	43.32	54.82	50.21
Multilingual Tasks
MULTIPLE	pass@1	15.99	14.31	32.24	36.11
MMMLU	5-shot	28.23	27.95	45	49.43
INCLUDE	5-shot	27.74	27.09	42.12	43.35
MGSM	8-shot	14.72	16.16	37.84	27.52
Safety
SALAD-Bench		97.12	96.55	93.44	96.4
AttaQ		82.53	81.76	85.26	82.85
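
The table does not say how the "IFEval Average" row is computed. The figures are consistent with it being the arithmetic mean of the two strict IFEval rows. This reading is an inference from the numbers, not something the source states. A quick check using only values from the table:

```python
# (Instruct Strict, Prompt Strict, reported Average), copied from the table above.
IFEVAL = {
    "350M Dense": (61.63, 49.17, 55.4),
    "H 350M Dense": (67.63, 55.64, 61.63),
    "1B Dense": (80.82, 73.94, 77.38),
    "H 1B Dense": (82.37, 74.68, 78.53),
}


def mean_matches(instruct: float, prompt: float, reported: float, tol: float = 0.01) -> bool:
    """True when the mean of the two strict scores agrees with the reported average."""
    return abs((instruct + prompt) / 2 - reported) <= tol


if __name__ == "__main__":
    for model, (instruct, prompt, reported) in IFEVAL.items():
        print(model, mean_matches(instruct, prompt, reported))
```

The tolerance of 0.01 allows for the table's rounding to two decimal places.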

Multilingual benchmarks and the languages they include:
Benchmark	# Langs	Languages
MMMLU	11	ar, de, en, es, fr, ja, ko, pt, zh, bn, hi
INCLUDE	14	hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh
MGSM	5	en, es, fr, ja, zh

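Model Architecture:

The Granite-4.0-350M baseline is built on a decoder-only dense transformer architecture. Its core components are grouped-query attention (GQA), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings.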

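Model	350M Dense	H 350M Dense	1B Dense	H 1B Dense
Embedding size	1024	768	2048	1536
Number of layers	28 attention	4 attention / 28 Mamba2	40 attention	4 attention / 36 Mamba2
Attention head size	64	64	128	128
Number of attention heads	16	12	16	12
Number of KV heads	4	4	4	4
Mamba2 state size	-	128	-	128
Number of Mamba2 heads	-	48	-	48
MLP / shared expert hidden size	2048	2048	4096	4096
Number of experts	-	-	-	-
Number of active experts	-	-	-	-
Expert hidden size	-	-	-	-
MLP activation	SwiGLU	SwiGLU	SwiGLU	SwiGLU
Sequence length	32K	32K	128K	128K
Position embedding	RoPE	NoPE	RoPE	NoPE
Number of parameters	350M	340M	1.6B	1.5B
Number of active parameters	350M	340M	1.6B	1.5B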
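
The 350M Dense column can be sanity-checked with a back-of-the-envelope parameter count, and the same numbers give a rough KV-cache size that shows what GQA saves. The sketch below is an approximation only: it ignores normalization weights and any biases, it treats 32K as 32,768 tokens, and the vocabulary size of 100,352 is an assumption that does not appear in the table. `DenseConfig`, `approx_params`, and `kv_cache_bytes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DenseConfig:
    embed_dim: int
    n_layers: int
    head_dim: int
    n_heads: int
    n_kv_heads: int
    mlp_hidden: int
    vocab_size: int  # ASSUMPTION: not listed in the architecture table


def approx_params(cfg: DenseConfig) -> int:
    """Approximate weight count for a dense GQA + SwiGLU decoder with tied embeddings."""
    q_out = cfg.n_heads * cfg.head_dim
    kv_out = cfg.n_kv_heads * cfg.head_dim
    attn = (
        cfg.embed_dim * q_out         # query projection
        + 2 * cfg.embed_dim * kv_out  # key and value projections (shared across query groups)
        + q_out * cfg.embed_dim       # output projection
    )
    mlp = 3 * cfg.embed_dim * cfg.mlp_hidden     # gate, up, and down projections
    embeddings = cfg.vocab_size * cfg.embed_dim  # counted once: input/output are shared
    return cfg.n_layers * (attn + mlp) + embeddings


def kv_cache_bytes(cfg: DenseConfig, seq_len: int, bytes_per_value: int = 2) -> int:
    """KV-cache size for one sequence: keys plus values over all layers."""
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * seq_len * bytes_per_value


# Values from the "350M Dense" column; vocab_size is an assumed value.
GRANITE_350M = DenseConfig(
    embed_dim=1024, n_layers=28, head_dim=64, n_heads=16,
    n_kv_heads=4, mlp_hidden=2048, vocab_size=100_352,
)

if __name__ == "__main__":
    print(f"approx. parameters: {approx_params(GRANITE_350M) / 1e6:.0f}M")
    print(f"KV cache at 32K (2-byte values): {kv_cache_bytes(GRANITE_350M, 32_768) / 2**20:.0f} MiB")
```

Under these assumptions the estimate comes to about 352M parameters, in line with the 350M in the table, and the KV cache at full context with 2-byte values is about 896 MiB. With 16 KV heads instead of 4 (no grouping), the cache would be four times larger.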

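Training Data: Overall, our SFT data comes from three key sources: (1) publicly available datasets with permissive licenses, (2) internal synthetic data targeting specific capabilities, and (3) a select set of human-curated data.

Infrastructure: We trained the Granite 4.0 Nano language models on an NVIDIA GB200 NVL72 cluster hosted at CoreWeave. Intra-rack communication runs over the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster gives us a scalable and efficient infrastructure for training our models across thousands of GPUs.

Ethical Considerations and Limitations: The Granite 4.0 Nano instruct models are fine-tuned primarily on instruction-response pairs that are mostly in English, along with multilingual data covering several languages. Although the model can handle multilingual dialog use cases, its performance may not match what it achieves on English tasks. In such cases, providing a small number of examples (few-shot prompting) can help the model generate more accurate outputs. The model has been aligned with safety in mind, but it may still produce inaccurate, biased, or unsafe responses to some user prompts. We therefore urge the community to use this model with proper safety testing and tuning tailored to their specific tasks.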
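
As a sketch of the few-shot suggestion above, worked examples can be supplied as earlier user/assistant turns ahead of the real query. `build_few_shot_chat` is a hypothetical helper, and the German sentiment examples are invented purely for illustration. The resulting list has the same shape as the `chat` list used in the generation example and can be passed to `tokenizer.apply_chat_template` in the same way.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def build_few_shot_chat(
    examples: Sequence[Tuple[str, str]],
    query: str,
    system: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Build a chat list in which each example is a prior user/assistant exchange."""
    chat: List[Dict[str, str]] = []
    if system:
        chat.append({"role": "system", "content": system})
    for user_text, assistant_text in examples:
        chat.append({"role": "user", "content": user_text})
        chat.append({"role": "assistant", "content": assistant_text})
    chat.append({"role": "user", "content": query})  # the real question goes last
    return chat


if __name__ == "__main__":
    # Illustrative examples only; replace with samples from the target task.
    examples = [
        ("Stimmung: 'Der Service war ausgezeichnet.'", "positiv"),
        ("Stimmung: 'Das Essen war kalt und fade.'", "negativ"),
    ]
    chat = build_few_shot_chat(examples, "Stimmung: 'Ich komme gerne wieder.'")
    for turn in chat:
        print(turn["role"], "->", turn["content"])
```

Keeping the examples in the same language and format as the query gives the model a direct pattern to continue, which is the point of the few-shot recommendation.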

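Resources

⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources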
