---
license: apache-2.0
library_name: transformers
base_model:
- Qwen/Qwen2.5-14B-Instruct
model-index:
- name: Replete-LLM-V2.5-Qwen-14b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 58.4
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Replete-AI/Replete-LLM-V2.5-Qwen-14b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 49.39
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Replete-AI/Replete-LLM-V2.5-Qwen-14b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 15.63
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Replete-AI/Replete-LLM-V2.5-Qwen-14b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 16.22
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Replete-AI/Replete-LLM-V2.5-Qwen-14b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 18.83
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Replete-AI/Replete-LLM-V2.5-Qwen-14b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 48.62
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Replete-AI/Replete-LLM-V2.5-Qwen-14b
      name: Open LLM Leaderboard
---

Rombos-LLM-V2.5-Qwen-14b

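Rombos-LLM-V2.5-Qwen-14b is a continuously finetuned version of Qwen2.5-14B. I recently noticed that the Qwen team had not adopted my continuous finetuning method, which brings clear benefits with no downsides. So I took it upon myself to merge the instruct model with the base model using the Ties merge method.
This version of the model performs better than both the original instruct model and the base model.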
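
To make the Ties idea concrete, here is a minimal pure-Python sketch of its three steps (trim, elect sign, disjoint merge) on toy weight lists. It is an illustration only, not the recipe used for this model: the function name `ties_merge`, the `density` and `lam` parameters, and the toy values are all assumptions, and a real merge operates on full model tensors with a dedicated merge tool.

```python
def ties_merge(base, finetuned, density=0.5, lam=1.0):
    """Toy TIES merge over flat lists of floats (hypothetical helper)."""
    n = len(base)
    k = max(1, round(density * n))  # number of entries kept per model

    # Step 1 (trim): per model, keep only the k largest-magnitude changes
    # relative to the base; ties at the threshold are all kept.
    trimmed = []
    for weights in finetuned:
        delta = [w - b for w, b in zip(weights, base)]
        threshold = sorted((abs(d) for d in delta), reverse=True)[k - 1]
        trimmed.append([d if abs(d) >= threshold else 0.0 for d in delta])

    merged = []
    for j in range(n):
        column = [t[j] for t in trimmed]
        total = sum(column)
        # Step 2 (elect sign): the direction with more total mass wins.
        sign = (total > 0) - (total < 0)
        # Step 3 (disjoint merge): average only the changes that agree
        # with the elected sign; zeroed or conflicting entries are dropped.
        kept = [d for d in column if d * sign > 0]
        mean = sum(kept) / len(kept) if kept else 0.0
        merged.append(base[j] + lam * mean)
    return merged


# Toy data (assumed values): two "fine-tuned" variants of a zero base.
base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -2.0, 0.1, 3.0]
model_b = [-0.5, -1.0, 0.2, -4.0]
print(ties_merge(base, [model_a, model_b], density=0.5))
# -> [0.0, -1.5, 0.0, -4.0]
```

In the last position the two variants disagree (+3.0 versus -4.0). The negative direction carries more mass, so only -4.0 survives instead of the two changes averaging each other out, which is the interference Ties is designed to avoid.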
Quants:
GGUF: https://huggingface.co/bartowski/Replete-LLM-V2.5-Qwen-14b-GGUF
Benchmarks:
Open LLM Leaderboard Evaluation Results
Detailed results are available on the Open LLM Leaderboard.
| Metric | Value |
|---|---|
| Avg. | 34.52 |
| IFEval (0-Shot) | 58.40 |
| BBH (3-Shot) | 49.39 |
| MATH Lvl 5 (4-Shot) | 15.63 |
| GPQA (0-shot) | 16.22 |
| MuSR (0-shot) | 18.83 |
| MMLU-PRO (5-shot) | 48.62 |
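
As a quick sanity check, the average in the table appears to be the unweighted mean of the six benchmark scores. The short sketch below recomputes it; the variable names are illustrative, and the assumption of a plain arithmetic mean is inferred from the numbers rather than stated in the card.

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 58.40,
    "BBH (3-Shot)": 49.39,
    "MATH Lvl 5 (4-Shot)": 15.63,
    "GPQA (0-shot)": 16.22,
    "MuSR (0-shot)": 18.83,
    "MMLU-PRO (5-shot)": 48.62,
}
reported_average = 34.52

# Unweighted mean: 207.09 / 6, roughly 34.515.
average = sum(scores.values()) / len(scores)

# Compare with a tolerance because the table rounds to two decimals.
matches = abs(average - reported_average) < 0.01
print(matches)
```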