google-gemma-3-27b-it-qat-q4_0-gguf-small: an optimized Gemma-3-27B-IT model with smaller file size and lower perplexity

Commit 75d71217, created on April 7, 2025

---
license: gemma
metrics:
- perplexity
base_model:
- google/gemma-3-27b-it-qat-q4_0-gguf
- bartowski/google_gemma-3-27b-it-GGUF
---

This is a "self" merge of https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-gguf and https://huggingface.co/bartowski/google_gemma-3-27b-it-GGUF.

The official QAT weights released by Google use fp16 (instead of Q6_K) for the embeddings table, so the official file takes significantly more memory (and storage) than a Q4_0 quant is supposed to. Instead of quantizing the table myself, I extracted it from Bartowski's quantized models, because those were already calibrated with imatrix, which should squeeze some extra performance out of it.
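To get a feel for how much the embeddings table format matters, here is a rough back-of-the-envelope sketch. The vocabulary size and hidden size below are assumptions for illustration and are not stated in this README; the Q6_K cost of 210 bytes per 256-weight block follows the ggml block layout.

```python
# Rough size estimate of the token embeddings table in fp16 vs Q6_K.
# ASSUMPTIONS (not taken from this README): vocabulary and hidden sizes.
VOCAB_SIZE = 262_144   # assumed vocabulary size
HIDDEN_SIZE = 5_376    # assumed embedding width

FP16_BYTES_PER_WEIGHT = 2.0
# Q6_K stores 256 weights in a 210-byte block (about 6.56 bits per weight).
Q6_K_BYTES_PER_WEIGHT = 210 / 256


def table_bytes(bytes_per_weight: float) -> float:
    """Size in bytes of a VOCAB_SIZE x HIDDEN_SIZE table at the given density."""
    return VOCAB_SIZE * HIDDEN_SIZE * bytes_per_weight


fp16_bytes = table_bytes(FP16_BYTES_PER_WEIGHT)
q6k_bytes = table_bytes(Q6_K_BYTES_PER_WEIGHT)
saved_gb = (fp16_bytes - q6k_bytes) / 1e9

print(f"fp16 table: {fp16_bytes / 1e9:.2f} GB")
print(f"Q6_K table: {q6k_bytes / 1e9:.2f} GB")
print(f"saved:      {saved_gb:.2f} GB")
```

Under these assumed dimensions the saving comes out to roughly 1.7 GB, which is in the same range as the file size difference reported below.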

Here are some perplexity measurements:

| Model | File size ↓ | PPL (wiki.text.raw) ↓ |
| --- | --- | --- |
| This model | 15.6 GB | 8.2291 +/- 0.06315 |
| QAT Q4_0 (google) | 17.2 GB | 8.2323 +/- 0.06320 |

Note that this model ends up smaller than the Q4_0 from Bartowski. This is because llama.cpp sets some tensors to Q4_1 when quantizing models to Q4_0 with imatrix, whereas this model is a static quant. The perplexity score is even slightly lower for this model than for the original model by Google, but the difference is within the margin of error, so it's probably just luck.
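The "within margin of error" claim can be checked with a few lines of arithmetic. This is a naive sketch: it treats the two reported uncertainties as independent, which is an assumption (both runs use the same text, so the errors are likely correlated), and the variable names are only illustrative.

```python
import math

# Perplexity values and reported uncertainties from the table above.
ppl_this, err_this = 8.2291, 0.06315
ppl_google, err_google = 8.2323, 0.06320

# Difference between the two measurements.
diff = ppl_google - ppl_this

# Combined uncertainty, ASSUMING the two errors are independent.
combined_err = math.hypot(err_this, err_google)

# How many combined "error bars" the difference spans.
ratio = abs(diff) / combined_err

print(f"difference:        {diff:.4f}")
print(f"combined error:    {combined_err:.4f}")
print(f"difference / error: {ratio:.3f}")
```

The difference is a small fraction of either error bar, so the measurements do not distinguish the two models on perplexity; the file size is the meaningful difference.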