gelectra-base distilled for Extractive QA

Overview

Language model: gelectra-base-germanquad-distilled
Language: German
Training data: GermanQuAD train set (~ 12MB)
Eval data: GermanQuAD test set (~ 5MB)
Published: Apr 21st, 2021

Details

We trained a German question answering model with a gelectra-base model as its basis.
The training dataset is one-way annotated and contains 11518 questions and 11518 answers. The test dataset is three-way annotated: it contains 2204 questions and 2204·3 − 76 = 6536 answers, because we removed 76 wrong answers.
In addition to the annotations in GermanQuAD, Haystack's distillation feature was used for training, with deepset/gelectra-large-germanquad as the teacher model.
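The dataset sizes quoted above can be checked with a few lines of arithmetic. This is only a sanity check on the numbers stated in the text; the variable names are illustrative and not part of any dataset API.

```python
# Dataset sizes as stated in the text above.
train_questions = 11518
test_questions = 2204
annotations_per_test_question = 3
removed_answers = 76

# One-way annotation: one answer per training question.
train_answers = train_questions * 1

# Three-way annotation, minus the answers removed as wrong.
test_answers = test_questions * annotations_per_test_question - removed_answers

print(train_answers)  # 11518
print(test_answers)   # 6536
```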

Hyperparameters

batch_size = 24
n_epochs = 6
max_seq_len = 384
learning_rate = 3e-5
lr_schedule = LinearWarmup
embeds_dropout_prob = 0.1
temperature = 2
distillation_loss_weight = 0.75
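To illustrate how temperature and distillation_loss_weight typically interact, here is a minimal sketch of a standard knowledge-distillation loss in plain Python. It is an assumption that training combined the terms in this form: the card does not state the exact formula, and the function names below are hypothetical, not Haystack's API. The sketch softens teacher and student logits with the temperature, measures their KL divergence (scaled by temperature squared, as is common), and mixes that with the usual cross-entropy on the gold label.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, gold_index):
    """Standard cross-entropy of the student against the annotated label."""
    return -math.log(softmax(student_logits)[gold_index])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, gold_index,
                      temperature=2.0, distillation_loss_weight=0.75):
    """Weighted mix of soft-label (teacher) loss and hard-label loss.

    Assumed form: w * T^2 * KL(teacher_T || student_T) + (1 - w) * CE.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    kd = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    ce = cross_entropy(student_logits, gold_index)
    return distillation_loss_weight * kd + (1 - distillation_loss_weight) * ce

# Toy start-position logits over four tokens; the gold start is token 2.
student = [0.2, 0.1, 1.5, 0.3]
teacher = [0.1, 0.0, 3.0, 0.2]
print(distillation_loss(student, teacher, gold_index=2))
```

With distillation_loss_weight = 0.75, three quarters of the training signal in this sketch comes from matching the teacher's softened distribution and one quarter from the GermanQuAD annotations.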

Usage

In Haystack

Haystack is an AI orchestration framework to build customizable, production-ready LLM applications. You can use this model in Haystack to do extractive question answering on documents.

# After running pip install haystack-ai "transformers[torch,sentencepiece]"

from haystack import Document
from haystack.components.readers import ExtractiveReader

docs = [
    Document(content="Python is a popular programming language"),
    Document(content="python ist eine beliebte Programmiersprache"),
]

reader = ExtractiveReader(model="deepset/gelectra-base-germanquad-distilled")
reader.warm_up()

question = "What is a popular programming language?"
result = reader.run(query=question, documents=docs)
# {'answers': [ExtractedAnswer(query='What is a popular programming language?', score=0.5740374326705933, data='python', document=Document(id=..., content: '...'), context=None, document_offset=ExtractedAnswer.Span(start=0, end=6),...)]}

In Transformers

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
import torch
import torch_npu  # registers the Ascend NPU backend with PyTorch
import argparse
import os
from openmind_hub import snapshot_download

# Set environment variables (download and request timeouts, in seconds)
os.environ['DEFAULT_DOWNLOAD_TIMEOUT'] = "600"
os.environ['DEFAULT_REQUEST_TIMEOUT'] = "600"

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Local path to the model; if omitted, Jinan_AICC/gelectra-base-germanquad-distilled is downloaded",
        default=None,
    )
    args = parser.parse_args()
    return args

args = parse_args()

if args.model_name_or_path:
    model_path = args.model_name_or_path
else:
    model_path = snapshot_download(
        "Jinan_AICC/gelectra-base-germanquad-distilled",
        revision="main",
        ignore_patterns=["*.h5", "*.ot", "*.msgpack"],
    )

# a) Get predictions
nlp = pipeline('question-answering', model=model_path, tokenizer=model_path)
QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
}
res = nlp(QA_input)

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
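When the model and tokenizer are loaded directly as in b), the model returns start and end logits per token, and the answer span has to be chosen from them. The following is a simplified sketch of that selection step in plain Python. The function best_span and the max_answer_len parameter are hypothetical names for illustration; the real pipeline does more (for example, masking question tokens and mapping token indices back to character offsets).

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) of the highest-scoring valid span.

    A span is valid if start <= end and it covers at most
    max_answer_len tokens. The score is start logit + end logit.
    """
    best = (0, 0)
    best_score = float("-inf")
    for start, start_logit in enumerate(start_logits):
        last_end = min(start + max_answer_len, len(end_logits))
        for end in range(start, last_end):
            score = start_logit + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best[0], best[1], best_score

# Toy logits for a four-token context.
start_logits = [0.1, 2.5, 0.3, 0.2]
end_logits = [0.0, 0.4, 3.1, 0.5]
start, end, score = best_span(start_logits, end_logits)
print(start, end)  # 1 2
```

The returned indices are token positions (inclusive), so the answer text would be obtained by decoding the tokens from start through end.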

Performance

We evaluated the extractive question answering performance on our GermanQuAD test set. Model types and training data are included in the model name. For fine-tuning XLM-Roberta, we use the English SQuAD v2.0 dataset. The GELECTRA models are warm-started on the German translation of SQuAD v1.1 and fine-tuned on GermanQuAD. The human baseline was computed for the 3-way test set by taking one answer as the prediction and the other two as ground truth.

"exact": 62.4773139745916
"f1": 80.9488017070188
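For readers unfamiliar with the two metrics, here is a minimal sketch of SQuAD-style exact match and token-level F1 for a question with several reference answers, as in the three-way annotated test set. It is an illustration under assumptions, not the evaluation script used for the numbers above: the normalization here only lowercases and strips punctuation, and the function names are hypothetical.

```python
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())

def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))

def f1(prediction, truth):
    """Harmonic mean of token precision and recall."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

def score_question(prediction, truths):
    """Take the best score over all reference answers for one question."""
    return (max(exact_match(prediction, t) for t in truths),
            max(f1(prediction, t) for t in truths))

em, f1_score = score_question(
    "eine beliebte Programmiersprache",
    ["beliebte Programmiersprache", "Programmiersprache"],
)
print(em)                   # 0.0
print(round(f1_score, 2))   # 0.8
```

Averaging these per-question scores over the test set and multiplying by 100 gives values on the same scale as the "exact" and "f1" figures reported above.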

Authors

Timo Möller: timo.moeller [at] deepset.ai
Julian Risch: julian.risch [at] deepset.ai
Malte Pietsch: malte.pietsch [at] deepset.ai
Michel Bartels: michel.bartels [at] deepset.ai