
24321a26 · created March 3, 2023 · 9 commits


pipeline_tag: sentence-similarity
language:
- de
tags:
- sentence-transformers
- sentence-similarity
- transformers
- setfit
license: mit
datasets:
- deutsche-telekom/ger-backtrans-paraphrase

German BERT large paraphrase euclidean

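This is a sentence-transformers model. It maps sentences and paragraphs (text) into a 1024-dimensional dense vector space. The model is intended to be used together with SetFit to improve German few-shot text classification. It has a sibling model called deutsche-telekom/gbert-large-paraphrase-cosine.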
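
As a rough illustration of the mapping described above, the following sketch encodes a few German sentences and finds the closest one by Euclidean distance. The Hub id in `MODEL_ID` is an assumption inferred from the sibling model's name and is not stated in this card. `embed` and `nearest` are hypothetical helper names introduced only for this example.

```python
import math

# Assumed Hub id, inferred from the sibling model's name; not stated in this card.
MODEL_ID = "deutsche-telekom/gbert-large-paraphrase-euclidean"


def embed(sentences, model_id=MODEL_ID):
    """Encode sentences into dense vectors (downloads the model on first use)."""
    # Imported lazily so the distance helper below works without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(sentences)  # one vector per input sentence


def nearest(query_vec, candidate_vecs):
    """Return (index, distance) of the candidate closest to query_vec."""
    distances = [math.dist(query_vec, c) for c in candidate_vecs]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


if __name__ == "__main__":
    sentences = [
        "Wie wird das Wetter morgen?",
        "Wie ist die Wettervorhersage für morgen?",
        "Ich möchte meinen Vertrag kündigen.",
    ]
    vectors = embed(sentences)
    index, distance = nearest(vectors[0], vectors[1:])
    print(sentences[1 + index], distance)
```

Euclidean distance is used here because the model was trained with that metric. For classification, the model would normally be passed to SetFit rather than queried directly like this.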

This model is based on deepset/gbert-large. Many thanks to deepset!

Training

Loss Function
We used BatchHardSoftMarginTripletLoss with Euclidean distance as the loss function:

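    from sentence_transformers import losses
    from sentence_transformers.losses import BatchHardTripletLossDistanceFunction

    # `model` is the SentenceTransformer being trained; "eucledian" is the library's spelling.
    train_loss = losses.BatchHardSoftMarginTripletLoss(
        model=model,
        distance_metric=BatchHardTripletLossDistanceFunction.eucledian_distance,
    )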
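
To make the idea behind this loss concrete, here is a simplified pure-Python sketch of a batch-hard soft-margin triplet loss with Euclidean distance. It is not the library's implementation. The function name is hypothetical, and skipping anchors that have no positive or no negative in the batch is a simplifying choice made for this example. For each anchor it takes the farthest same-label example and the closest different-label example, then averages log(1 + exp(d_pos - d_neg)).

```python
import math


def batch_hard_soft_margin_triplet_loss(embeddings, labels):
    """Mean of log(1 + exp(d_pos - d_neg)) over the anchors in one batch."""
    per_anchor = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        # Distances to other examples that share the anchor's label.
        pos = [math.dist(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == label]
        # Distances to examples with a different label.
        neg = [math.dist(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != label]
        if not pos or not neg:
            continue  # simplifying choice: skip anchors without a valid triplet
        hardest_pos = max(pos)  # farthest same-label example
        hardest_neg = min(neg)  # closest different-label example
        per_anchor.append(math.log1p(math.exp(hardest_pos - hardest_neg)))
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0
```

The loss shrinks as same-label examples move closer together than different-label ones, and no margin hyperparameter is needed because the soft-margin form replaces the hard hinge.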

Training Data
The model was trained on a carefully filtered version of the deutsche-telekom/ger-backtrans-paraphrase dataset. We removed the following sentence pairs: