MTGDS: A Multi-Task Learning Framework for Dialogue Summarization

Multi-Task Generative Dialogue Summarization


MTGDS


Official implementation of:

MTGDS: Multi-Task Generative Dialogue Summarization Learning Framework with Topic-Based Data Augmentation

Overview

MTGDS is a novel multi-task dialogue summarization framework that integrates topic-based data augmentation and external knowledge generation.

Specifically, MTGDS:

  • Annotates and enriches dialogue data with topic-aware information;
  • Generates topic-based augmented samples to improve data diversity;
  • Incorporates external knowledge at both entity and paragraph levels;
  • Enhances the model's understanding of implicit commonsense knowledge in dialogues.

Experiments on SAMSum and DialogSum demonstrate the effectiveness of MTGDS.


Installation

conda create -n mtgds python=3.7
conda activate mtgds

pip install -r requirements.txt

Before running any processing step, download the required pre-trained models into their corresponding folders.

Data Annotation and Augmentation

We have cleaned the original datasets and placed them under the data_dehance/data directory. First, annotate all data:

  • For SAMSum, run:

    python get_loss.py -d samsum
    python recover_word_loss.py -d samsum
    python get_representation_samsum.py
    python cosine_sim.py -d samsum
    python annotate.py -d samsum

  • For DialogSum, run:

    python get_loss.py -d dialogsum
    python recover_word_loss.py -d dialogsum
    python get_representation_dialogsum.py
    python cosine_sim.py -d dialogsum
    python annotate.py -d dialogsum

Then, augment the annotated data:

  • For SAMSum: run python da_samsum.py

  • For DialogSum: run python da_dialogsum.py
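
The annotation and augmentation steps above have to run in a fixed order, so it can be convenient to chain them from one script. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` and the `workdir` parameter are assumptions introduced for illustration, and the script names and the `-d` flag are copied from the commands listed above. It assumes the scripts live in the directory passed as `workdir`.

```python
import subprocess
import sys

# Script order copied from the steps above. The boolean says whether the
# README shows the script being called with "-d <dataset>".
ANNOTATION_STEPS = [
    ("get_loss.py", True),
    ("recover_word_loss.py", True),
    ("get_representation_{dataset}.py", False),
    ("cosine_sim.py", True),
    ("annotate.py", True),
]
AUGMENTATION_SCRIPT = "da_{dataset}.py"
DATASETS = ("samsum", "dialogsum")


def build_commands(dataset, python=sys.executable):
    """Return the annotation and augmentation commands for one dataset, in order."""
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: {}".format(dataset))
    commands = []
    for script, takes_dataset_flag in ANNOTATION_STEPS:
        cmd = [python, script.format(dataset=dataset)]
        if takes_dataset_flag:
            cmd += ["-d", dataset]
        commands.append(cmd)
    commands.append([python, AUGMENTATION_SCRIPT.format(dataset=dataset)])
    return commands


def run_pipeline(dataset, workdir="."):
    """Run every step in order; check=True stops at the first failing step."""
    for cmd in build_commands(dataset):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `run_pipeline("samsum")` or `run_pipeline("dialogsum")` from the directory that contains the scripts would then execute the six steps one after another. Because each step is run with `check=True`, a failure in an early step raises `subprocess.CalledProcessError` instead of letting later steps run on incomplete output.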

We provide the augmented data and related knowledge at https://github.com/aquskerr/about-paper.

Training

You can use the following commands to train our model:

  • For SAMSum: run python train_samsum.py
  • For DialogSum: run python train_dialogsum.py

You can adjust the training parameters in ./config_dialogsum or ./config_samsum.

Evaluation

We provide a checkpoint for the DialogSum dataset at https://drive.google.com/drive/folders/1RCokWeqpPw9_nOaUCWgj1cRLBAJ5k7u1?usp=drive_link. Place it in ./output/checkpoint-dialogsum, adjust the related path in the config file, and run python test_dialogsum.py.

For a model you have trained yourself, set the save and load paths in the config file, then run python test_dialogsum.py or python test_samsum.py.
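
The training and evaluation scripts follow the same naming pattern per dataset, so the two stages can be run back to back from a small wrapper. This is only a sketch under that assumption: `script_for` and `train_and_test` are hypothetical helper names, not part of the repository, and the wrapper does not touch the config files, so the checkpoint save and load paths still have to be set there beforehand.

```python
import subprocess
import sys

DATASETS = ("samsum", "dialogsum")
STAGES = ("train", "test")


def script_for(stage, dataset):
    """Map a stage and dataset to the script name used in this README."""
    if stage not in STAGES:
        raise ValueError("unknown stage: {}".format(stage))
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: {}".format(dataset))
    return "{}_{}.py".format(stage, dataset)


def train_and_test(dataset, python=sys.executable):
    """Train, then evaluate; check=True aborts if training fails."""
    for stage in STAGES:
        subprocess.run([python, script_for(stage, dataset)], check=True)
```

For example, `train_and_test("dialogsum")` would run train_dialogsum.py followed by test_dialogsum.py in the current directory.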

Citation

@article{shan2026multi,
  title={Multi-task generative dialogue summarization learning framework with topic-based data augmentation},
  author={Shan, Jing and Cao, Mingyang and Wang, Jiaying},
  journal={Computer Speech \& Language},
  pages={101998},
  year={2026},
  publisher={Elsevier}
}
