Multi-Task Generative Dialogue Summarization
MTGDS
Official implementation of:
MTGDS: Multi-Task Generative Dialogue Summarization Learning Framework with Topic-Based Data Augmentation
Overview
MTGDS is a novel multi-task dialogue summarization framework that integrates topic-based data augmentation and external knowledge generation.
Specifically, MTGDS:
- Annotates and enriches dialogue data with topic-aware information;
- Generates topic-based augmented samples to improve data diversity;
- Incorporates external knowledge at both entity and paragraph levels;
- Enhances the model's understanding of implicit commonsense knowledge in dialogues.
Experiments on SAMSum and DialogSum demonstrate the effectiveness of MTGDS.

Installation
conda create -n mtgds python=3.7
conda activate mtgds
pip install -r requirements.txt
Before processing any data, download the required pre-trained model and place it in the corresponding folder.
Data Annotation and Augmentation
We have cleaned the original datasets and placed them under the data_dehance/data directory. First, we annotate all data:
- For SAMSum: run
python get_loss.py -d samsum
python recover_word_loss.py -d samsum
python get_representation_samsum.py
python cosine_sim.py -d samsum
python annotate.py -d samsum
- For DialogSum: run
python get_loss.py -d dialogsum
python recover_word_loss.py -d dialogsum
python get_representation_dialogsum.py
python cosine_sim.py -d dialogsum
python annotate.py -d dialogsum
Then, we augment the annotated data:
- For SAMSum: run
python da_samsum.py
- For DialogSum: run
python da_dialogsum.py
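The annotation and augmentation steps must run in a fixed order, so it can be convenient to drive them from one script. The following is a minimal sketch, not part of the official repository: the helper names `pipeline_commands` and `run_pipeline` and the `workdir` parameter are hypothetical, and the sketch assumes the scripts listed above are run from the directory that contains them.

```python
import subprocess
import sys

# Dataset names accepted by the scripts in this README.
DATASETS = ("samsum", "dialogsum")


def pipeline_commands(dataset, python=sys.executable):
    """Return the annotation and augmentation commands for one dataset, in order.

    Hypothetical helper: it only mirrors the command list from this README.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        [python, "get_loss.py", "-d", dataset],
        [python, "recover_word_loss.py", "-d", dataset],
        [python, f"get_representation_{dataset}.py"],
        [python, "cosine_sim.py", "-d", dataset],
        [python, "annotate.py", "-d", dataset],
        [python, f"da_{dataset}.py"],  # augmentation runs after annotation
    ]


def run_pipeline(dataset, workdir="."):
    """Run each step in order; check=True stops at the first failing step.

    workdir is an assumption: point it at the directory holding the scripts.
    """
    for cmd in pipeline_commands(dataset):
        print("running:", " ".join(cmd[1:]))
        subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `run_pipeline("samsum")` or `run_pipeline("dialogsum")` would then execute the six steps for that dataset. Stopping at the first failure avoids augmenting data that was only partially annotated.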
We provide the augmented data and related knowledge at https://github.com/aquskerr/about-paper
Training
You can use the following commands to train our model:
- For SAMSum: run
python train_samsum.py
- For DialogSum: run
python train_dialogsum.py
You can adjust the training parameters in ./config_dialogsum or ./config_samsum.
Evaluation
We provide a checkpoint for the DialogSum dataset at https://drive.google.com/drive/folders/1RCokWeqpPw9_nOaUCWgj1cRLBAJ5k7u1?usp=drive_link.
Place it in ./output/checkpoint-dialogsum, adjust the related path in the config file, and run python test_dialogsum.py.
For a model you have trained yourself, set the save and load locations in the config file, then run python test_dialogsum.py or python test_samsum.py.
Citation
@article{shan2026multi,
title={Multi-task generative dialogue summarization learning framework with topic-based data augmentation},
author={Shan, Jing and Cao, Mingyang and Wang, Jiaying},
journal={Computer Speech \& Language},
pages={101998},
year={2026},
publisher={Elsevier}
}