LATS: Low resource abstractive text summarization

Chris van Yperen, Flavius Frasincar*, Kamilah El Kanfoudi

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Text summarization is an increasingly important focus of Natural Language Processing (NLP), and state-of-the-art models such as PEGASUS have demonstrated remarkable potential for ever more efficient and accurate abstractive summarization. Nonetheless, recent deep learning models that rely on training with large datasets risk sub-optimal generalization and inefficient training, and can get stuck at local optima due to high-dimensional non-convex optimization landscapes. Current research in NLP suggests that curriculum learning techniques, which guide model training by presenting training data in order of increasing difficulty, can enhance model performance. In this paper, we investigate the effectiveness of curriculum learning strategies and data augmentation techniques on PEGASUS to increase performance with low-resource training data from the CNN/DM dataset. We introduce a novel text-summary pair complexity scoring algorithm along with two simple baseline difficulty measures. We find that our novel complexity sorting method consistently outperforms the baseline sorting methods and boosts the performance of PEGASUS. The Baby-Steps curriculum learning strategy with this sorting method yields a performance improvement of 5.65%, raising the combined ROUGE F1-score from 83.28 to 87.99. Combining this strategy with a data augmentation technique, Easy Data Augmentation, raises the improvement to 6.54%. Both figures are relative to a baseline trained without curriculum learning or data augmentation.
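To make the Baby-Steps schedule mentioned above concrete, the sketch below shows the general pattern: text-summary pairs are sorted from easy to hard and the model is fine-tuned on a growing, cumulative subset. The paper's actual complexity-scoring algorithm is not reproduced here; `difficulty` is a hypothetical length-based proxy and `fine_tune` is an illustrative placeholder for a PEGASUS fine-tuning step, not the authors' implementation.

```python
# Minimal sketch of Baby-Steps curriculum learning for summarization.
# Assumptions: `difficulty` is a hypothetical proxy (compression ratio);
# `fine_tune` stands in for one round of PEGASUS fine-tuning.

from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source text, reference summary)

def difficulty(pair: Pair) -> float:
    """Hypothetical proxy: sources that are long relative to their
    summaries are treated as harder text-summary pairs."""
    text, summary = pair
    return len(text.split()) / max(len(summary.split()), 1)

def baby_steps(pairs: List[Pair],
               fine_tune: Callable[[List[Pair]], None],
               n_stages: int = 4) -> None:
    """Sort pairs from easy to hard, split them into stages, and train
    on the cumulative union of stages (the Baby-Steps schedule)."""
    ordered = sorted(pairs, key=difficulty)
    stage_size = max(len(ordered) // n_stages, 1)
    seen: List[Pair] = []
    for start in range(0, len(ordered), stage_size):
        seen.extend(ordered[start:start + stage_size])
        fine_tune(seen)  # each stage re-trains on all data seen so far
```

In practice, a caller would pass a closure that runs one fine-tuning epoch of a PEGASUS checkpoint on the given pairs; the cumulative re-training at each stage is what distinguishes Baby-Steps from a one-pass easy-to-hard ordering.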

Original language: English
Article number: 128078
Journal: Expert Systems with Applications
Volume: 286
Publication status: Published - 15 Aug 2025

Bibliographical note

Publisher Copyright:
© 2025 The Author(s)
