Building a Korean Text Summarization Dataset Using News Articles of Social Media


KIPS Transactions on Software and Data Engineering, Vol. 9, No. 8, pp. 251-258, Aug. 2020
https://doi.org/10.3745/KTSDE.2020.9.8.251
Keywords: Korean Text Summarization Dataset, Description, Headline, Subhead, Automatic Extractive Summarization
Abstract

A training dataset for text summarization consists of pairs of a document and its summary. Because conventional approaches to building text summarization datasets are labor-intensive, it is not easy to construct large ones. Collections of news articles are among the most popular resources for text summarization because they are easily accessible, large-scale, and high-quality. From social media news services, we can collect not only the headlines and subheads of news articles but also the summary descriptions that human editors write about them. We collected approximately 425,000 pairs of news articles and their summaries from social media. We implemented an automatic extractive summarizer and trained it on the dataset. We compared its performance with that of unsupervised models, and the summarizer achieved higher ROUGE scores than the unsupervised models.
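For readers unfamiliar with the metric, the ROUGE comparison mentioned above can be illustrated with a minimal sketch of ROUGE-1 (unigram overlap) between a generated summary and a reference summary. This is not the authors' evaluation code: the abstract does not state which ROUGE variant or tokenization was used, so the function name `rouge_1` and the whitespace tokenization are assumptions made for illustration. Korean text would normally be tokenized with a morphological analyzer rather than split on spaces.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Return ROUGE-1 precision, recall, and F1 for two strings.

    Assumption: tokens are obtained by whitespace splitting, which is a
    simplification and not necessarily what the paper used.
    """
    cand_tokens = candidate.split()
    ref_tokens = reference.split()

    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical (document summary, system output) pair for illustration only.
reference_summary = "the cat sat on the mat"
system_summary = "the cat lay on the mat"
scores = rouge_1(system_summary, reference_summary)
print(scores)
```

In this toy pair, five of the six unigrams match in each direction, so precision, recall, and F1 are all 5/6.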




Cite this article
[IEEE Style]
G. H. Lee, Y. Park, K. J. Lee, "Building a Korean Text Summarization Dataset Using News Articles of Social Media," KIPS Transactions on Software and Data Engineering, vol. 9, no. 8, pp. 251-258, 2020. DOI: https://doi.org/10.3745/KTSDE.2020.9.8.251.

[ACM Style]
Gyoung Ho Lee, Yo-Han Park, and Kong Joo Lee. 2020. Building a Korean Text Summarization Dataset Using News Articles of Social Media. KIPS Transactions on Software and Data Engineering, 9, 8, (2020), 251-258. DOI: https://doi.org/10.3745/KTSDE.2020.9.8.251.