Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information


KIPS Transactions on Software and Data Engineering, Vol. 11, No. 3, pp. 125-132, Mar. 2022
https://doi.org/10.3745/KTSDE.2022.11.3.125
Keywords: Sentence Compression, Linguistic Information, Language Model, Perplexity
Abstract

Sentence compression is a natural language processing task that generates a concise sentence preserving the important meaning of the original sentence. For grammatically appropriate sentence compression, early studies utilized human-defined linguistic rules. Later, because sequence-to-sequence models perform well on various natural language processing tasks, such as machine translation, several studies applied them to sentence compression as well. However, linguistic rule-based studies require all rules to be defined by humans, and sequence-to-sequence studies require a large amount of parallel data for model training. To address these challenges, Deleter, a sentence compression model that leverages the pre-trained language model BERT, was proposed. Because Deleter compresses sentences using a perplexity-based score computed with BERT, it requires neither linguistic rules nor a parallel dataset. However, because Deleter considers only perplexity when compressing, it does not reflect the linguistic information of the words in a sentence. Furthermore, since the corpora used to pre-train BERT are far from compressed sentences, this can lead to incorrect sentence compression. To address these problems, this paper proposes a method that quantifies the importance of linguistic information and reflects it in perplexity-based sentence scoring. In addition, by fine-tuning BERT on a corpus of news articles, which often contain proper nouns and omit unnecessary modifiers, we enable BERT to measure a perplexity appropriate for sentence compression. Evaluations on English and Korean datasets confirm that the sentence compression performance of sentence-scoring-based models can be improved with the proposed method.
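The deletion-based scoring idea described above can be illustrated with a minimal sketch: repeatedly delete the token whose removal yields the lowest-perplexity candidate sentence until a target length is reached. The bigram log-probabilities below are hypothetical stand-ins for illustration only; the actual Deleter model queries BERT for its scores, and the function names (`perplexity`, `compress`) are our own.

```python
import math

# Hypothetical bigram log-probabilities standing in for a language
# model's scores; a real system would query a pre-trained LM (e.g. BERT).
LOGP = {
    ("<s>", "the"): -0.3, ("the", "cat"): -0.5, ("cat", "sat"): -0.7,
    ("sat", "</s>"): -0.4, ("the", "old"): -1.0, ("old", "cat"): -0.8,
}
DEFAULT_LOGP = -6.0  # fallback for unseen bigrams

def perplexity(tokens):
    """Perplexity of a token sequence under the toy bigram model."""
    pairs = zip(["<s>"] + tokens, tokens + ["</s>"])
    logps = [LOGP.get(p, DEFAULT_LOGP) for p in pairs]
    return math.exp(-sum(logps) / len(logps))

def compress(tokens, target_len):
    """Greedily delete the token whose removal minimizes perplexity."""
    while len(tokens) > target_len:
        candidates = [tokens[:i] + tokens[i + 1:] for i in range(len(tokens))]
        tokens = min(candidates, key=perplexity)
    return tokens

print(compress("the very old cat sat".split(), 3))  # → ['the', 'cat', 'sat']
```

Under these toy scores, the fluent modifier-free subsequence wins at every deletion step; the paper's contribution is to augment exactly this kind of score with quantified linguistic information so that, e.g., proper nouns survive compression.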



Cite this article
[IEEE Style]
J. Lee, S. Kim, S. Park, "Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information," KIPS Transactions on Software and Data Engineering, vol. 11, no. 3, pp. 125-132, 2022. DOI: https://doi.org/10.3745/KTSDE.2022.11.3.125.

[ACM Style]
Jun-Beom Lee, So-Eon Kim, and Seong-Bae Park. 2022. Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information. KIPS Transactions on Software and Data Engineering, 11, 3, (2022), 125-132. DOI: https://doi.org/10.3745/KTSDE.2022.11.3.125.