Building Sentence Meaning Identification Dataset Based on Social Problem-Solving R&D Reports


KIPS Transactions on Software and Data Engineering, Vol. 12, No. 4, pp. 159-172, Apr. 2023
https://doi.org/10.3745/KTSDE.2023.12.4.159
Keywords: Social Problem-Solving Research, Natural Language Processing, Data Building, Pre-trained Language Model
Abstract

Social problem-solving research aims to create social value by using science and technology to offer meaningful answers to pending social issues. Although numerous and extensive research efforts have been made nationwide to alleviate such problems, many important social challenges remain. To facilitate the entire process of social problem-solving research and maximize its efficacy, it is vital to clearly identify the important and pressing problems on which to focus. The problem discovery step could be greatly improved if current social issues were identified automatically from existing R&D resources such as technical reports and articles. This paper introduces a comprehensive dataset for building machine learning models that automatically detect social problems and solutions in national research reports. We first collected a total of 700 research reports on social problems and issues. Through an intensive annotation process, we then built a dataset of 24,022 sentences, each labeled with a category closely related to social problem-solving, such as problems, purposes, solutions, and effects. Furthermore, we implemented four sentence classification models based on various neural language models and conducted a series of performance experiments using our dataset. In these experiments, the model fine-tuned from the KLUE-BERT pre-trained language model showed the best performance, with an accuracy of 75.853% and an F1 score of 63.503%.
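The two reported metrics can diverge noticeably on a multi-class sentence classification task when the labels are imbalanced. The following is a minimal sketch of how accuracy and a macro-averaged F1 score could be computed for such a task. The abstract does not state which F1 averaging scheme was used, so macro averaging is an assumption here, and the label names and example predictions are hypothetical illustrations based on the categories named in the abstract, not data from the paper.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of sentences whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores (macro averaging is an assumption)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but the sentence was not p
            fn[g] += 1  # gold label g was missed
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for six sentences.
gold = ["problem", "problem", "purpose", "solution", "effect", "problem"]
pred = ["problem", "problem", "problem", "solution", "problem", "problem"]

print(f"accuracy = {accuracy(gold, pred):.4f}")
print(f"macro F1 = {macro_f1(gold, pred):.4f}")
```

In this toy example the classifier over-predicts the most frequent label, so accuracy stays comparatively high while the macro F1 drops because the rare labels are never predicted correctly.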


Cite this article
[IEEE Style]
H. Shin, S. Jeong, H. Chun, L. Kwon, J. Lee, K. Park, S. Choi, "Building Sentence Meaning Identification Dataset Based on Social Problem-Solving R&D Reports," KIPS Transactions on Software and Data Engineering, vol. 12, no. 4, pp. 159-172, 2023. DOI: https://doi.org/10.3745/KTSDE.2023.12.4.159.

[ACM Style]
Hyeonho Shin, Seonki Jeong, Hong-Woo Chun, Lee-Nam Kwon, Jae-Min Lee, Kanghee Park, and Sung-Pil Choi. 2023. Building Sentence Meaning Identification Dataset Based on Social Problem-Solving R&D Reports. KIPS Transactions on Software and Data Engineering, 12, 4, (2023), 159-172. DOI: https://doi.org/10.3745/KTSDE.2023.12.4.159.