A Study on Dataset Generation Method for Korean Language Information Extraction from Generative Large Language Model and Prompt Engineering


KIPS Transactions on Software and Data Engineering, Vol. 12, No. 11, pp. 481-492, Nov. 2023
https://doi.org/10.3745/KTSDE.2023.12.11.481
Keywords: Large Language Model, Prompt Engineering, Zero-shot Learning, Dataset Generation, Information Extraction
Abstract

This study explores how to build a Korean dataset for extracting information from text using generative large language models. In modern society, mixed information circulates rapidly, and categorizing and extracting it effectively is crucial to decision-making. However, Korean datasets for training remain scarce. To address this, the study applies text-based zero-shot learning with a generative large language model to extract information and build a purpose-specific Korean dataset. The language model is instructed to produce the desired output through prompt engineering in the form “system”-“instruction”-“source input”-“output format”, and the dataset is built by exploiting the model's in-context learning ability on the input sentences. We validate the approach by comparing the generated dataset with an existing benchmark dataset and achieve 25.47% higher performance than the KLUE-RoBERTa-large model on the relation information extraction task. The results are expected to contribute to AI research by demonstrating the feasibility of extracting knowledge elements from Korean text. Furthermore, the methodology can be applied across various fields and purposes and has potential for building a range of Korean datasets.
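
The four-part prompt layout named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class name `ExtractionPrompt`, the bracketed section labels, the example wording of each part, and the JSON output shape are hypothetical and are not taken from the paper, which does not give its actual prompt text here.

```python
from dataclasses import dataclass


@dataclass
class ExtractionPrompt:
    """Hypothetical container for the four prompt parts named in the abstract."""

    system: str
    instruction: str
    source_input: str
    output_format: str

    def render(self) -> str:
        # Join the parts in the order "system" - "instruction" -
        # "source input" - "output format", one labeled section each.
        sections = [
            ("system", self.system),
            ("instruction", self.instruction),
            ("source input", self.source_input),
            ("output format", self.output_format),
        ]
        return "\n\n".join(f"[{name}]\n{text}" for name, text in sections)


# Illustrative wording only; the paper's real prompts may differ.
prompt = ExtractionPrompt(
    system="You are an assistant that extracts information from Korean text.",
    instruction="Identify the relation between the two entities in the sentence.",
    source_input="<Korean sentence goes here>",
    output_format='Return JSON: {"subject": ..., "object": ..., "relation": ...}',
)
rendered = prompt.render()
print(rendered)
```

In a zero-shot setting, the rendered string would be sent to a generative language model without labeled examples, and the returned record would be collected as one dataset entry. The model call itself is omitted because the abstract does not specify which API was used.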


Cite this article
[IEEE Style]
J. Y. Sang, J. S. Hyun, K. D. R. Sae, "A Study on Dataset Generation Method for Korean Language Information Extraction from Generative Large Language Model and Prompt Engineering," KIPS Transactions on Software and Data Engineering, vol. 12, no. 11, pp. 481-492, 2023. DOI: https://doi.org/10.3745/KTSDE.2023.12.11.481.

[ACM Style]
Jeong Young Sang, Ji Seung Hyun, and Kwon Da Rong Sae. 2023. A Study on Dataset Generation Method for Korean Language Information Extraction from Generative Large Language Model and Prompt Engineering. KIPS Transactions on Software and Data Engineering, 12, 11, (2023), 481-492. DOI: https://doi.org/10.3745/KTSDE.2023.12.11.481.