TY  - JOUR
T1  - A Study on Improving Performance of Software Requirements Classification Models by Handling Imbalanced Data
AU  - Choi, Jong-Woo 
AU  - Lee, Young-Jun 
AU  - Lim, Chae-Gyun 
AU  - Choi, Ho-Jin 
JO  - KIPS Transactions on Software and Data Engineering
PY  - 2023
DA  - 2023/01/30
DO  - https://doi.org/10.3745/KTSDE.2023.12.7.295
KW  - Requirements Classification
KW  - Imbalanced Data
KW  - Data Augmentation
KW  - Undersampling
KW  - BERT
AB  - Software requirements written in natural language may be interpreted differently depending on the stakeholders’ viewpoints. When designing an architecture based on quality attributes, quality attribute requirements must be classified accurately, because an efficient design is possible only when appropriate architectural tactics are selected for each quality attribute. Although many natural language processing models have been studied for requirements classification, which is a high-cost task, few studies address improving classification performance on imbalanced quality attribute datasets. In this study, we first show through experiments that a classification model can automatically classify a Korean requirements dataset. Based on these results, we explain that data augmentation through EDA (Easy Data Augmentation) techniques and undersampling strategies can reduce the imbalance of quality attribute datasets, and show that they are effective in classifying requirements. The F1-score improved by 5.24%p, indicating that handling imbalanced data helps classification models classify Korean requirements. Furthermore, detailed experiments on EDA illustrate which operations help improve classification performance.