A Study on Improving Performance of Software Requirements Classification Models by Handling Imbalanced Data


KIPS Transactions on Software and Data Engineering, Vol. 12, No. 7, pp. 295-302, Jul. 2023
https://doi.org/10.3745/KTSDE.2023.12.7.295
Keywords: Requirements Classification, Imbalanced Data, Data Augmentation, Undersampling, BERT
Abstract

Software requirements written in natural language may be interpreted differently depending on the stakeholder's viewpoint. When designing an architecture based on quality attributes, quality attribute requirements must be classified accurately, because an efficient design is possible only when appropriate architectural tactics are selected for each quality attribute. Requirements classification is a high-cost task, and although many natural language processing models have been studied for it, few studies address improving classification performance on imbalanced quality attribute datasets. In this study, we first show through experiments that a classification model can automatically classify a Korean requirements dataset. Based on these results, we explain that data augmentation through EDA (Easy Data Augmentation) techniques and undersampling strategies can mitigate the imbalance of quality attribute datasets, and show that they are effective for classifying requirements. The F1-score improved by 5.24%p, indicating that handling imbalanced data helps classification models classify Korean requirements. Furthermore, detailed experiments on EDA illustrate which operations help improve classification performance.
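
The combination of undersampling and EDA-style augmentation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the per-class `target` size, the deletion probability, and whitespace tokenization are all assumptions made for illustration. Only two of the four EDA operations (random swap and random deletion) are shown; synonym replacement and random insertion need a thesaurus and are omitted.

```python
import random
from collections import defaultdict


def random_swap(tokens, n_swaps, rng):
    """EDA random swap: exchange two randomly chosen positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """EDA random deletion: drop each token with probability p, keeping at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def balance(samples, target, rng):
    """Bring every class to `target` examples (hypothetical helper).

    samples: list of (text, label) pairs.
    Classes larger than `target` are randomly undersampled; smaller classes
    are topped up with augmented copies of their own examples.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append(text)

    balanced = []
    for label, originals in by_label.items():
        if len(originals) > target:
            texts = rng.sample(originals, target)  # random undersampling
        else:
            texts = list(originals)
            while len(texts) < target:
                # Assumption: whitespace tokenization; real Korean text may
                # call for a morphological analyzer instead.
                tokens = rng.choice(originals).split()
                if rng.random() < 0.5:
                    tokens = random_swap(tokens, 1, rng)
                else:
                    tokens = random_deletion(tokens, 0.1, rng)
                texts.append(" ".join(tokens))
        balanced.extend((t, label) for t in texts)
    return balanced
```

Calling `balance(samples, target, random.Random(0))` returns a list in which each label appears exactly `target` times. In practice the balancing would be applied to the training split only, so that evaluation data keeps its original distribution.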



Cite this article
[IEEE Style]
J. Choi, Y. Lee, C. Lim, H. Choi, "A Study on Improving Performance of Software Requirements Classification Models by Handling Imbalanced Data," KIPS Transactions on Software and Data Engineering, vol. 12, no. 7, pp. 295-302, 2023. DOI: https://doi.org/10.3745/KTSDE.2023.12.7.295.

[ACM Style]
Jong-Woo Choi, Young-Jun Lee, Chae-Gyun Lim, and Ho-Jin Choi. 2023. A Study on Improving Performance of Software Requirements Classification Models by Handling Imbalanced Data. KIPS Transactions on Software and Data Engineering, 12, 7, (2023), 295-302. DOI: https://doi.org/10.3745/KTSDE.2023.12.7.295.