Ensemble Learning-Based Prediction of Good Sellers in Overseas Sales of Domestic Books and Keyword Analysis of Reviews of the Good Sellers


KIPS Transactions on Software and Data Engineering, Vol. 12, No. 4, pp. 173-178, Apr. 2023
https://doi.org/10.3745/KTSDE.2023.12.4.173
Keywords: Ensemble learning, Good Seller Prediction, Book Review Analysis, Text Mining, Keyword Analysis
Abstract

As Korean literature spreads around the world, its position in the overseas publishing market has become increasingly important. As demand in that market continues to grow, it is essential to predict future book sales and to analyze the characteristics of books that overseas readers have favored in the past. In this study, we propose an ensemble learning-based prediction model and analyze the characteristics of good sellers, defined as books published overseas over the past five years with cumulative sales of more than 5,000 copies. We applied five ensemble learning models, i.e., XGBoost, Gradient Boosting, AdaBoost, LightGBM, and Random Forest, and compared them with other machine learning algorithms, i.e., Support Vector Machine, Logistic Regression, and Deep Learning. Our experimental results showed that the ensemble algorithms outperform the other approaches in handling imbalanced data. In particular, the LightGBM model achieved the best prediction performance, with an AUC of 99.86%. Among the features used for prediction, the most important is the author's number of overseas publications, and the second most important is publication in the countries with the largest publishing markets. The number of evaluation participants is also an important feature. In addition, text mining was performed on reviews of the four best-selling books among the good sellers. Many reviews focused on stories, characters, and writers, and the keyword “translation” appeared frequently in low-rated reviews, which suggests that support for translation is needed.
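The abstract reports model quality as AUC on imbalanced data. The following is a minimal sketch, not the authors' pipeline, of why AUC is more informative than accuracy when good sellers are rare. The function names `roc_auc` and `accuracy` and the toy labels and scores are hypothetical and chosen only for illustration; AUC is computed here in its pairwise form, as the fraction of (positive, negative) pairs in which the positive example receives the higher score.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of examples classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 2 good sellers (label 1) among 10 books.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A degenerate "model" that never predicts a good seller.
always_negative = [0.0] * len(labels)

# Made-up scores from a model that ranks most good sellers near the top.
model_scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

print("accuracy, always negative:", accuracy(labels, always_negative))
print("AUC, always negative:     ", roc_auc(labels, always_negative))
print("AUC, ranking model:       ", roc_auc(labels, model_scores))
```

In this toy setting the always-negative predictor still reaches 80% accuracy because the classes are imbalanced, yet its AUC is 0.5, the chance level. That gap is the usual reason for reporting AUC on skewed data such as a small set of good sellers among many titles.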



Cite this article
[IEEE Style]
D. Y. Kim, N. Y. Kim, H. H. Kim, "Ensemble Learning-Based Prediction of Good Sellers in Overseas Sales of Domestic Books and Keyword Analysis of Reviews of the Good Sellers," KIPS Transactions on Software and Data Engineering, vol. 12, no. 4, pp. 173-178, 2023. DOI: https://doi.org/10.3745/KTSDE.2023.12.4.173.

[ACM Style]
Do Young Kim, Na Yeon Kim, and Hyon Hee Kim. 2023. Ensemble Learning-Based Prediction of Good Sellers in Overseas Sales of Domestic Books and Keyword Analysis of Reviews of the Good Sellers. KIPS Transactions on Software and Data Engineering, 12, 4, (2023), 173-178. DOI: https://doi.org/10.3745/KTSDE.2023.12.4.173.