Improved Focused Sampling for Class Imbalance Problem

KIPS Transactions on Software and Data Engineering, Vol. 14, No. 4, pp. 287-294, Apr. 2007
DOI: 10.3745/KIPSTB.2007.14.4.287


Many classification algorithms for real-world data suffer from the class imbalance problem. Various methods have been proposed to address it, such as altering the training balance and designing better sampling strategies. However, previous methods are unsatisfactory with respect to the distribution of the input data and the constraint. In this paper, we propose a focused sampling method that is superior to previous methods. Solving the problem requires selecting a useful subset of the full training set. To obtain this subset, the proposed method divides the data into regions according to scores computed from the distribution of a SOM (self-organizing map) over the input data. The scores are sorted in ascending order. They represent the distribution of the input data, which may in turn represent the characteristics of the whole data. A new training dataset is obtained by eliminating non-useful data located in the region between a lower bound and an upper bound. The proposed method gives classification accuracy better than, or at least similar to, that of previous approaches. It also offers several benefits: a reduced class imbalance ratio, a smaller training set, and prevention of over-fitting. The proposed method was tested with a kNN classifier. An experimental result on the ecoli data set shows that the method achieves precision up to 2.27 times that of the other methods.
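
The elimination step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the SOM-based scores are computed, so the scores are passed in as precomputed, made-up values. The function name `focused_sample`, its parameters, and the choice to thin only the majority class are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def focused_sample(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    scores: Sequence[float],
    lower: float,
    upper: float,
    majority_label: int,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Drop majority-class samples whose score lies in [lower, upper].

    Hypothetical helper: `scores` stands in for the SOM-based scores,
    whose exact definition is not given in the abstract.
    """
    if not (len(samples) == len(labels) == len(scores)):
        raise ValueError("samples, labels, and scores must have equal length")
    if lower > upper:
        raise ValueError("lower must not exceed upper")

    # Visit samples in ascending score order, as the abstract describes.
    order = sorted(range(len(scores)), key=lambda i: scores[i])

    kept_samples: List[Sequence[float]] = []
    kept_labels: List[int] = []
    for i in order:
        in_region = lower <= scores[i] <= upper
        # Assumption: only the majority class is thinned, so that both the
        # imbalance ratio and the training-set size shrink.
        if in_region and labels[i] == majority_label:
            continue
        kept_samples.append(samples[i])
        kept_labels.append(labels[i])
    return kept_samples, kept_labels


# Toy data: six majority (0) samples and two minority (1) samples.
samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.35], [0.45]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.20, 0.40, 0.55, 0.70, 0.95, 0.45, 0.50]  # made-up values

new_samples, new_labels = focused_sample(
    samples, labels, scores, lower=0.3, upper=0.8, majority_label=0
)
print(len(new_samples), new_labels.count(0), new_labels.count(1))
```

In this toy run the three majority samples scored inside the region are removed, so the class ratio drops from 6:2 to 3:2. The reduced set could then be passed to any kNN classifier for evaluation.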

Cite this article
[IEEE Style]
M. S. Kim, H. J. Yang, S. H. Kim and W. P. Cheah, "Improved Focused Sampling for Class Imbalance Problem," KIPS Journal B (2001 ~ 2012), vol. 14, no. 4, pp. 287-294, 2007. DOI: 10.3745/KIPSTB.2007.14.4.287.

[ACM Style]
Man Sun Kim, Hyung Jeong Yang, Soo Hyung Kim, and Wooi Ping Cheah. 2007. Improved Focused Sampling for Class Imbalance Problem. KIPS Journal B (2001 ~ 2012), 14, 4, (2007), 287-294. DOI: 10.3745/KIPSTB.2007.14.4.287.