Improved Focused Sampling for Class Imbalance Problem


KIPS Transactions on Software and Data Engineering, Vol. 14, No. 4, pp. 287-294, Apr. 2007
DOI: 10.3745/KIPSTB.2007.14.4.287

Abstract

Many classification algorithms applied to real-world data suffer from the class imbalance problem. Various methods have been proposed to address it, such as altering the balance of the training data and designing better sampling strategies. However, previous methods do not satisfactorily account for the distribution of the input data and the constraint. In this paper, we propose a focused sampling method that outperforms previous methods. Solving the problem requires selecting a useful subset of the full training set. To obtain this subset, the proposed method divides the data into regions according to scores computed from the distribution of a self-organizing map (SOM) over the input data. The scores are sorted in ascending order; they represent the distribution of the input data, which may in turn represent the characteristics of the whole data. A new training dataset is obtained by eliminating the data that are not useful, namely those located in the region between a lower bound and an upper bound. The proposed method yields classification accuracy better than, or at least similar to, that of previous approaches. It also offers several benefits: a reduced class imbalance ratio, a smaller training set, and prevention of over-fitting. The proposed method was tested with a kNN classifier. An experimental result on the ecoli data set shows that it achieves precision up to 2.27 times that of the other methods.
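The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not define how the scores are computed or how the bounds are chosen, so everything specific here is an assumption: `bmu_distance` (distance to the nearest prototype, standing in for a SOM-derived score), the fixed `prototypes` (standing in for trained SOM weight vectors), the function name `focused_sample`, and the `lower`/`upper` values are all hypothetical, and the sketch removes every sample in the band regardless of class.

```python
import math


def bmu_distance(sample, prototypes):
    """Hypothetical stand-in score: distance to the nearest prototype."""
    return min(math.dist(sample, p) for p in prototypes)


def focused_sample(samples, labels, scores, lower, upper):
    """Sort by score (ascending) and drop samples strictly between the bounds."""
    order = sorted(range(len(samples)), key=lambda i: scores[i])
    kept = [i for i in order if not (lower < scores[i] < upper)]
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data; the prototypes are assumed, not trained by a real SOM.
prototypes = [(0.0, 0.0), (4.0, 4.0)]
samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (4.0, 4.0), (4.0, 1.0)]
labels = ["maj", "maj", "maj", "min", "maj"]

scores = [bmu_distance(s, prototypes) for s in samples]
kept_samples, kept_labels = focused_sample(
    samples, labels, scores, lower=0.5, upper=2.5
)
```

In this toy case the two majority-class samples whose scores fall inside the band are removed, so the majority-to-minority ratio drops from 4:1 to 2:1 and the training set shrinks from five samples to three. The reduced set could then be passed to any classifier, such as the kNN classifier mentioned in the abstract.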



Cite this article
[IEEE Style]
M. S. Kim, H. J. Yang, S. H. Kim and W. P. Cheah, "Improved Focused Sampling for Class Imbalance Problem," KIPS Journal B (2001 ~ 2012), vol. 14, no. 4, pp. 287-294, 2007. DOI: 10.3745/KIPSTB.2007.14.4.287.

[ACM Style]
Man Sun Kim, Hyung Jeong Yang, Soo Hyung Kim, and Wooi Ping Cheah. 2007. Improved Focused Sampling for Class Imbalance Problem. KIPS Journal B (2001 ~ 2012), 14, 4, (2007), 287-294. DOI: 10.3745/KIPSTB.2007.14.4.287.