Outlier Detection By Clustering-Based Ensemble Model Construction


KIPS Transactions on Software and Data Engineering, Vol. 7, No. 11, pp. 435-442, Nov. 2018
DOI: 10.3745/KTSDE.2018.7.11.435
Keywords: Streaming Data, Ensemble Method, Outlier Detection, K-Means Clustering
Abstract

Outlier detection identifies data samples that deviate significantly from the distribution of normal data. Most outlier detection methods compute an outlier score that indicates how far a data sample departs from the normal state, and flag the sample as an outlier when its score exceeds a given threshold. However, because the range of outlier scores differs from one dataset to another and outliers occur at a much lower ratio than normal data, choosing a threshold for the outlier score is very difficult. Moreover, in practice it is not easy to obtain data that contains enough outliers for training. In this paper, we propose a clustering-based outlier detection method that constructs a model of the normal data region using only normal data and then classifies each new data sample as either an outlier or normal. We then extend it to an ensemble method: the given normal data is divided into chunks, a clustering model is built for each chunk, and the decisions of the models are combined. The ensemble is applied to streaming data with dynamic changes. Experimental results on real and artificial data show high performance of the proposed method.
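The abstract does not give the details of the model, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that each chunk of normal data is summarized by k-means clusters, that each cluster's normal region is a ball whose radius is the largest distance from the centroid to one of its own members, and that the chunk models are combined by majority vote. The function names, the radius rule, and the voting rule are all hypothetical choices made for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids (tuples)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def fit_chunk_model(chunk, k, seed=0):
    """Model one chunk of normal data as k (centroid, radius) balls.

    Assumption: the radius is the largest member-to-centroid distance.
    """
    centroids = kmeans(chunk, k, seed=seed)
    radii = [0.0] * k
    for p in chunk:
        j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
        radii[j] = max(radii[j], math.dist(p, centroids[j]))
    return list(zip(centroids, radii))


def chunk_says_outlier(model, x):
    """One model votes 'outlier' when x lies outside every ball."""
    return all(math.dist(x, c) > r for c, r in model)


def fit_ensemble(normal_data, chunk_size, k):
    """Split the normal data into chunks and fit one model per chunk."""
    chunks = [
        normal_data[i:i + chunk_size]
        for i in range(0, len(normal_data), chunk_size)
    ]
    # Skip a trailing chunk that is too small to seed k clusters.
    return [
        fit_chunk_model(ch, k, seed=i)
        for i, ch in enumerate(chunks)
        if len(ch) >= k
    ]


def is_outlier(ensemble, x):
    """Assumption: combine chunk decisions by strict majority vote."""
    votes = sum(chunk_says_outlier(m, x) for m in ensemble)
    return votes * 2 > len(ensemble)


# Usage on synthetic data: train on normal samples only, then classify.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(300)]
ensemble = fit_ensemble(normal, chunk_size=100, k=3)
print(is_outlier(ensemble, (0.0, 0.0)))    # a sample near the training data
print(is_outlier(ensemble, (50.0, 50.0)))  # a sample far from the training data
```

Because each model makes a binary inside/outside decision, no global threshold on an outlier score has to be tuned, which is the difficulty the abstract raises. For streaming data, one plausible extension (again an assumption) is to fit a model on each newly arrived chunk of normal data and retire the oldest model, so that the ensemble follows changes in the data.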



Cite this article
[IEEE Style]
C. H. Park, T. Kim, J. Kim, S. Choi and G. Lee, "Outlier Detection By Clustering-Based Ensemble Model Construction," KIPS Transactions on Software and Data Engineering, vol. 7, no. 11, pp. 435-442, 2018. DOI: 10.3745/KTSDE.2018.7.11.435.

[ACM Style]
Cheong Hee Park, Taegong Kim, Jiil Kim, Semok Choi, and Gyeong-Hoon Lee. 2018. Outlier Detection By Clustering-Based Ensemble Model Construction. KIPS Transactions on Software and Data Engineering, 7, 11, (2018), 435-442. DOI: 10.3745/KTSDE.2018.7.11.435.