Spam Filter by Using X2 Statistics and Support Vector Machines


The KIPS Transactions:PartB, Vol. 17, No. 3, pp. 249-254, Jun. 2010
DOI: 10.3745/KIPSTB.2010.17.3.249

Abstract

We propose an automatic spam filter for e-mail based on Support Vector Machines (SVM). Each feature is the lexical form of a word together with its part-of-speech (POS) tag, and features are selected using the chi-square statistic. For the experiments, each feature is weighted by term frequency (TF), TF-IDF, or a binary weight. After the SVM is trained on the selected features, it classifies each e-mail as spam or non-spam. In the experiments, feature selection improved the performance of the system, and we obtained an overall accuracy of 98.9% on the TREC05-p1 spam corpus.
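The abstract does not give the exact formulas used, so the following is only a minimal sketch of chi-square feature scoring and the three weighting schemes. It assumes the standard 2x2 contingency form of the chi-square statistic and a plain log(N/df) IDF. The toy corpus, the `word/POS` token format, and all function names are hypothetical, and SVM training is left out because it needs an external library.

```python
import math
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: spam documents containing the term
    b: non-spam documents containing the term
    c: spam documents without the term
    d: non-spam documents without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def document_frequencies(docs, labels):
    """Count, per class, how many documents contain each feature."""
    spam_df, ham_df = Counter(), Counter()
    for tokens, is_spam in zip(docs, labels):
        (spam_df if is_spam else ham_df).update(set(tokens))
    return spam_df, ham_df


def score_features(docs, labels):
    """Return {feature: chi-square score} for the spam/non-spam split."""
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    spam_df, ham_df = document_frequencies(docs, labels)
    scores = {}
    for term in set(spam_df) | set(ham_df):
        a, b = spam_df[term], ham_df[term]
        scores[term] = chi_square(a, b, n_spam - a, n_ham - b)
    return scores


def weight(tokens, vocab, df, n_docs, scheme):
    """Turn one document into a vector over the selected features."""
    tf = Counter(tokens)
    if scheme == "binary":
        return [1.0 if tf[t] else 0.0 for t in vocab]
    if scheme == "tf":
        return [float(tf[t]) for t in vocab]
    if scheme == "tfidf":
        return [tf[t] * math.log(n_docs / df[t]) for t in vocab]
    raise ValueError(f"unknown weighting scheme: {scheme}")


# Hypothetical toy corpus; each token is a word/POS pair.
docs = [
    ["free/JJ", "money/NN", "now/RB"],
    ["free/JJ", "offer/NN", "free/JJ"],
    ["meeting/NN", "now/RB"],
    ["project/NN", "meeting/NN"],
]
labels = [True, True, False, False]  # True marks spam

scores = score_features(docs, labels)
# Keep the two highest-scoring features, breaking ties alphabetically.
selected = sorted(scores, key=lambda t: (-scores[t], t))[:2]

spam_df, ham_df = document_frequencies(docs, labels)
df = spam_df + ham_df  # document frequency over the whole corpus
vectors = {s: weight(docs[1], selected, df, len(docs), s)
           for s in ("binary", "tf", "tfidf")}
```

In this sketch a feature that appears equally often in both classes, such as `now/RB`, scores zero and is dropped, while features concentrated in one class rank highest. The resulting vectors are what would be passed to an SVM trainer.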


Cite this article
[IEEE Style]
S. W. Lee, "Spam Filter by Using X2 Statistics and Support Vector Machines," The KIPS Transactions:PartB, vol. 17, no. 3, pp. 249-254, 2010. DOI: 10.3745/KIPSTB.2010.17.3.249.

[ACM Style]
Song Wook Lee. 2010. Spam Filter by Using X2 Statistics and Support Vector Machines. The KIPS Transactions:PartB, 17, 3, (2010), 249-254. DOI: 10.3745/KIPSTB.2010.17.3.249.