A Spam Filter System Based on Maximum Entropy Model Using Co-training with Spamminess Features and URL Features


The KIPS Transactions:PartB, Vol. 15, No. 1, pp. 61-72, Feb. 2008
DOI: 10.3745/KIPSTB.2008.15.1.61

Abstract

This paper presents a spam filter system based on the maximum entropy model that uses co-training with spamminess features and URL features. Spamminess features are the emphasis patterns and abnormal patterns that spammers use in spam messages to express their intent and to evade the spam filter. Because spammers use URLs to provide details and alter the URL format to avoid blacklist filtering, normal and abnormal URLs can be key features for detecting spam messages. Co-training with spamminess features and URL features uses two feature sets that are independent of each other in training, so the filter system can learn from each of them independently. Experimental results on the TREC spam test collection show that the proposed approach improves accuracy by 9.1% and 6.9% over the base system and the Bogofilter system, respectively. Analysis of the results shows that the proposed spamminess features and URL features are helpful. A co-training experiment further shows that the two feature sets are useful, since the number of training documents is reduced while accuracy remains close to that of batch learning.
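
The co-training idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual feature definitions or training procedure, so everything below is an assumption made for illustration: the feature extractors `spamminess_view` and `url_view`, the toy messages, the use of binary logistic regression as the two-class form of a maximum entropy classifier, and a simplified loop in which each view pseudo-labels the unlabeled message it is most confident about.

```python
import math
import re

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
IP_URL_RE = re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}", re.IGNORECASE)

def spamminess_view(text):
    """Hypothetical emphasis features, computed with URLs stripped out."""
    body = URL_RE.sub(" ", text)
    words = body.split()
    shouted = sum(1 for w in words if len(w) > 1 and w.isupper())
    return [1.0,                                         # bias term
            shouted / max(len(words), 1),                # share of ALL-CAPS words
            float(len(re.findall(r"[!$]{2,}", body)))]   # runs such as "!!!" or "$$$"

def url_view(text):
    """Hypothetical URL features: number of URLs and number with a raw-IP host."""
    urls = URL_RE.findall(text)
    return [1.0, float(len(urls)), float(sum(1 for u in urls if IP_URL_RE.match(u)))]

def prob_spam(w, x):
    """P(spam | x) under a two-class maximum entropy (logistic) model."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))                         # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(X, y, epochs=2000, lr=0.5, l2=0.01):
    """Fit the weights by batch gradient descent with L2 regularization."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = prob_spam(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wj - lr * (g / len(X) + l2 * wj) for wj, g in zip(w, grad)]
    return w

VIEWS = (spamminess_view, url_view)

def fit_views(labeled):
    """Train one classifier per view on the current labeled set."""
    y = [label for _, label in labeled]
    return [train_maxent([view(t) for t, _ in labeled], y) for view in VIEWS]

def co_train(labeled, unlabeled, rounds=2):
    """Each round, every view labels its most confident unlabeled message."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        models = fit_views(labeled)
        for view, w in zip(VIEWS, models):
            if not pool:
                break
            best = max(pool, key=lambda t: abs(prob_spam(w, view(t)) - 0.5))
            pool.remove(best)
            labeled.append((best, int(prob_spam(w, view(best)) >= 0.5)))
    return fit_views(labeled), labeled

def spam_probability(models, text):
    """Combine the two views by averaging (one simple choice among several)."""
    return sum(prob_spam(w, view(text)) for view, w in zip(VIEWS, models)) / len(VIEWS)

# Toy data, invented for this sketch (1 = spam, 0 = ham).
labeled = [
    ("WIN CASH NOW!!! visit http://192.168.0.1/win", 1),
    ("Meeting notes attached, see you tomorrow", 0),
    ("FREE PILLS $$$ http://10.0.0.5/buy http://10.0.0.5/now", 1),
    ("Here is the report: http://example.com/report", 0),
]
unlabeled = [
    "CHEAP LOANS!!! APPLY TODAY http://172.16.0.9/offer",
    "lunch at noon?",
    "HOT DEAL $$$ http://10.1.1.1/x",
    "the slides are at http://example.org/slides",
]
models, grown = co_train(labeled, unlabeled)
print(spam_probability(models, "BUY NOW!!! http://8.8.8.8/deal"))
```

In this sketch the spamminess view never sees URL text and the URL view never sees the message body, which mirrors the independence between feature sets that the abstract relies on. Starting from four labeled messages, the loop grows the training set from the unlabeled pool, which is the sense in which co-training can reduce the number of hand-labeled training documents.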


Cite this article
[IEEE Style]
M. G. Gong and K. S. Lee, "A Spam Filter System Based on Maximum Entropy Model Using Co-training with Spamminess Features and URL Features," The KIPS Transactions:PartB, vol. 15, no. 1, pp. 61-72, 2008. DOI: 10.3745/KIPSTB.2008.15.1.61.

[ACM Style]
Mi Gyoung Gong and Kyung Soon Lee. 2008. A Spam Filter System Based on Maximum Entropy Model Using Co-training with Spamminess Features and URL Features. The KIPS Transactions:PartB, 15, 1, (2008), 61-72. DOI: 10.3745/KIPSTB.2008.15.1.61.