Improving Text Categorization with High Quality Bigrams


The KIPS Transactions: Part B, Vol. 9, No. 4, pp. 415-420, Aug. 2002
DOI: 10.3745/KIPSTB.2002.9.4.415

Abstract

This paper presents an efficient text categorization algorithm that generates high-quality bigrams using the information gain metric combined with various frequency thresholds. The bigrams, along with unigrams, are then given as features to a Naïve Bayes classifier. The experimental results suggest that the bigrams, while small in number, can contribute substantially to improving text categorization. A close examination of the results shows that the algorithm's main benefit is that more positive documents are classified correctly, although it may also cause more negative documents to be classified incorrectly.
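The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the actual thresholds or the exact procedure, so the document-frequency cutoff `min_df`, the cap `top_k`, the function names, and the toy documents below are all assumptions made for illustration. The sketch keeps only bigrams that occur in enough documents and ranks the survivors by information gain with respect to a binary category label.

```python
import math


def entropy(pos, neg):
    """Entropy in bits of a document set with pos positive and neg negative members."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs_with, labels):
    """Information gain of a presence/absence feature for binary labels (1 or 0).

    docs_with is the set of document indices that contain the feature.
    """
    n = len(labels)
    pos = sum(labels)
    with_pos = sum(labels[i] for i in docs_with)
    with_neg = len(docs_with) - with_pos
    without_pos = pos - with_pos
    without_neg = (n - pos) - with_neg
    n_with = len(docs_with)
    n_without = n - n_with
    return (entropy(pos, n - pos)
            - (n_with / n) * entropy(with_pos, with_neg)
            - (n_without / n) * entropy(without_pos, without_neg))


def bigrams(tokens):
    """Adjacent word pairs of a token list."""
    return list(zip(tokens, tokens[1:]))


def select_bigrams(token_docs, labels, min_df=2, top_k=10):
    """Keep bigrams found in at least min_df documents, ranked by information gain.

    min_df and top_k are hypothetical parameters, not values from the paper.
    """
    doc_index = {}
    for i, tokens in enumerate(token_docs):
        for bg in set(bigrams(tokens)):
            doc_index.setdefault(bg, set()).add(i)
    scored = [(information_gain(ids, labels), bg)
              for bg, ids in doc_index.items() if len(ids) >= min_df]
    # Highest gain first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [bg for _, bg in scored[:top_k]]


# Toy data (invented): label 1 marks documents in the target category.
docs = [
    "interest rate rises again".split(),
    "bank raises interest rate".split(),
    "interest in football rises".split(),
    "football team wins again".split(),
]
labels = [1, 1, 0, 0]
selected = select_bigrams(docs, labels, min_df=2, top_k=5)
```

In this toy corpus the unigram "interest" appears in both categories, while the bigram ("interest", "rate") appears only in the positive documents, which is the kind of feature such a selection step is meant to surface. The selected bigrams would then be added to the unigram vocabulary before training the classifier.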



Cite this article
[IEEE Style]
C. D. Lee, C. M. Tan, Y. F. Wang, "Improving Text Categorization with High Quality Bigrams," The KIPS Transactions: Part B, vol. 9, no. 4, pp. 415-420, 2002. DOI: 10.3745/KIPSTB.2002.9.4.415.

[ACM Style]
Chan Do Lee, Chade Meng Tan, and Yuan Fang Wang. 2002. Improving Text Categorization with High Quality Bigrams. The KIPS Transactions: Part B, 9, 4, (2002), 415-420. DOI: 10.3745/KIPSTB.2002.9.4.415.