Automatic Classification of Web documents According to their Styles


The KIPS Transactions:PartB, Vol. 11, No. 5, pp. 555-562, Aug. 2004
DOI: 10.3745/KIPSTB.2004.11.5.555

Abstract

A genre, or style, offers a view of a document that is distinct from its subject or topic, and it can also serve as a criterion for classifying documents. Several studies have addressed detecting the style of textual documents, but only a few have dealt with web documents. In this paper we propose sets of features for detecting the styles of web documents. Web documents differ from plain textual documents in that each has a URL and contains HTML tags within the page. We introduce features specific to web documents, extracted from the URL and the HTML tags. Experimental results allow us to evaluate the characteristics and performance of these features.
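
As a rough illustration of what URL-based and HTML-tag-based features might look like, the following minimal sketch uses only the Python standard library. The abstract does not list the paper's actual features, so every feature name here (url_depth, url_has_query, url_file_ext, tag_total, tag_a_ratio, tag_img_ratio) and both helper functions are hypothetical choices made for illustration, not the authors' feature set.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class TagCounter(HTMLParser):
    """Counts how often each HTML start tag occurs in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.counts[tag] += 1


def url_features(url):
    """Hypothetical URL features: path depth, query flag, file extension."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    last = segments[-1] if segments else ""
    return {
        "url_depth": len(segments),
        "url_has_query": int(bool(parsed.query)),
        "url_file_ext": last.rsplit(".", 1)[1].lower() if "." in last else "",
    }


def html_features(html):
    """Hypothetical tag features: total tag count and link/image ratios."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    total = sum(parser.counts.values())
    denom = total or 1  # avoid division by zero on tag-free input
    return {
        "tag_total": total,
        "tag_a_ratio": parser.counts["a"] / denom,
        "tag_img_ratio": parser.counts["img"] / denom,
    }


if __name__ == "__main__":
    page = "<html><body><a href='x'>x</a><p>text</p></body></html>"
    features = {}
    features.update(url_features("http://example.com/news/story.html"))
    features.update(html_features(page))
    print(features)
```

A feature dictionary of this kind could be passed to any standard classifier; which features actually discriminate between styles is the question the paper's experiments address.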


Cite this article
[IEEE Style]
K. J. Lee, C. S. Lim, J. H. Kim, "Automatic Classification of Web documents According to their Styles," The KIPS Transactions:PartB, vol. 11, no. 5, pp. 555-562, 2004. DOI: 10.3745/KIPSTB.2004.11.5.555.

[ACM Style]
Kong Joo Lee, Chul Su Lim, and Jae Hoon Kim. 2004. Automatic Classification of Web documents According to their Styles. The KIPS Transactions:PartB, 11, 5, (2004), 555-562. DOI: 10.3745/KIPSTB.2004.11.5.555.