Automatically Converting HTML Documents with Similar Pattern into XML Documents


The KIPS Transactions:PartD, Vol. 9, No. 3, pp. 355-364, Jun. 2002
DOI: 10.3745/KIPSTD.2002.9.3.355

Abstract

Recently, the World Wide Web (WWW) has become a source of a vast amount of information, and it is now recognized not only as an information-sharing tool but also as an information repository. Currently, the majority of documents on the web are written in HTML (Hypertext Markup Language). Although HTML is simple and easy to learn, its inherent inability to describe document structure makes it difficult to retrieve information effectively. One possible solution is to convert such HTML documents into XML (eXtensible Markup Language) documents. XML is a standard markup language for exchanging data on the web, and it can describe document structure freely through a user-defined DTD (Document Type Definition). This makes it possible to integrate, store, and retrieve web data efficiently. In this paper, we propose a converter that automatically converts HTML documents with similar patterns into XML documents by analyzing their document structure and recognizing path information.
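The abstract does not spell out the conversion algorithm, so the following is only a rough sketch of what "recognizing path information" can mean in practice, not the authors' converter. For every text node it records the chain of enclosing HTML tags plus the node's position among same-named siblings; pages generated from the same template produce the same paths, so one path-to-element table can convert all of them. Here that table is written by hand, which is an assumption (the paper's converter is described as automatic), and the names `extract_paths`, `to_xml`, and the `(path, index)` mapping format are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "source", "wbr"}


class PathCollector(HTMLParser):
    """Hypothetical helper: record each non-blank text node with its tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []           # open elements as (tag, position among same-named siblings)
        self.child_counts = [{}]  # per nesting level: tag name -> children seen so far
        self.items = []           # (path, sibling_index, text) in document order

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        # Close everything up to the matching open tag; ignore stray end tags.
        if any(t == tag for t, _ in self.stack):
            while True:
                t, _ = self.stack.pop()
                self.child_counts.pop()
                if t == tag:
                    break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            path = "/".join(t for t, _ in self.stack)
            self.items.append((path, self.stack[-1][1], text))


def extract_paths(html):
    """Return (path, sibling_index, text) triples, e.g. ('html/body/h1', 1, 'Title')."""
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.items


def to_xml(html, mapping, root_name="document"):
    """Build a flat XML tree; text whose (path, index) is not in the mapping is dropped."""
    root = ET.Element(root_name)
    for path, index, text in extract_paths(html):
        name = mapping.get((path, index))
        if name is not None:
            ET.SubElement(root, name).text = text
    return root


if __name__ == "__main__":
    page = ("<html><body><h1>Book List</h1><table>"
            "<tr><td>XML Basics</td><td>Kim</td></tr>"
            "<tr><td>Web Data</td><td>Lee</td></tr>"
            "</table></body></html>")
    # Hand-written mapping from (tag path, column position) to XML element names.
    mapping = {
        ("html/body/h1", 1): "heading",
        ("html/body/table/tr/td", 1): "title",
        ("html/body/table/tr/td", 2): "author",
    }
    print(ET.tostring(to_xml(page, mapping), encoding="unicode"))
    # prints: <document><heading>Book List</heading><title>XML Basics</title><author>Kim</author><title>Web Data</title><author>Lee</author></document>
```

Keying on the sibling position as well as the bare tag path is a design choice: it separates table columns that share the same path, such as the title and author cells above. A fuller converter would also group fields into per-row records and emit a matching DTD.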



Cite this article
[IEEE Style]
K. Y. Oh and E. J. Hwang, "Automatically Converting HTML Documents with Similar Pattern into XML Documents," The KIPS Transactions:PartD, vol. 9, no. 3, pp. 355-364, 2002. DOI: 10.3745/KIPSTD.2002.9.3.355.

[ACM Style]
Keum Yong Oh and Een Jun Hwang. 2002. Automatically Converting HTML Documents with Similar Pattern into XML Documents. The KIPS Transactions:PartD, 9, 3, (2002), 355-364. DOI: 10.3745/KIPSTD.2002.9.3.355.