Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts


KIPS Transactions on Software and Data Engineering, Vol. 6, No. 5, pp. 279-284, May 2017
10.3745/KTSDE.2017.6.5.279
Keywords: Machine Learning, Information Extraction, Ensemble, Logistic Regression, Social Media
Abstract

In the era of big data, text mining and opinion mining are used in many domains, and one of their most important research issues is extracting significant information from social media. In this paper, we propose a logistic regression ensemble method for finding the main body text in blog HTML. First, we extract structural features and text features from the HTML tags of each blog page. We then build a classification model, using logistic regression and ensemble learning, that decides whether a given tag contains main body text. One of our important findings is that the main body text can be located using ‘depth’ features extracted from the HTML tags. In an experiment on blog data covering diverse topics collected from the web, our tag classification model achieved 99% accuracy, and it recalled 80.5% of the documents that have tags containing the main body text.
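The feature extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the exact features, so the names `TagFeatureExtractor`, `extract_tag_features`, `depth`, and `text_len` are assumptions chosen for illustration. The sketch uses only Python's standard `html.parser` module and produces one feature row per HTML element, which could then be labeled and passed to a logistic regression ensemble.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not increase depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagFeatureExtractor(HTMLParser):
    """Collects one feature row per element: tag name, depth, text length.

    'depth' is a structural feature (nesting level, root element = 1) and
    'text_len' is a simple text feature. Both are hypothetical stand-ins
    for the features used in the paper.
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # indices into self.rows for currently open elements
        self.rows = []   # one dict per element, in document order

    def handle_starttag(self, tag, attrs):
        row = {"tag": tag, "depth": len(self.stack) + 1, "text_len": 0}
        self.rows.append(row)
        if tag not in VOID_TAGS:
            self.stack.append(len(self.rows) - 1)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray closers.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.rows[self.stack[i]]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Credit non-whitespace text only to the innermost open element.
        text = data.strip()
        if text and self.stack:
            self.rows[self.stack[-1]]["text_len"] += len(text)


def extract_tag_features(html):
    """Return a list of feature dicts, one per element in the HTML string."""
    parser = TagFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    sample = ("<html><body><div id='nav'><a href='#'>Home</a></div>"
              "<div id='post'><div><p>A long blog post paragraph.</p></div>"
              "</div></body></html>")
    for row in extract_tag_features(sample):
        print(row)
```

In this toy page the paragraph holding the post text sits deeper in the tree than the navigation link and carries more text, which is the kind of signal a tag classifier could learn from. Real blog markup is far noisier, so any such features would need to be validated on labeled data.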



Cite this article
[IEEE Style]
K. S. Hyeon and K. H. Joon, "Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts," KIPS Transactions on Software and Data Engineering, vol. 6, no. 5, pp. 279-284, 2017. DOI: 10.3745/KTSDE.2017.6.5.279.

[ACM Style]
Kim So Hyeon and Kim Han Joon. 2017. Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts. KIPS Transactions on Software and Data Engineering, 6, 5, (2017), 279-284. DOI: 10.3745/KTSDE.2017.6.5.279.