A Design on Informal Big Data Topic Extraction System Based on Spark Framework


KIPS Transactions on Software and Data Engineering, Vol. 5, No. 11, pp. 521-526, Nov. 2016
DOI: 10.3745/KTSDE.2016.5.11.521
Keywords: Topic Model, Machine Learning
Abstract

Because online informal text data are massive in volume and unstructured in nature, traditional relational data model technologies are of limited use for storing and analyzing them. Moreover, because social data are generated dynamically and in large quantities, analyzing social users' reactions in real time is hard to accomplish. In this paper, to capture the semantics of massive, informal online documents easily with an unsupervised learning mechanism, we design and implement an automatic topic extraction system based on the mass of words that make up each document. The input data set for the proposed system is generated first, using an N-gram algorithm to build multi-word terms that capture the meaning of sentences more precisely, and Hadoop and Spark (an in-memory distributed computing framework) are adopted to run the topic model. In the experiment phase, terabyte-scale input data are preprocessed and the proposed topic extraction steps are applied. We conclude that the proposed system performs well at extracting meaningful topics in a timely manner, because intermediate results are read directly from main memory rather than from an HDD.
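
The N-gram preprocessing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`word_ngrams`, `document_terms`, `term_counts`), the whitespace tokenizer, the underscore joiner, and the default `max_n = 2` are all assumptions made for illustration, and the code runs on a single machine rather than on Hadoop or Spark.

```python
from collections import Counter
from typing import Iterable, List


def word_ngrams(tokens: List[str], n: int) -> List[str]:
    """Return every contiguous run of n tokens, joined with underscores."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_terms(text: str, max_n: int = 2) -> List[str]:
    """Lowercase and whitespace-tokenize text, then emit 1..max_n word n-grams.

    The tokenizer is an assumption; informal text would normally need
    more cleaning (punctuation, emoticons, stop words).
    """
    tokens = text.lower().split()
    terms: List[str] = []
    for n in range(1, max_n + 1):
        terms.extend(word_ngrams(tokens, n))
    return terms


def term_counts(docs: Iterable[str], max_n: int = 2) -> List[Counter]:
    """Build one term-frequency counter per document (hypothetical helper)."""
    return [Counter(document_terms(doc, max_n)) for doc in docs]


if __name__ == "__main__":
    for counts in term_counts(["big data topic model", "topic model on spark"]):
        print(dict(counts))
```

Per-document term counts of this kind are the usual input to a topic model; in a distributed setting the same per-document function would be applied to each document in parallel.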



Cite this article
[IEEE Style]
K. Park, "A Design on Informal Big Data Topic Extraction System Based on Spark Framework," KIPS Transactions on Software and Data Engineering, vol. 5, no. 11, pp. 521-526, 2016. DOI: 10.3745/KTSDE.2016.5.11.521.

[ACM Style]
Kiejin Park. 2016. A Design on Informal Big Data Topic Extraction System Based on Spark Framework. KIPS Transactions on Software and Data Engineering, 5, 11, (2016), 521-526. DOI: 10.3745/KTSDE.2016.5.11.521.