Automatic Object Extraction from Electronic Documents Using Deep Neural Network


KIPS Transactions on Software and Data Engineering, Vol. 7, No. 11, pp. 411-418, Nov. 2018
DOI: 10.3745/KTSDE.2018.7.11.411
Keywords: Object Extraction, Deep Learning, Tensorflow, PDF Document
Abstract

As artificial intelligence technology proliferates, obtaining, storing, and utilizing scientific data is becoming increasingly important in the research and science sectors. A number of methods have been proposed for extracting meaningful objects, such as graphs and tables, from research articles in order to obtain scientific data. Existing extraction methods that rely on heuristic approaches are hardly applicable to electronic documents with heterogeneous manuscript formats, because they are designed to work properly only for certain targeted manuscripts. This paper proposes a prototype object extraction system that exploits recent deep-learning technology to overcome the inflexibility of the heuristic approaches. We implemented our trained model, based on the Faster R-CNN algorithm, using the Google TensorFlow Object Detection API, and composed an annotated data set from 100 research articles for training and evaluation. Finally, a performance evaluation shows that the proposed system outperforms a comparator adopting heuristic approaches by 5.2%.
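
To make the extraction step concrete, the following is a minimal sketch of how detector output might be turned into object regions on a page. It is not the implementation described in the paper. It assumes only that the detector reports boxes in the normalized [ymin, xmin, ymax, xmax] form produced by the TensorFlow Object Detection API. The names ExtractedObject and select_objects, the 0.5 score threshold, and the label strings are hypothetical and chosen for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class ExtractedObject(NamedTuple):
    """One detected object on a page image (hypothetical helper type)."""
    label: str
    score: float
    box: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def select_objects(
    boxes: Sequence[Sequence[float]],
    scores: Sequence[float],
    labels: Sequence[str],
    page_width: int,
    page_height: int,
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[ExtractedObject]:
    """Keep confident detections and convert their boxes to pixel coordinates.

    Each box is expected as normalized [ymin, xmin, ymax, xmax] in [0, 1].
    """
    objects = []
    for (ymin, xmin, ymax, xmax), score, label in zip(boxes, scores, labels):
        if score < min_score:
            continue  # drop low-confidence detections
        pixel_box = (
            int(round(xmin * page_width)),   # left
            int(round(ymin * page_height)),  # top
            int(round(xmax * page_width)),   # right
            int(round(ymax * page_height)),  # bottom
        )
        objects.append(ExtractedObject(label, score, pixel_box))
    return objects
```

The returned pixel boxes use the (left, top, right, bottom) order, so they could be passed to an image-cropping routine to save each graph or table as a separate image.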


Cite this article
[IEEE Style]
H. Jang, Y. Chae, S. Lee and J. Jo, "Automatic Object Extraction from Electronic Documents Using Deep Neural Network," KIPS Transactions on Software and Data Engineering, vol. 7, no. 11, pp. 411-418, 2018. DOI: 10.3745/KTSDE.2018.7.11.411.

[ACM Style]
Heejin Jang, Yeonghun Chae, Sangwon Lee, and Jinyong Jo. 2018. Automatic Object Extraction from Electronic Documents Using Deep Neural Network. KIPS Transactions on Software and Data Engineering, 7, 11, (2018), 411-418. DOI: 10.3745/KTSDE.2018.7.11.411.