Automatic Classification and Vocabulary Analysis of Political Bias in News Articles by Using Subword Tokenization


KIPS Transactions on Software and Data Engineering, Vol. 10, No. 1, pp. 1-8, Jan. 2021
https://doi.org/10.3745/KTSDE.2021.10.1.1
Keywords: Political Bias, AI Bias, Lexical Bias, Document Embedding, Subword Tokenizer
Abstract

Political news articles tend to be polarized along lines such as conservative and liberal; this tendency is called political bias. We constructed a keyword-based dataset for classifying the bias of news articles. Most embedding research represents a sentence as a sequence of morphemes. In this work, we expect the number of unknown tokens to decrease when sentences are instead composed of subwords segmented by a language model. We propose a document embedding model based on subword tokenization and apply it to an SVM and a feedforward neural network to classify political bias. Compared with document embedding models based on morphological analysis, the subword-based document embedding model achieved the highest accuracy, 78.22%. We also confirmed that subword tokenization reduced the number of unknown tokens. Using the best-performing embedding model from the bias classification task, we extracted keywords associated with politicians. The bias of each keyword was verified by its average similarity to the vectors of politicians of each political tendency.
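The keyword verification step described above can be illustrated with a minimal sketch. It assumes that keywords and politicians are represented as vectors in the same embedding space, that similarity is cosine similarity, and that a keyword is assigned to the tendency with the higher average similarity. The abstract states only that average similarity was used, so the similarity measure, the decision rule, the function names, and the toy two-dimensional vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_similarity(keyword_vec, politician_vecs):
    """Mean cosine similarity between a keyword and a group of politicians."""
    total = sum(cosine(keyword_vec, p) for p in politician_vecs)
    return total / len(politician_vecs)

def keyword_bias(keyword_vec, conservative_vecs, liberal_vecs):
    """Assumed decision rule: the tendency with the higher average wins."""
    cons = average_similarity(keyword_vec, conservative_vecs)
    lib = average_similarity(keyword_vec, liberal_vecs)
    label = "conservative" if cons > lib else "liberal"
    return label, cons, lib

# Toy vectors for illustration only; real vectors would come from
# the trained document embedding model.
conservative_vecs = [[1.0, 0.0], [0.8, 0.2]]
liberal_vecs = [[0.0, 1.0], [0.2, 0.8]]
keyword_vec = [0.9, 0.1]

label, cons, lib = keyword_bias(keyword_vec, conservative_vecs, liberal_vecs)
print(label, round(cons, 3), round(lib, 3))
```

In this toy setup the keyword vector points mostly along the first axis, so its average similarity to the conservative group is higher than to the liberal group.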


Cite this article
[IEEE Style]
D. B. Cho, H. Y. Lee, W. S. Jung, S. S. Kang, "Automatic Classification and Vocabulary Analysis of Political Bias in News Articles by Using Subword Tokenization," KIPS Transactions on Software and Data Engineering, vol. 10, no. 1, pp. 1-8, 2021. DOI: https://doi.org/10.3745/KTSDE.2021.10.1.1.

[ACM Style]
Dan Bi Cho, Hyun Young Lee, Won Sup Jung, and Seung Shik Kang. 2021. Automatic Classification and Vocabulary Analysis of Political Bias in News Articles by Using Subword Tokenization. KIPS Transactions on Software and Data Engineering, 10, 1, (2021), 1-8. DOI: https://doi.org/10.3745/KTSDE.2021.10.1.1.