Measuring Reliability of POS Tagging Systems


The KIPS Transactions:PartB, Vol. 8, No. 4, pp. 365-372, Aug. 2001
DOI: 10.3745/KIPSTB.2001.8.4.365

Abstract

This paper presents a method for measuring the reliability of a part-of-speech (POS) tagging system. Reliability is defined as the probability that the system's output, a POS-tagged sentence, contains no mis-tagged words. In general, reliability is estimated from the reciprocal of error probabilities. Estimating these error probabilities requires a training corpus much larger than the one needed to estimate the probabilities used for POS tagging itself. To ease this problem, the paper also describes a method for estimating more reliable error probabilities using cross-validation. In an experiment, the reliability of our POS tagging system was about 61% on average, which is equivalent to the reliability obtained when a sentence contains 20 morphemes and the accuracy of the POS tagging system is 97.5%. We believe that this method for measuring the reliability of POS tagging systems is valid because the accuracy of our POS tagging system on text without unknown words is 97.68%. We expect that this model can be applied to syntactic analyzers and information retrieval systems. In this paper, we applied the model to an error detection system for POS tagging.
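The equivalence quoted in the abstract (20 morphemes at 97.5% accuracy) can be checked with a short calculation. The sketch below is not the paper's estimator. It assumes, for illustration only, that every morpheme is tagged correctly with the same probability and that tagging errors are independent, so sentence-level reliability is the per-morpheme accuracy raised to the sentence length. The function names `sentence_reliability` and `equivalent_accuracy` are hypothetical.

```python
def sentence_reliability(accuracy: float, num_morphemes: int) -> float:
    """Probability that a sentence contains no mis-tagged morpheme.

    Assumption (not from the paper): each morpheme is tagged correctly
    with the same probability `accuracy`, independently of the others.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if num_morphemes < 0:
        raise ValueError("num_morphemes must be non-negative")
    return accuracy ** num_morphemes


def equivalent_accuracy(reliability: float, num_morphemes: int) -> float:
    """Per-morpheme accuracy that would yield the given sentence reliability
    under the same independence assumption (inverse of the function above)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if num_morphemes <= 0:
        raise ValueError("num_morphemes must be positive")
    return reliability ** (1.0 / num_morphemes)


if __name__ == "__main__":
    # Figures quoted in the abstract: 20 morphemes, 97.5% accuracy.
    print(f"reliability at 97.5%, 20 morphemes: {sentence_reliability(0.975, 20):.4f}")
    # Accuracy implied by the reported reliability of about 61%.
    print(f"accuracy implied by 61% reliability: {equivalent_accuracy(0.61, 20):.4f}")
```

Under these simplifying assumptions, 97.5% accuracy over 20 morphemes gives a reliability of roughly 0.60, close to the reported figure of about 61%. The paper's own estimate is based on error probabilities obtained with cross-validation, so this back-of-the-envelope check should not be read as its method.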


Cite this article
[IEEE Style]
J. H. Kim, "Measuring Reliability of POS Tagging Systems," The KIPS Transactions:PartB, vol. 8, no. 4, pp. 365-372, 2001. DOI: 10.3745/KIPSTB.2001.8.4.365.

[ACM Style]
Jae Hoon Kim. 2001. Measuring Reliability of POS Tagging Systems. The KIPS Transactions:PartB, 8, 4, (2001), 365-372. DOI: 10.3745/KIPSTB.2001.8.4.365.