Alleviating Semantic Term Mismatches in Korean Information Retrieval


The Transactions of the Korea Information Processing Society (1994 ~ 2000), Vol. 7, No. 12, pp. 3874-3884, Dec. 2000
DOI: 10.3745/KIPSTE.2000.7.12.3874

Abstract

An information retrieval system has to retrieve all and only the documents that are relevant to a user query, even if index terms and query terms do not match exactly. However, mismatches between index terms and query terms have been a serious obstacle to improving retrieval performance. In this paper, we discuss automatic term normalization between words in text corpora and its application to a Korean information retrieval system. We perform two types of term normalization to alleviate semantic term mismatches: equivalence classes and co-occurrence clusters. First, transliterations, spelling errors, and synonyms are normalized into equivalence classes by using contextual similarity. Second, context-based terms are normalized by using a combination of mutual information and word context to establish word similarities; unsupervised clustering with the K-means algorithm then identifies co-occurrence clusters. These normalized terms are used in query expansion to alleviate semantic term mismatches. In other words, we use the two kinds of term normalization, equivalence classes and co-occurrence clusters, to expand a user's query with new terms, in an attempt to make the query more comprehensive (adding transliterations) or more specific (adding specializations). For query expansion, we employ two complementary methods: term suggestion and term relevance feedback. The experimental results show that the proposed system can alleviate semantic term mismatches and can also provide appropriate similarity measurements. As a result, the system can improve the retrieval efficiency of the information retrieval system.
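The idea of grouping terms by contextual similarity and then using the groups for query expansion can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the similarity formula, so the sketch assumes positive pointwise mutual information (PMI) weights over a fixed word window and cosine similarity between the resulting context vectors. The function names (`context_vectors`, `cosine`, `expand_query`), the window size, the threshold, and the toy corpus are all hypothetical. The K-means clustering step is omitted.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Build a positive-PMI weighted context vector for every word.

    Assumption: contexts are the words within `window` positions of the target.
    """
    word_counts = Counter()            # how often each word takes part in a pair
    pair_counts = defaultdict(Counter)  # pair_counts[word][context] = frequency
    total_pairs = 0
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[word][tokens[j]] += 1
                    word_counts[word] += 1
                    total_pairs += 1
    vectors = {}
    for word, contexts in pair_counts.items():
        vec = {}
        for ctx, n in contexts.items():
            # PMI = log( p(word, ctx) / (p(word) * p(ctx)) )
            pmi = math.log(n * total_pairs / (word_counts[word] * word_counts[ctx]))
            if pmi > 0:  # keep only positive associations
                vec[ctx] = pmi
        vectors[word] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(query_terms, vectors, threshold=0.5, top_k=3):
    """Append up to `top_k` contextually similar terms per query term."""
    expanded = list(query_terms)
    for term in query_terms:
        if term not in vectors:
            continue
        scored = sorted(
            ((cosine(vectors[term], vec), word)
             for word, vec in vectors.items()
             if word != term and word not in expanded),
            reverse=True,
        )
        expanded.extend(word for score, word in scored[:top_k] if score >= threshold)
    return expanded


# Toy corpus (hypothetical): "컴퓨터" is the Korean transliteration of "computer".
corpus = [
    ["computer", "network", "security"],
    ["컴퓨터", "network", "security"],
    ["rice", "soup", "recipe"],
]
vectors = context_vectors(corpus)
print(expand_query(["computer"], vectors))
```

In this toy corpus the English word and its transliteration occur in identical contexts, so they are treated as one equivalence class and the query gains the transliterated form. A real system would need a much larger corpus and a threshold tuned on held-out data.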



Cite this article
[IEEE Style]
B. H. Yun, S. J. Park, and H. K. Kang, "Alleviating Semantic Term Mismatches in Korean Information Retrieval," The Transactions of the Korea Information Processing Society (1994 ~ 2000), vol. 7, no. 12, pp. 3874-3884, 2000. DOI: 10.3745/KIPSTE.2000.7.12.3874.

[ACM Style]
Bo Hyun Yun, Sung Jin Park, and Hyun Kyu Kang. 2000. Alleviating Semantic Term Mismatches in Korean Information Retrieval. The Transactions of the Korea Information Processing Society (1994 ~ 2000), 7, 12, (2000), 3874-3884. DOI: 10.3745/KIPSTE.2000.7.12.3874.