Detection of Protein Subcellular Localization based on Syntactic Dependency Paths


The KIPS Transactions:PartB, Vol. 15, No. 4, pp. 375-382, Aug. 2008
DOI: 10.3745/KIPSTB.2008.15.4.375

Abstract

A protein’s subcellular localization is considered an essential part of the description of its associated biomolecular phenomena. As the volume of biomolecular literature has grown, much research has addressed text mining to detect protein subcellular localization information in documents. It has been argued that linguistic information, especially syntactic information, is useful for identifying the subcellular localizations of proteins of interest. However, previous systems for detecting protein subcellular localization information used only shallow syntactic parsers and performed poorly. Thus, there remains a need to use a full syntactic parser and to apply deep linguistic knowledge to the analysis of text for protein subcellular localization information. We also use semantic information from the WordNet thesaurus. To improve performance in detecting protein subcellular localization information, this paper proposes a three-step method based on a full syntactic dependency parser and the WordNet thesaurus. In the first step, we construct syntactic dependency paths from each protein to its location candidate and convert the paths into dependency trees. In the second step, we retrieve the root information of the syntactic dependency trees. In the final step, we extract syn-semantic patterns of the protein subtrees and location subtrees. From the root and subtree nodes, we extract the syntactic category and syntactic direction as syntactic information, and the WordNet synset offset as semantic information. Using the root information and the syn-semantic patterns of subtrees obtained from the training data, we extract (protein, localization) pairs from the test sentences. Even with no biomolecular knowledge, our method showed reasonable performance in experiments on Medline abstract data. Our proposed method achieved an F-measure of 74.53% on the training data and 58.90% on the test data, significantly outperforming previous methods, by 12-25.
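
The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The sentence, its hand-written dependency parse, the part-of-speech labels, the function names, and the integer IDs standing in for WordNet synset offsets are all assumptions made for illustration. The sketch treats the lowest common ancestor of the protein and its location candidate as the root of the path tree, and records (syntactic category, direction, semantic ID) for each node on the two branches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    pos: str   # syntactic category (toy labels, assumed)
    head: int  # index of the head token, -1 for the sentence root


# Hand-written toy parse of "ProtA localizes to the nucleus".
# This is an assumption for illustration, not the output of a real parser.
SENTENCE = [
    Token("ProtA", "NOUN", 1),
    Token("localizes", "VERB", -1),
    Token("to", "ADP", 1),
    Token("the", "DET", 4),
    Token("nucleus", "NOUN", 2),
]

# Placeholder IDs standing in for WordNet synset offsets.
# These are NOT real WordNet offsets.
TOY_SYNSET = {"localizes": 1001, "nucleus": 2002}


def ancestors(tokens, i):
    """Indices from token i up to the sentence root, inclusive."""
    chain = [i]
    while tokens[chain[-1]].head != -1:
        chain.append(tokens[chain[-1]].head)
    return chain


def split_path(tokens, protein, location):
    """Steps 1 and 2: build the path tree and find its root.

    Returns (root, protein_branch, location_branch); each branch is
    listed from just below the root down to the target token.
    """
    up_p = ancestors(tokens, protein)
    up_l = ancestors(tokens, location)
    on_location_chain = set(up_l)
    # The first shared ancestor is the lowest common ancestor.
    root = next(i for i in up_p if i in on_location_chain)
    p_branch = up_p[:up_p.index(root)][::-1]
    l_branch = up_l[:up_l.index(root)][::-1]
    return root, p_branch, l_branch


def node_feature(tokens, i):
    """(category, direction relative to head, semantic ID or None)."""
    tok = tokens[i]
    if tok.head == -1:
        direction = "ROOT"
    else:
        direction = "L" if i < tok.head else "R"
    return (tok.pos, direction, TOY_SYNSET.get(tok.word))


def pattern(tokens, protein, location):
    """Step 3: root information plus the patterns of both branches."""
    root, p_branch, l_branch = split_path(tokens, protein, location)
    return (
        node_feature(tokens, root),
        tuple(node_feature(tokens, i) for i in p_branch),
        tuple(node_feature(tokens, i) for i in l_branch),
    )


if __name__ == "__main__":
    # Patterns collected from training sentences would be stored in a set;
    # a test pair is accepted when its pattern is found in that set.
    learned = {pattern(SENTENCE, protein=0, location=4)}
    print(pattern(SENTENCE, 0, 4) in learned)
```

Exact set membership is the simplest possible matching rule and is chosen here only for brevity; the abstract does not specify how root information and subtree patterns are combined at test time.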


Cite this article
[IEEE Style]
M. Y. Kim, "Detection of Protein Subcellular Localization based on Syntactic Dependency Paths," The KIPS Transactions:PartB, vol. 15, no. 4, pp. 375-382, 2008. DOI: 10.3745/KIPSTB.2008.15.4.375.

[ACM Style]
Mi Young Kim. 2008. Detection of Protein Subcellular Localization based on Syntactic Dependency Paths. The KIPS Transactions:PartB, 15, 4, (2008), 375-382. DOI: 10.3745/KIPSTB.2008.15.4.375.