The Open Applied Informatics Journal
2009, 3: 1-11. Published online 2009 January 29. DOI: 10.2174/1874136300903010001
Publisher ID: TOAINFOJ-3-1
Estimation of Subcellular Proteomes in Bacterial Species
ABSTRACT
Computational methods for predicting the subcellular localization of bacterial proteins play a crucial role in ongoing efforts to annotate the function of these proteins and to suggest potential drug targets. Used in combination with other experimental and computational methods, they can support biomedical research by annotating the proteomes of a wide variety of bacterial species. We use the ngLOC method, a Bayesian classifier that predicts the subcellular localization of a protein from the distribution of n-grams in a curated dataset of proteins with experimentally determined localizations. Subcellular localization was predicted with an overall accuracy of 89.7% for Gram-negative and 89.3% for Gram-positive bacterial protein sequences. By applying a confidence score threshold, we raise the precision to 96.6% while covering 84.4% of the Gram-negative bacterial data, and to 96.0% while covering 87.9% of the Gram-positive data. We use this method to estimate the subcellular proteomes of ten Gram-negative and five Gram-positive species, covering on average 74.7% and 80.6% of the proteome for Gram-negative and Gram-positive species, respectively. The method is useful for large-scale analysis and annotation of the subcellular proteomes of bacterial species. We demonstrate that it has excellent predictive performance while achieving greater proteome coverage than other popular methods such as PSORTb and PLoc.
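The abstract describes ngLOC only at a high level. It is a Bayesian classifier over the n-gram distribution of protein sequences, and a confidence-score threshold trades coverage for precision. The sketch below illustrates that general idea with a multinomial naive Bayes classifier over overlapping n-grams. It is not the ngLOC implementation. The n-gram length, the Laplace smoothing, the use of the top posterior probability as the "confidence score", and the names `NGramNaiveBayes` and `precision_and_coverage` are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return the overlapping n-grams of a protein sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NGramNaiveBayes:
    """Hypothetical multinomial naive Bayes over sequence n-grams (not ngLOC itself)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n          # n-gram length (assumed value, not taken from the paper)
        self.alpha = alpha  # Laplace smoothing constant (assumption)

    def fit(self, seqs, labels):
        self.class_counts = Counter(labels)      # documents per localization class
        self.ngram_counts = defaultdict(Counter)  # n-gram counts per class
        self.total = Counter()                   # total n-grams per class
        vocab = set()
        for seq, lab in zip(seqs, labels):
            grams = ngrams(seq, self.n)
            self.ngram_counts[lab].update(grams)
            self.total[lab] += len(grams)
            vocab.update(grams)
        self.vocab_size = len(vocab)
        self.n_docs = len(labels)
        return self

    def posterior(self, seq):
        """Posterior probability of each class, computed in log space for stability."""
        grams = ngrams(seq, self.n)
        log_scores = {}
        for lab, count in self.class_counts.items():
            score = math.log(count / self.n_docs)  # log prior
            # +1 reserves smoothed mass for n-grams never seen in training
            denom = self.total[lab] + self.alpha * (self.vocab_size + 1)
            for g in grams:
                score += math.log((self.ngram_counts[lab][g] + self.alpha) / denom)
            log_scores[lab] = score
        m = max(log_scores.values())
        weights = {lab: math.exp(s - m) for lab, s in log_scores.items()}
        z = sum(weights.values())
        return {lab: w / z for lab, w in weights.items()}

    def predict(self, seq):
        """Return (predicted class, confidence), with confidence = top posterior (assumed)."""
        post = self.posterior(seq)
        best = max(post, key=post.get)
        return best, post[best]


def precision_and_coverage(model, seqs, labels, threshold):
    """Precision on predictions whose confidence meets the threshold, and the fraction covered."""
    covered = correct = 0
    for seq, lab in zip(seqs, labels):
        pred, conf = model.predict(seq)
        if conf >= threshold:
            covered += 1
            correct += pred == lab
    precision = correct / covered if covered else float("nan")
    return precision, covered / len(seqs)
```

A usage example with made-up toy sequences follows. Raising `threshold` generally keeps only the more confident predictions. This tends to increase precision and reduce coverage, which is the trade-off the abstract reports.

```python
train = ["MKKLLAAAGG", "MKKLLAAGGA", "PPSSTTPPSS", "PPSTTSPPST"]
labels = ["cytoplasm", "cytoplasm", "periplasm", "periplasm"]
model = NGramNaiveBayes(n=2).fit(train, labels)
print(model.predict("MKKLLAAG"))
print(precision_and_coverage(model, train, labels, threshold=0.9))
```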