The Open Genomics Journal

2011, 4 : 1-9
Published online 2011 May 16. DOI: 10.2174/1875693X01104010001
Publisher ID: TOGENJ-4-1

Sensitivity Analysis of Protein Role Prediction Methods: Which are the Relevant Data?

Liliana López-Kleine, Alain Trubuil and Véronique Monnet
Statistics Department at the Universidad Nacional de Colombia.

ABSTRACT

Genome sequencing has enabled the generation of genomic and high-throughput post-genomic data. The availability of large amounts of these data has, in turn, led to the development of methods for inferring protein roles. Some of these methods can combine heterogeneous data of varying quality, which are more or less informative. However, little research has been devoted to identifying which data are relevant for inferring protein roles. In this study, we identified relevant subsets of data for the prediction of protein roles within the framework of a kernel method, kernel canonical correlation analysis (KCCA), used to predict the role of a bacterial protein. We carried out a sensitivity analysis based on a fractional factorial design to study how different microarray experiments, and the bacterial orders (groups of families) used to construct the phylogenetic profiles, influence the prediction of a protein role. The results of this analysis proved useful for interpreting biological predictions, because they highlight specific data that should be investigated further. The method is not restricted to KCCA, to the organism, or to the data used here.
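
To make the idea of a sensitivity analysis based on a fractional factorial design more concrete, the following is a minimal Python sketch, not the procedure used in the study. It assumes four hypothetical data subsets (for example, groups of microarray experiments or bacterial orders) that are either included (+1) or excluded (-1) from the analysis, a half-fraction design in which the level of the last factor is the product of the others, and a toy scoring function (`toy_score`) that stands in for the prediction score a method such as KCCA would return. The function names, the number of factors and the weights are all illustrative assumptions.

```python
from itertools import product


def half_fraction_design(n_factors):
    """Two-level half-fraction design with 2**(n_factors - 1) runs.

    The level of the last factor is the product of the levels of all
    the other factors, so only half of the full factorial is run.
    """
    runs = []
    for base in product((-1, 1), repeat=n_factors - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs


def main_effects(design, responses):
    """Main effect of each factor: mean response at +1 minus mean at -1."""
    n_factors = len(design[0])
    half = len(design) / 2  # balanced design: half of the runs at each level
    effects = []
    for j in range(n_factors):
        contrast = sum(run[j] * y for run, y in zip(design, responses))
        effects.append(contrast / half)
    return effects


def toy_score(run):
    """Hypothetical stand-in for a prediction score (an assumption).

    In a real analysis this would rerun the prediction method using only
    the data subsets whose level in `run` is +1.
    """
    weights = (2.0, 0.0, -1.0, 0.5)  # illustrative values only
    return 10.0 + sum(w * x for w, x in zip(weights, run))


design = half_fraction_design(4)                # 8 runs instead of 16
responses = [toy_score(run) for run in design]  # one score per run
effects = main_effects(design, responses)

for name, effect in zip("ABCD", effects):
    print(f"data subset {name}: main effect = {effect:+.2f}")
```

In this toy setting the score is linear in the factor levels, so each estimated main effect equals twice the corresponding weight. A subset with an effect near zero has little influence on the score, while a large effect flags data worth closer inspection. With real data, interactions between subsets would be partly confounded with main effects, which is the usual cost of running only a fraction of the full design.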