The Open Applied Informatics Journal

2010, 4 : 15-27
Published online 2010 December 15. DOI: 10.2174/1874136301004010015
Publisher ID: TOAINFOJ-4-15

Searching for Related Descriptors Among Different Datasets: A New Strategy Implemented by the R Package “Dadi”

Livio Antonielli, Vincent Robert, Laura Corte, Luca Roscini, Ambra Bagnetti, Fabrizio Fatichenti and Gianluigi Cardinali
University of Perugia - DBA - Microbiology, Via Borgo 20 Giugno 74, 06121 Perugia, Italy

ABSTRACT

Background:

The increasing number of techniques introduced to describe organisms and taxa produces multivariate datasets, often composed of relatively independent descriptors. Handling many descriptors can be laborious, and is often unnecessary when their information is not congruent with that of the other datasets used in the same study. On the other hand, different levels of correlation between single descriptors and a whole dataset may provide useful scientific insights. The DADI (Distance-based Analysis for (optimal) Descriptor Identification) algorithm is proposed to allow a rapid and complete analysis of the descriptors from two different datasets that share the same number of objects. DADI was employed to select FTIR (Fourier Transform Infrared Spectroscopy) spectral wavelengths according to their correlation with the 26S rDNA sequences of strains belonging to a yeast genus.

Results:

This procedure made it possible to define a set of optimal wavelengths, with an overall increase in the correlation between FTIR and 26S data.

Conclusions:

DADI can identify the FTIR wavenumbers that best fit the chosen reference, thereby defining the descriptors to be used in FTIR and possibly in other metabolomic analyses.

Keywords:

Dataset, correlation, software, method, statistics, yeast.
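
Illustrative sketch:

The descriptor-by-descriptor comparison outlined in the abstract can be illustrated with a minimal sketch. It is not the Dadi package code and does not reproduce its API: the function names, the use of absolute differences as the per-descriptor distance, the choice of Pearson correlation, and the toy data are all assumptions made here for illustration, since the abstract does not specify them. Each descriptor of one dataset is converted into a vector of pairwise distances between objects, correlated with the pairwise distances obtained from the reference dataset, and the descriptors are then ranked by that correlation.

```python
from itertools import combinations
from math import sqrt


def pairwise_distances(values):
    """Absolute differences for all object pairs, in (0,1), (0,2), ... order.

    Assumption: a single descriptor is compared between objects by
    absolute difference; the abstract does not name the distance used.
    """
    return [abs(a - b) for a, b in combinations(values, 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_descriptors(descriptors, reference_distances):
    """Rank descriptors by correlation of their distances with the reference.

    `descriptors` maps a descriptor name to one value per object;
    `reference_distances` holds the reference dataset's pairwise distances
    in the same pair order as `pairwise_distances`.
    """
    scores = {
        name: pearson(pairwise_distances(values), reference_distances)
        for name, values in descriptors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four objects, two descriptors (e.g. two wavenumbers).
reference_distances = [1, 2, 3, 1, 2, 1]  # e.g. sequence-based distances
descriptors = {
    "w1": [0.1, 0.2, 0.3, 0.4],  # varies in step with the reference
    "w2": [0.5, 0.1, 0.4, 0.2],  # unrelated to the reference
}
ranked = rank_descriptors(descriptors, reference_distances)
```

In this toy case `w1` ranks above `w2`, because its pairwise distances are proportional to the reference distances. A threshold on the ranked correlations could then be used to keep only the best-fitting descriptors, although the selection rule actually used by DADI is not given in the abstract.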