The Open Applied Informatics Journal
2011, 5: 30-44. Published online 2011 July 15. DOI: 10.2174/1874136301005010030
Publisher ID: TOAINFOJ-5-30
Identification of Fungal DNA Barcode Targets and PCR Primers Based on Pfam Protein Families and Taxonomic Hierarchy
ABSTRACT
DNA barcoding is the use of DNA sequences from standardized genetic markers to identify eukaryotic organisms. We attempted to identify alternative candidate barcode gene targets for the fungal biota from available fungal genomes using a taxonomy-aware processing pipeline. Putative protein-coding sequences were matched to Pfam protein families and aligned to reference Pfam accessions. Conserved sequence blocks were identified in the resulting alignments, and degenerate primers were designed for them. The processing pipeline is described and the resulting candidate gene targets are discussed. The pipeline supports analysis of subsets of the available reference data at various levels of the taxonomic hierarchy (selectable by GenBank taxonomy ID or scientific name). Discrete taxonomic groups can be combined into a single subset, and subordinate taxa can be excluded from the analysis of higher-level taxa. Putative degenerate primer pairs were designed at taxonomic ranks up to and including superkingdom for the set of organisms included in the analysis. Like the well-known phylogenetic and barcode markers, the identified targets have essential housekeeping functions, and most of them have a better potential than most markers currently in use to differentiate species among the fully sequenced genomes. Some of the commonly used species-level phylogenetic markers for fungi, especially tef1-α and rpb2, were not recovered in our analysis because they occur in multiple copies within single organisms and because Pfam families do not always correspond to complete proteins.
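The taxonomy-aware subset selection described above (combining discrete groups, excluding subordinate taxa) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simple child-to-parent map of taxon IDs. The IDs in `PARENT` are made up for illustration and are not real GenBank taxonomy IDs. The function names `lineage` and `select_taxa` are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Toy taxonomy: child taxon ID -> parent taxon ID.
# These IDs are hypothetical and are NOT real GenBank taxonomy IDs.
PARENT: Dict[int, int] = {
    2: 1,  # "Fungi" -> root
    3: 2,  # "Ascomycota" -> Fungi
    4: 2,  # "Basidiomycota" -> Fungi
    5: 3,  # "Saccharomycetes" -> Ascomycota
    6: 3,  # "Sordariomycetes" -> Ascomycota
    7: 5,  # a yeast species -> Saccharomycetes
    8: 6,  # a filamentous species -> Sordariomycetes
    9: 4,  # a mushroom species -> Basidiomycota
}


def lineage(taxid: int, parent: Dict[int, int]) -> Set[int]:
    """Return the taxon itself plus all of its ancestors up to the root."""
    seen = {taxid}
    while taxid in parent:
        taxid = parent[taxid]
        seen.add(taxid)
    return seen


def select_taxa(
    species: Iterable[int],
    include: Set[int],
    exclude: Set[int],
    parent: Dict[int, int],
) -> List[int]:
    """Keep species that fall under any included taxon and under no excluded taxon."""
    selected = []
    for sp in species:
        ancestors = lineage(sp, parent)
        if ancestors & include and not ancestors & exclude:
            selected.append(sp)
    return selected


if __name__ == "__main__":
    genomes = [7, 8, 9]
    # All "Fungi" except the "Saccharomycetes" subtree.
    print(select_taxa(genomes, include={2}, exclude={5}, parent=PARENT))  # [8, 9]
    # Two discrete groups combined into a single subset.
    print(select_taxa(genomes, include={4, 5}, exclude=set(), parent=PARENT))  # [7, 9]
```

Testing each species' full lineage against the include and exclude sets makes both operations a set intersection, whatever the rank of the selected taxa.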
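The abstract does not say how conserved blocks were scored or how degenerate primers were derived from them. The following is a simplified sketch of one common approach, not the authors' method. It builds an IUPAC degenerate consensus from a nucleotide alignment and slides a window over it. It keeps windows whose total degeneracy (the product of distinct bases per column, i.e. the number of distinct oligonucleotides the primer encodes) stays under a threshold. Gapped or non-ACGT columns are assumed to disqualify a window. The function names and the `max_degeneracy` parameter are hypothetical. Real primer design would also check melting temperature, GC content and 3' end stability.

```python
from math import prod
from typing import Dict, FrozenSet, List, Tuple

# IUPAC nucleotide ambiguity codes keyed by the set of bases they represent.
IUPAC: Dict[FrozenSet[str], str] = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_consensus(alignment: List[str]) -> Tuple[str, List[int]]:
    """Return the IUPAC consensus of an alignment and the number of bases per column.

    Columns containing a gap or any non-ACGT character are marked '-' with a
    count of 0, so that no primer window can span them.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus: List[str] = []
    counts: List[int] = []
    for i in range(length):
        bases = {seq[i].upper() for seq in alignment}
        if bases <= set("ACGT"):
            consensus.append(IUPAC[frozenset(bases)])
            counts.append(len(bases))
        else:
            consensus.append("-")
            counts.append(0)
    return "".join(consensus), counts


def find_primer_windows(
    alignment: List[str], length: int, max_degeneracy: int
) -> List[Tuple[int, str, int]]:
    """List (start, degenerate primer, degeneracy) for acceptable windows."""
    consensus, counts = degenerate_consensus(alignment)
    hits = []
    for start in range(len(consensus) - length + 1):
        window_counts = counts[start:start + length]
        if 0 in window_counts:
            continue  # window spans a gapped or ambiguous column
        degeneracy = prod(window_counts)  # number of distinct oligos encoded
        if degeneracy <= max_degeneracy:
            hits.append((start, consensus[start:start + length], degeneracy))
    return hits


if __name__ == "__main__":
    aligned = ["ATGGCTAAGCTT", "ATGGCCAAGCTA", "ATGGCTAAACTT"]
    print(degenerate_consensus(aligned))  # ('ATGGCYAARCTW', [1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2])
    print(find_primer_windows(aligned, length=6, max_degeneracy=2))
    # [(0, 'ATGGCY', 2), (1, 'TGGCYA', 2), (2, 'GGCYAA', 2)]
```

Raising `max_degeneracy` admits windows over more variable columns. This trades primer specificity for coverage of a broader taxonomic group.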