The Open Applied Informatics Journal

2011, 5 : 30-44
Published online 2011 July 15. DOI: 10.2174/1874136301005010030
Publisher ID: TOAINFOJ-5-30

Identification of Fungal DNA Barcode Targets and PCR Primers Based on Pfam Protein Families and Taxonomic Hierarchy

Christopher T. Lewis, Satpal Bilkhu, Vincent Robert, Ursula Eberhardt, Szaniszlo Szoke, Keith A. Seifert and C. Andre Levesque
Eastern Cereal and Oilseed Research Centre, Agriculture and Agri-Food Canada, 960 Carling Ave, Ottawa, ON K2E 5H3, Canada

ABSTRACT

DNA barcoding is the use of DNA sequences from standardized genetic markers to identify eukaryotic organisms. We attempted to identify alternative candidate barcode gene targets for the fungal biota from available fungal genomes using a taxonomy-aware processing pipeline. Putative protein-coding sequences were matched to Pfam protein families and aligned to reference Pfam accessions. Conserved sequence blocks were identified in the resulting alignments, and degenerate primers were designed. The processing pipeline is described and the resulting candidate gene targets are discussed. The pipeline allows analysis of subsets of the available reference data at various hierarchical taxonomic levels (selectable by GenBank taxonomy ID or scientific name), so that discrete taxonomic groups can be combined into a single subset or subordinate taxa can be excluded from the analysis of higher-level taxa. Putative degenerate primer pairs were designed at ranks as high as superkingdom for the set of organisms included in the analysis. The identified targets have essential housekeeping functions, like the well-known phylogenetic or barcode markers, and most have better potential to resolve species among fully sequenced genomes than the markers most commonly used at present. Some of the commonly used species-level phylogenetic markers for fungi, especially tef1-α and rpb2, were not recovered in our analysis because they exist in multiple copies in single organisms and because Pfam families do not always correspond to complete proteins.

Keywords:

Fungi, barcoding, internal transcribed spacer (ITS), translation elongation factor 1A (tef1a), cytochrome oxidase 1 (cox1, COI).
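
The abstract mentions that degenerate primers were designed from conserved sequence blocks. As a rough illustration of that idea only, the sketch below collapses a small, gap-free nucleotide alignment into a degenerate consensus using the standard IUPAC ambiguity codes and counts how many distinct sequences the consensus represents. It is not the authors' pipeline: the function names `degenerate_consensus` and `degeneracy` are hypothetical, and the assumption that the block is already available as equal-length nucleotide strings is ours.

```python
# Illustrative sketch only; not the pipeline described in the article.
# Assumes an ungapped nucleotide alignment given as equal-length strings.

# Standard IUPAC ambiguity codes, keyed by the set of bases each code covers.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

# Number of bases represented by each code, e.g. "Y" -> 2, "N" -> 4.
CODE_SIZE = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_consensus(aligned):
    """Return one IUPAC code per alignment column (hypothetical helper)."""
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must have equal length")

    consensus = []
    for column in zip(*aligned):
        bases = frozenset(base.upper() for base in column)
        if bases not in IUPAC:
            # Gaps and pre-existing ambiguity codes are out of scope here.
            raise ValueError("unsupported characters in column: %r" % (column,))
        consensus.append(IUPAC[bases])
    return "".join(consensus)


def degeneracy(primer):
    """Count the distinct sequences a degenerate primer stands for."""
    total = 1
    for code in primer:
        total *= CODE_SIZE[code]
    return total
```

For example, `degenerate_consensus(["ATGGCA", "ATGGCG", "ACGGCT"])` gives `"AYGGCD"`, and `degeneracy("AYGGCD")` is 6 (two options at the second position times three at the last). Keeping the degeneracy count next to the consensus makes it easy to discard blocks whose primers would be too degenerate to be practical; any such threshold would be a design choice, not something stated in the abstract.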