IMG Data, Phylogenetic Trees and Metadata

This section contains the amino acid FASTA files of the 3837 bacterial genomes used in this comparative genomics project, together with the nucleotide FASTA files of the same 3837 genomes. Each genome is identified by its IMG taxon_oid. Note that this is a frozen version of the genomes as of June 2016. Because IMG is updated constantly, we cannot guarantee that all 3837 taxon_oids are still valid. If you need the most up-to-date version of these 3837 genomes, please contact us.
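
For readers who want to inspect the FASTA files programmatically, the following is a minimal Python sketch of a FASTA reader. It assumes plain, unquoted FASTA text where each record starts with a ">" header line; the function name read_fasta and the per-genome file name in the usage comment (TAXON_OID.faa) are hypothetical illustrations, not the actual naming used in this data set.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    The header is returned without the leading '>' and wrapped sequence
    lines are concatenated into a single string.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


# Hypothetical usage, assuming one protein FASTA file per genome:
# with open("TAXON_OID.faa") as handle:
#     proteins = dict(read_fasta(handle))
#     print(len(proteins), "proteins")
```

Established parsers such as Biopython's SeqIO can be used instead; the sketch only shows how little is needed to count or extract sequences per genome.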

In addition, we include dataframes giving the abundance of COG, KO, PFAM and TIGRFAM profiles across the 3837 genomes used in this study. Briefly, a dataframe is a structure that stores a data table; these dataframes can be easily manipulated in R.
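
As an illustration of how such a profile abundance table can be queried outside R, here is a minimal Python sketch. It assumes a dataframe has been exported to a tab-delimited text file with one genome per row, the taxon_oid in the first column and one column per profile holding integer counts. This orientation and layout are assumptions, so check them against the actual files. The function names load_abundance and genomes_with_profile are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def load_abundance(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Read a tab-delimited genome-by-profile count table.

    Assumes the first column holds the taxon_oid and every remaining
    column is one profile (e.g. a COG, KO, PFAM or TIGRFAM ID).
    Returns {taxon_oid: {profile_id: count}}.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    profiles = header[1:]  # first header cell labels the taxon_oid column
    table: Dict[str, Dict[str, int]] = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        taxon_oid, counts = row[0], row[1:]
        table[taxon_oid] = {p: int(c) for p, c in zip(profiles, counts)}
    return table


def genomes_with_profile(table: Dict[str, Dict[str, int]], profile: str) -> List[str]:
    """Return the sorted taxon_oids in which `profile` occurs at least once."""
    return sorted(oid for oid, counts in table.items() if counts.get(profile, 0) > 0)
```

Passing an open file handle (opened with newline="") works the same way as passing a list of lines, since csv.reader accepts any iterable of strings.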

The metadata file is a tab-delimited text file. Its most relevant fields are:

taxon_oid - Unique identifier of the genome. This identifier appears in the pangenome matrices, dataframes and phylogenetic trees.

Classification - Each of the 3837 genomes is classified as NPA (non-plant-associated), PA (plant-associated) or soil, based on manual curation of the genome's submission metadata.

Root-NotRoot - This column further subclassifies some PA genomes as root-associated (RA).

Taxonomic_Group - Each genome is assigned to one of 9 taxonomic groups. This classification is based on the genome's placement in the general phylogenetic tree built for this analysis.
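
The fields above can be used to subset genomes. Below is a minimal Python sketch that indexes the metadata rows by taxon_oid and counts or selects genomes by a column such as Classification or Taxonomic_Group. It assumes the header row uses exactly the column names listed above; the helper names load_metadata, count_by and select are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def load_metadata(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Index the tab-delimited metadata rows by their taxon_oid column."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["taxon_oid"]: row for row in reader}


def count_by(metadata: Dict[str, Dict[str, str]], field: str) -> Counter:
    """Count genomes per value of a column, e.g. Classification."""
    return Counter(row[field] for row in metadata.values())


def select(metadata: Dict[str, Dict[str, str]], field: str, value: str) -> List[str]:
    """Return the sorted taxon_oids whose `field` equals `value`."""
    return sorted(oid for oid, row in metadata.items() if row[field] == value)
```

For example, with open("metadata.txt", newline="") as handle (the file name is hypothetical), count_by(load_metadata(handle), "Classification") tallies the NPA, PA and soil genomes, and select(..., "Classification", "PA") lists the plant-associated taxon_oids.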

For each of the 9 taxonomic groups analyzed, and for all 3837 genomes together, we provide a phylogenetic tree in Newick format and a high-resolution figure of the same tree. If you would like to explore the 3837-genome phylogenetic tree interactively, please visit the iTOL link below:

3837 Genomes Interactive Tree iTOL
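
Because the tree leaves are labeled with taxon_oids, a Newick file can be cross-checked against the metadata. The following minimal Python sketch extracts leaf labels from a Newick string. It assumes unquoted labels and no bracketed comments; quoted labels and comments are not handled. The function name leaf_labels is hypothetical, and a full parser such as Biopython's Phylo module is preferable for anything beyond quick checks.

```python
import re
from typing import List

# An unquoted label directly after "(" or "," is a leaf name. Labels after
# ")" are internal-node labels (e.g. support values) and text after ":" is
# a branch length, so neither is matched by this pattern.
_LEAF = re.compile(r"[(,]\s*([^\s(),:;\[\]']+)")


def leaf_labels(newick: str) -> List[str]:
    """Return leaf labels in the order they appear in a Newick string."""
    return _LEAF.findall(newick)
```

For example, leaf_labels("((111:0.1,222:0.2)0.95:0.05,333:0.3);") returns ["111", "222", "333"] (hypothetical IDs); comparing that set with the taxon_oid column of the metadata shows which genomes a given tree contains.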