Orthogroups: Pangenome Matrices, Pangenome Dataframes and FASTA Files

The core of the comparative genomics analysis presented in our articles relies on the computation of orthogroups across a set of genomes. Because we analyzed 9 different taxonomic groups, we computed 9 separate sets of orthogroups, one per group, using OrthoFinder.

If you are interested in reproducing the orthogroups we constructed, please refer to the following scripts and guide.

Scripts to compute orthogroups using OrthoFinder with DIAMOND as the search engine

From the computed orthogroups, we reconstructed pangenome matrices that record the abundance of each orthogroup in each genome of a given dataset. The matrix files provided are tab-delimited, with orthogroups as rows and genomes as columns.
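As an illustration of the layout described above, the following minimal Python sketch reads a tab-delimited matrix with orthogroups as rows and genomes as columns. The toy data, the header label "Orthogroup", the genome names and both function names are hypothetical and chosen only for this example; it also assumes that the first column holds the orthogroup identifier and that every other cell is an integer count. Listing the orthogroups present in every genome is just one possible use of such a matrix, not a step taken from our scripts.

```python
import csv
import io

# Hypothetical toy matrix. Real files are assumed to share this layout:
# first column = orthogroup ID, remaining columns = one genome each.
EXAMPLE_MATRIX = (
    "Orthogroup\tGenomeA\tGenomeB\tGenomeC\n"
    "OG0000001\t1\t1\t2\n"
    "OG0000002\t0\t3\t1\n"
    "OG0000003\t1\t0\t0\n"
)


def read_pangenome_matrix(handle):
    """Return (genome names, {orthogroup: [count per genome]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    genomes = header[1:]
    matrix = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        matrix[row[0]] = [int(value) for value in row[1:]]
    return genomes, matrix


def orthogroups_in_all_genomes(matrix):
    """Orthogroups with at least one member in every genome."""
    return sorted(og for og, counts in matrix.items() if all(c > 0 for c in counts))


genomes, matrix = read_pangenome_matrix(io.StringIO(EXAMPLE_MATRIX))
shared = orthogroups_in_all_genomes(matrix)
```

To read one of the provided files instead of the toy data, pass an open file handle, for example open(path, newline=""), to read_pangenome_matrix.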

We also share dataframes that record, for each genome, which coding sequences (CDS) belong to which orthogroup.
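The sketch below shows how a CDS-to-orthogroup table of this kind relates to the per-genome counts recorded in the pangenome matrices: counting the CDS assigned to each orthogroup in each genome yields the abundance values. The column names (genome, cds_id, orthogroup), the toy records and the function name are assumptions made for this example; the dataframes we provide may use different headers.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical long-format table: one row per CDS.
EXAMPLE_MAPPING = (
    "genome\tcds_id\torthogroup\n"
    "GenomeA\tcdsA_001\tOG0000001\n"
    "GenomeA\tcdsA_002\tOG0000003\n"
    "GenomeB\tcdsB_001\tOG0000001\n"
    "GenomeB\tcdsB_002\tOG0000002\n"
    "GenomeB\tcdsB_003\tOG0000002\n"
)


def count_members(handle):
    """Return {orthogroup: Counter({genome: number of CDS})}."""
    counts = defaultdict(Counter)
    for record in csv.DictReader(handle, delimiter="\t"):
        counts[record["orthogroup"]][record["genome"]] += 1
    return counts


counts = count_members(io.StringIO(EXAMPLE_MAPPING))
# A genome with no CDS in an orthogroup simply has a count of zero.
members_in_genome_b = counts["OG0000002"]["GenomeB"]
```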

Finally, we provide an amino acid FASTA file for each orthogroup computed in all 9 taxonomic groups analyzed. In the MAFFT|HMM section, we used these files to build alignments and HMM profiles for the orthogroups.
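For readers who want to inspect an orthogroup FASTA file before aligning it, a minimal parser can be sketched as follows. The sequences and identifiers are invented for the example, and the sketch assumes that the first whitespace-delimited token of each header line is the CDS identifier, which may not match the headers in the files we provide.

```python
import io

# Hypothetical two-sequence orthogroup file; the first record is wrapped
# over two lines to show that multi-line sequences are joined.
EXAMPLE_FASTA = (
    ">cdsA_001\n"
    "MKTAYIAKQR\n"
    "QISFVKSHFS\n"
    ">cdsB_001\n"
    "MKTAYIAKQRQ\n"
)


def read_fasta(handle):
    """Return {sequence ID: amino acid sequence} from a FASTA handle."""
    parts_by_id = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            parts_by_id[current_id] = []
        elif current_id is not None:
            parts_by_id[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in parts_by_id.items()}


sequences = read_fasta(io.StringIO(EXAMPLE_FASTA))
lengths = {seq_id: len(seq) for seq_id, seq in sequences.items()}
```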

The scripts used to compute the files described above can be found at the following link.

Scripts to compute pangenome matrices, dataframes and orthogroup FASTA files