EasyCGTree: a pipeline for prokaryotic phylogenomic analysis based on core gene sets

Bibliographic Details
Title: EasyCGTree: a pipeline for prokaryotic phylogenomic analysis based on core gene sets
Authors: Dao-Feng Zhang, Wei He, Zongze Shao, Iftikhar Ahmed, Yuqin Zhang, Wen-Jun Li, Zhe Zhao
Source: BMC Bioinformatics, Vol 24, Iss 1, Pp 1-12 (2023)
Publisher Information: BMC, 2023.
Publication Year: 2023
Collection: LCC:Computer applications to medicine. Medical informatics
LCC:Biology (General)
Subject Terms: Phylogeny inference, Supermatrix, Supertree, Prokaryote taxonomy, Core gene, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
More Details: Abstract. Background: Genome-scale phylogenetic analysis based on core gene sets is routinely used in microbiological research. However, the techniques are still not approachable for individuals with little bioinformatics experience. Here, we present EasyCGTree, a user-friendly and cross-platform pipeline to reconstruct genome-scale maximum-likelihood (ML) phylogenetic trees using supermatrix (SM) and supertree (ST) approaches. Results: EasyCGTree was implemented in the Perl programming language and was built using a collection of reputable published programs. All the programs were precompiled as standalone executable files and are contained in the EasyCGTree package. It can run once a Perl language environment is installed. Several profile hidden Markov models (HMMs) of core gene sets were prepared in advance to construct a profile HMM database (PHD) that is enclosed in the package and available for homolog searching. Customized gene sets can also be used to build profile HMMs and be added to the PHD via EasyCGTree. Taking 43 genomes of the genus Paracoccus as the testing data set, consensus (a variant of the typical SM), SM, and ST trees were inferred via EasyCGTree successfully, and the SM trees were compared with those inferred via the pipelines UBCG and bcgTree, using the metrics of cophenetic correlation coefficients (CCC) and Robinson–Foulds distance (topological distance). The results suggested that EasyCGTree can infer SM trees with nearly identical topology (distance 0.99) to those of trees inferred with the two pipelines. Conclusions: EasyCGTree is an all-in-one automatic pipeline from input data to phylogenomic tree with guaranteed accuracy, and is much easier to install and use than the reference pipelines. In addition, ST is conveniently implemented in EasyCGTree and can be used to explore prokaryotic evolutionary signals from a different perspective.
EasyCGTree version 4 is freely available for Linux and Windows users on GitHub ( https://github.com/zdf1987/EasyCGTree4 ).
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1471-2105
Relation: https://doaj.org/toc/1471-2105
DOI: 10.1186/s12859-023-05527-2
Access URL: https://doaj.org/article/be9379c3ce66487d9a923e6668670325
Accession Number: edsdoj.be9379c3ce66487d9a923e6668670325
Database: Directory of Open Access Journals