GoldPolish-target: targeted long-read genome assembly polishing.

Bibliographic Details
Title: GoldPolish-target: targeted long-read genome assembly polishing.
Authors: Zhang, Emily (AUTHOR) ezhang@bcgsc.ca; Coombe, Lauren (AUTHOR); Wong, Johnathan (AUTHOR); Warren, René L. (AUTHOR); Birol, Inanç (AUTHOR) ibirol@bcgsc.ca
Source: BMC Bioinformatics. 3/7/2025, Vol. 26 Issue 1, p1-13. 13p.
Subject Terms: *LIFE sciences, *DROSOPHILA melanogaster, *ERROR rates, *NUCLEOTIDE sequencing, *GENOMES
Abstract: Background: Advanced long-read sequencing technologies, such as those from Oxford Nanopore Technologies and Pacific Biosciences, are finding wide use in de novo genome sequencing projects. However, long reads typically have higher error rates than short reads. If left unaddressed, subsequent genome assemblies may exhibit high base error rates that compromise the reliability of downstream analysis. Several specialized error correction tools for genome assemblies have since emerged, employing a range of algorithms and strategies to improve base quality. Despite these efforts, many genome assembly workflows still produce regions with elevated error rates, such as gaps filled with unpolished or ambiguous bases. To address this, we introduce GoldPolish-Target, a modular targeted sequence polishing pipeline. Coupled with GoldPolish, a linear-time genome assembly polishing algorithm, GoldPolish-Target isolates and polishes user-specified assembly loci, offering a resource-efficient means for polishing targeted regions of draft genomes. Results: Experiments using Drosophila melanogaster and Homo sapiens datasets demonstrate that GoldPolish-Target can reduce insertion/deletion (indel) and mismatch errors by up to 49.2% and 55.4%, respectively, achieving base accuracy values upwards of 99.9% (Phred score Q > 30). This polishing accuracy is comparable to that of the current state-of-the-art tool, Medaka, while exhibiting up to 27-fold shorter run times and consuming 95% less memory, on average. Conclusion: GoldPolish-Target, in contrast to most other polishing tools, offers the ability to target specific regions of a genome assembly for polishing, providing a computationally lightweight and highly scalable solution for base error correction. [ABSTRACT FROM AUTHOR]
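Note on the accuracy figure: the abstract equates a base accuracy above 99.9% with a Phred score Q > 30. The sketch below illustrates that correspondence using the standard Phred definition, Q = -10 * log10(error rate). The function names are illustrative assumptions and are not part of GoldPolish-Target or any other tool named in the abstract.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert a base accuracy (fraction in [0, 1)) to a Phred quality score.

    Uses the standard definition Q = -10 * log10(p_error),
    where p_error = 1 - accuracy.
    """
    error_rate = 1.0 - accuracy
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10.0 * math.log10(error_rate)


def accuracy_from_phred(q: float) -> float:
    """Convert a Phred quality score back to a base accuracy fraction."""
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # 99.9% accuracy is one error per 1,000 bases, i.e. approximately Q30.
    print(round(phred_from_accuracy(0.999), 2))
    # Q30 maps back to an accuracy of approximately 0.999.
    print(round(accuracy_from_phred(30), 4))
```

Under this definition, any accuracy above 99.9% yields a score above Q30, which is consistent with the range reported in the abstract.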
Database: Academic Search Complete
ISSN: 1471-2105
DOI: 10.1186/s12859-025-06091-7
Published in: BMC Bioinformatics
Language: English