AsmMix: an efficient haplotype-resolved hybrid de novo genome assembling pipeline

Bibliographic Details
Title: AsmMix: an efficient haplotype-resolved hybrid de novo genome assembling pipeline
Authors: Chao Liu, Pei Wu, Xue Wu, Xia Zhao, Fang Chen, Xiaofang Cheng, Hongmei Zhu, Ou Wang, Mengyang Xu
Source: Frontiers in Genetics, Vol 15 (2024)
Publisher Information: Frontiers Media S.A., 2024.
Publication Year: 2024
Collection: LCC:Genetics
Subject Terms: long reads, bioinformatics, de novo, genome assembly, haplotype, hybrid, Genetics, QH426-470
More Details: Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigations into the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read and synthetic co-barcoded read sequencing techniques have harnessed long-range information to simplify the assembly graph and improve the assembled genomic sequence. However, it remains methodologically challenging to reconstruct complete haplotypes due to the high sequencing error rates of long reads and the limited capture efficiency of co-barcoded reads. Here we present a pipeline, AsmMix, for generating diploid genomes that are both contiguous and accurate. It first assembles co-barcoded reads to generate accurate haplotype-resolved assemblies that may contain many gaps, whereas the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall rates for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole-genome dataset (HG002) and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1664-8021
Relation: https://www.frontiersin.org/articles/10.3389/fgene.2024.1421565/full; https://doaj.org/toc/1664-8021
DOI: 10.3389/fgene.2024.1421565
Access URL: https://doaj.org/article/aafdffd2e52641ed80255f6b27dd626b
Accession Number: edsdoj.fdffd2e52641ed80255f6b27dd626b
Database: Directory of Open Access Journals