Publication Type : Journal Article
Publisher : Elsevier BV
Source : Biotechnology Reports
Url : https://doi.org/10.1016/j.btre.2025.e00931
Keywords : Whole genome sequencing, Long-read sequencing, Genome assembly, De novo assembly, Bacterial genome
Campus : Coimbatore
School : School of Physical Sciences
Department : Chemistry
Year : 2025
Abstract : Genome assembly is a crucial step in microbial genomics, significantly impacting downstream applications such as functional annotation and comparative genomics. While long-read sequencing technologies have improved genome reconstruction, the choice of assembler and preprocessing methods substantially influences assembly quality. Here, we benchmarked eleven long-read assemblers (Canu, Flye, HINGE, Miniasm, NECAT, NextDenovo, Raven, Shasta, SmartDenovo, wtdbg2 (Redbean), and Unicycler) using standardized computational resources. Assemblies were evaluated on runtime, contiguity (N50, total length, contig count), GC content, and completeness using Benchmarking Universal Single-Copy Orthologs (BUSCO). Assemblers employing progressive error correction with consensus refinement, notably NextDenovo and NECAT, consistently generated near-complete, single-contig assemblies with few misassemblies and stable performance across preprocessing types. Flye offered a strong balance of accuracy and contiguity, although it was sensitive to corrected input. Canu achieved high accuracy but produced fragmented assemblies (3–5 contigs) and required the longest runtimes. Unicycler reliably produced circular assemblies but with slightly shorter contigs than Flye or NextDenovo. Ultrafast tools such as Miniasm and Shasta provided rapid draft assemblies, yet were highly dependent on preprocessing and required polishing to achieve completeness. HINGE and wtdbg2 underperformed due to structural instability and fragmentation.
Preprocessing had a marked effect: filtering improved genome fraction and BUSCO completeness, trimming reduced low-quality artifacts, and correction benefited OLC-based assemblers but occasionally increased misassemblies in graph-based tools. Overall, assembler choice and preprocessing jointly determine accuracy, contiguity, and computational efficiency. These results provide a reproducible framework for selecting assembly pipelines in prokaryotic genomics, underscoring that no single assembler is universally optimal.
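The contiguity and composition metrics named in the abstract (N50, total length, contig count, GC content) can be illustrated with a minimal Python sketch. This is not the authors' evaluation pipeline; the function name `assembly_stats` and the toy contigs are hypothetical, and GC content is computed here over total length, including any ambiguous bases.

```python
from typing import Dict, List


def assembly_stats(contigs: List[str]) -> Dict[str, float]:
    """Compute basic contiguity and GC metrics for a list of contig sequences.

    Hypothetical helper for illustration only; not taken from the publication.
    """
    # Contig lengths, longest first, as the N50 definition requires.
    lengths = sorted((len(c) for c in contigs), reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly contains no bases")

    # N50: length of the contig at which the running sum of lengths
    # (longest first) first reaches at least half of the total length.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    # GC content as a percentage of all bases (assumption: ambiguous
    # bases such as N are counted in the denominator).
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)

    return {
        "contig_count": len(lengths),
        "total_length": total,
        "n50": n50,
        "gc_percent": 100.0 * gc / total,
    }


if __name__ == "__main__":
    # Toy contigs, not real assembly output.
    print(assembly_stats(["GGCCGGCC", "ATATAT", "ATGC", "AT"]))
```

For a single-contig assembly, as reported for NextDenovo and NECAT, N50 equals the total length, which is why contig count and N50 are read together.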
Cite this Research Publication : Megha S. Kumar, Manoj Bhat Krishna, K.P. Soman, John Stanley, Nader Pourmand, Prashanth Suravajhala, T.G. Satheesh Babu, Benchmarking long-read assembly tools and preprocessing strategies for bacterial genomes: A case study on E. coli DH5α, Biotechnology Reports, Elsevier BV, 2025, https://doi.org/10.1016/j.btre.2025.e00931