
Benchmarking long-read assembly tools and preprocessing strategies for bacterial genomes: A case study on E. coli DH5α

Publication Type : Journal Article

Publisher : Elsevier BV

Source : Biotechnology Reports

Url : https://doi.org/10.1016/j.btre.2025.e00931

Keywords : Whole genome sequencing, Long-read sequencing, Genome assembly, De novo assembly, Bacterial genome

Campus : Coimbatore

School : School of Physical Sciences

Department : Chemistry

Year : 2025

Abstract : Genome assembly is a crucial step in microbial genomics, significantly impacting downstream applications such as functional annotation and comparative genomics. While long-read sequencing technologies have improved genome reconstruction, the choice of assembler and preprocessing methods substantially influences assembly quality. Here, we benchmarked eleven long-read assemblers using standardized computational resources: Canu, Flye, HINGE, Miniasm, NECAT, NextDenovo, Raven, Shasta, SmartDenovo, wtdbg2 (Redbean), and Unicycler. Assemblies were evaluated on runtime, contiguity (N50, total length, contig count), GC content, and completeness using Benchmarking Universal Single-Copy Orthologs (BUSCO). Assemblers employing progressive error correction with consensus refinement, notably NextDenovo and NECAT, consistently generated near-complete, single-contig assemblies with few misassemblies and stable performance across preprocessing types. Flye offered a strong balance of accuracy and contiguity, although it was sensitive to corrected input. Canu achieved high accuracy but produced fragmented assemblies (3–5 contigs) and required the longest runtimes. Unicycler reliably produced circular assemblies, but with slightly shorter contigs than Flye or NextDenovo. Ultrafast tools such as Miniasm and Shasta provided rapid draft assemblies, yet were highly dependent on preprocessing and required polishing to achieve completeness. HINGE and wtdbg2 underperformed owing to structural instability and fragmentation.
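The contiguity and composition metrics named above (contig count, total length, N50, GC content) can be illustrated with a short Python sketch. It is not the evaluation pipeline used in the study, whose tooling beyond BUSCO the abstract does not specify. It assumes the common N50 definition (the length L such that contigs of length L or longer cover at least half of the assembly) and reports GC content as a percentage of A/C/G/T bases only, so ambiguous bases such as N are excluded. The function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def n50(lengths):
    """Length L such that contigs of length >= L cover at least 50% of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_stats(text):
    """Contig count, total length, N50 and GC% for one assembly in FASTA format."""
    seqs = [seq for _, seq in read_fasta(text)]
    lengths = [len(s) for s in seqs]
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    acgt = sum(s.count(b) for s in seqs for b in "ACGT")  # ignores N and other codes
    return {
        "contigs": len(seqs),
        "total_length": sum(lengths),
        "n50": n50(lengths),
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }


if __name__ == "__main__":
    demo = ">c1\nACGTACGTAA\n>c2\nGGGCCC\n>c3\nATAT\n"
    print(assembly_stats(demo))
```

For the three toy contigs (10, 6 and 4 bp) the longest contig alone covers half of the 20 bp total, so N50 is 10. A single-contig assembly, such as those reported for NextDenovo and NECAT, has an N50 equal to its total length.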
Preprocessing had a marked effect: filtering improved genome fraction and BUSCO completeness, trimming reduced low-quality artifacts, and read correction benefited overlap-layout-consensus (OLC) assemblers but occasionally increased misassemblies in graph-based tools. Overall, assembler choice and preprocessing jointly determine accuracy, contiguity, and computational efficiency. These results provide a reproducible framework for selecting assembly pipelines in prokaryotic genomics and underscore that no single assembler is universally optimal.
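The filtering and trimming steps described above can be sketched in Python as follows. This is a hypothetical illustration, not the tools or thresholds used in the study, which the abstract does not name. It trims a fixed number of bases from each read end, then discards reads shorter than a minimum length or below a minimum mean Phred quality. The parameter names and defaults (`min_length`, `min_mean_q`, `trim_bases`) are assumptions, and quality strings are assumed to be Phred+33 encoded.

```python
import math


def parse_fastq(text):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:].strip(), seq.strip(), qual.strip()


def mean_phred(qual, offset=33):
    """Mean quality computed in error-probability space (Phred+33 assumed)."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records, min_length=1000, min_mean_q=10.0, trim_bases=0):
    """Trim fixed-length ends, then keep reads passing length and quality cut-offs.

    The defaults are hypothetical placeholders, not values from the study.
    """
    for name, seq, qual in records:
        end = len(seq) - trim_bases
        seq, qual = seq[trim_bases:end], qual[trim_bases:end]  # empty if over-trimmed
        if len(seq) < min_length:
            continue
        if mean_phred(qual) < min_mean_q:
            continue
        yield name, seq, qual


if __name__ == "__main__":
    demo = (
        "@r1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"   # Q40 throughout
        "@r2\nACGTACGTACGT\n+\n++++++++++++\n"   # Q10 throughout
        "@r3\nACGT\n+\nIIII\n"                   # too short once trimmed
    )
    kept = filter_reads(parse_fastq(demo), min_length=8, min_mean_q=20.0, trim_bases=2)
    print([name for name, _, _ in kept])
```

The sketch averages error probabilities rather than Phred scores because the Phred scale is logarithmic. One Q40 base and one Q10 base therefore average to roughly Q13, not Q25, so a few very poor bases are not masked by many good ones.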

Cite this Research Publication : Megha S. Kumar, Manoj Bhat Krishna, K.P. Soman, John Stanley, Nader Pourmand, Prashanth Suravajhala, T.G. Satheesh Babu, Benchmarking long-read assembly tools and preprocessing strategies for bacterial genomes: A case study on E. coli DH5α, Biotechnology Reports, Elsevier BV, 2025, https://doi.org/10.1016/j.btre.2025.e00931
