WarrenCoombeMohamadiEtAl2019

Reference

Warren, R.L., Coombe, L., Mohamadi, H., Zhang, J., Jaquish, B., Isabel, N., Jones, S.J.M., Bousquet, J., Bohlmann, J., Birol, I. (2019) ntEdit: scalable genome sequence polishing. Bioinformatics (Oxford, England), 35(21):4430-4432. (Scopus)

Abstract

MOTIVATION: In the modern genomics era, genome sequence assemblies are routine practice. However, depending on the methodology, resulting drafts may contain considerable base errors. Although utilities exist for genome base polishing, they work best with high read coverage and do not scale well. We developed ntEdit, a Bloom filter-based genome sequence editing utility that scales to large mammalian and conifer genomes. RESULTS: We first tested ntEdit and the state-of-the-art assembly improvement tools GATK, Pilon and Racon on controlled Escherichia coli and Caenorhabditis elegans sequence data. Generally, ntEdit performs well at low sequence depths (<20×), fixing the majority (>97%) of base substitutions and indels, and its performance is largely constant with increased coverage. In all experiments conducted using a single CPU, the ntEdit pipeline executed in <14 s and <3 m, on average, on E.coli and C.elegans, respectively. We performed similar benchmarks on a sub-20× coverage human genome sequence dataset, inspecting accuracy and resource usage in editing chromosomes 1 and 21, and whole genome. ntEdit scaled linearly, executing in 30-40 m on those sequences. We show how ntEdit ran in <2 h 20 m to improve upon long and linked read human genome assemblies of NA12878, using high-coverage (54×) Illumina sequence data from the same individual, fixing frame shifts in coding sequences. We also generated 17-fold coverage spruce sequence data from haploid sequence sources (seed megagametophyte), and used it to edit our pseudo haploid assemblies of the 20 Gb interior and white spruce genomes in <4 and <5 h, respectively, making roughly 50M edits at a (substitution+indel) rate of 0.0024. AVAILABILITY AND IMPLEMENTATION: https://github.com/bcgsc/ntedit. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. © The Author(s) 2019. Published by Oxford University Press.

BibTeX format

You can copy the BibTeX entry for this reference below, or import it directly into software such as JabRef.

@ARTICLE { WarrenCoombeMohamadiEtAl2019,
    AUTHOR = { Warren, R.L. and Coombe, L. and Mohamadi, H. and Zhang, J. and Jaquish, B. and Isabel, N. and Jones, S.J.M. and Bousquet, J. and Bohlmann, J. and Birol, I. },
    TITLE = { ntEdit: scalable genome sequence polishing },
    JOURNAL = { Bioinformatics (Oxford, England) },
    YEAR = { 2019 },
    VOLUME = { 35 },
    NUMBER = { 21 },
    PAGES = { 4430--4432 },
    NOTE = { cited By 0 },
    ABSTRACT = { MOTIVATION: In the modern genomics era, genome sequence assemblies are routine practice. However, depending on the methodology, resulting drafts may contain considerable base errors. Although utilities exist for genome base polishing, they work best with high read coverage and do not scale well. We developed ntEdit, a Bloom filter-based genome sequence editing utility that scales to large mammalian and conifer genomes. RESULTS: We first tested ntEdit and the state-of-the-art assembly improvement tools GATK, Pilon and Racon on controlled Escherichia coli and Caenorhabditis elegans sequence data. Generally, ntEdit performs well at low sequence depths (<20×), fixing the majority (>97%) of base substitutions and indels, and its performance is largely constant with increased coverage. In all experiments conducted using a single CPU, the ntEdit pipeline executed in <14 s and <3 m, on average, on E.coli and C.elegans, respectively. We performed similar benchmarks on a sub-20× coverage human genome sequence dataset, inspecting accuracy and resource usage in editing chromosomes 1 and 21, and whole genome. ntEdit scaled linearly, executing in 30-40 m on those sequences. We show how ntEdit ran in <2 h 20 m to improve upon long and linked read human genome assemblies of NA12878, using high-coverage (54×) Illumina sequence data from the same individual, fixing frame shifts in coding sequences. We also generated 17-fold coverage spruce sequence data from haploid sequence sources (seed megagametophyte), and used it to edit our pseudo haploid assemblies of the 20 Gb interior and white spruce genomes in <4 and <5 h, respectively, making roughly 50M edits at a (substitution+indel) rate of 0.0024. AVAILABILITY AND IMPLEMENTATION: https://github.com/bcgsc/ntedit. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. © The Author(s) 2019. Published by Oxford University Press. },
    AFFILIATION = { Genome Sciences Centre, Vancouver, Canada; BC Ministry of Forests, Lands, Natural Resource Operations, VIC, Canada; Laurentian Forestry Centre, Natural Resources Canada, Québec, Canada; Canada Research Chair in Forest Genomics, Université Laval, Québec, Canada; Michael Smith Laboratories, University of British Columbia, Vancouver, Canada },
    DOCUMENT_TYPE = { Article },
    DOI = { 10.1093/bioinformatics/btz400 },
    SOURCE = { Scopus },
    URL = { https://www.scopus.com/inward/record.uri?eid=2-s2.0-85074306724&doi=10.1093%2fbioinformatics%2fbtz400&partnerID=40&md5=61976839122ab36abd940a3aa615ce0c },
}
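
As a minimal sketch of how the entry above could be used once copied, the following LaTeX document cites it by its key. It assumes the entry has been saved to a file named references.bib (a hypothetical name chosen for illustration) and uses the standard plain bibliography style, which is only one possible choice. Fields that this style does not recognise, such as ABSTRACT, AFFILIATION, DOI and URL, should simply be ignored when the bibliography is typeset.

```latex
\documentclass{article}

\begin{document}

% Cite the entry by the key given in the @ARTICLE line above.
ntEdit is a Bloom filter-based genome sequence editing
utility~\cite{WarrenCoombeMohamadiEtAl2019}.

% "plain" is a standard BibTeX style; any other style could be substituted.
\bibliographystyle{plain}

% Assumption: the entry was saved as references.bib next to this file.
\bibliography{references}

\end{document}
```

Compiling with latex, then bibtex, then latex twice more resolves the citation and typesets the reference list.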
