Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data

Bibliographic Details
Main Authors: Jiao, Wen-Biao (Author), Kiefer, Christiane (Author), Koch, Marcus (Author)
Format: Article (Journal)
Language:English
Published: February 3, 2017
In: Genome research
Year: 2017, Volume: 27, Pages: 778-786
ISSN:1549-5469
DOI:10.1101/gr.213652.116
Online Access:Publisher, free of charge, full text: http://dx.doi.org/10.1101/gr.213652.116
Publisher, free of charge, full text: http://genome.cshlp.org/content/early/2017/02/03/gr.213652.116
Author Notes:Wen-Biao Jiao, Gonzalo Garcia Accinelli, Benjamin Hartwig, Christiane Kiefer, David Baker, Edouard Severing, Eva-Maria Willing, Mathieu Piednoel, Stefan Woetzel, Eva Madrid-Herrero, Bruno Huettel, Ulrike Hümann, Richard Reinhard, Marcus A. Koch, Daniel Swan, Bernardo Clavijo, George Coupland, and Korbinian Schneeberger
Description
Summary:Long-read sequencing can overcome the weaknesses of short reads in the assembly of eukaryotic genomes; however, at present additional scaffolding is needed to achieve chromosome-level assemblies. We generated PacBio long-read data of the genomes of three relatives of the model plant Arabidopsis thaliana and assembled all three genomes into only a few hundred contigs. To improve the contiguities of these assemblies, we generated BioNano Genomics optical mapping and Dovetail Genomics chromosome conformation capture data for genome scaffolding. Despite their technical differences, optical mapping and chromosome conformation capture performed similarly and doubled N50 values. After improving both integration methods, assembly contiguity reached chromosome-arm-levels. We rigorously assessed the quality of contigs and scaffolds using Illumina mate-pair libraries and genetic map information. This showed that PacBio assemblies have high sequence accuracy but can contain several misassemblies, which join unlinked regions of the genome. Most, but not all, of these mis-joints were removed during the integration of the optical mapping and chromosome conformation capture data. Even though none of the centromeres was fully assembled, the scaffolds revealed large parts of some centromeric regions, even including some of the heterochromatic regions, which are not present in gold standard reference sequences.
Item Description:Viewed on 27.07.2017
Physical Description:Online Resource