Next generation sequencing analysis of the highly repetitive barley genome

Steuernagel, Burkhard

The genome of barley has a size of more than 5,000 megabase pairs, and its content of repetitive DNA exceeds 80%. This extreme size and complexity make sequencing the genome a challenging task. A method for sequencing barcoded pools of BAC clones on the 454 platform is introduced, together with a pipeline for high-throughput assembly of the resulting data. Furthermore, the benefit of barcoding and separately assembling individual BACs, in contrast to pooled assemblies, is assessed. It is shown that omitting the barcoding step leads to mis-assemblies that join parts of different BACs. A case study for scaffolding was performed using Illumina mate pair libraries of pooled, untagged BAC clones. Even at a low coverage of mate pairs spanning a distance of 2,800 base pairs, with a read length of 36, the data could be successfully applied to scaffold contigs derived from the 454 assembly pipeline. Whole genome and whole chromosome shotgun approaches have been successful in analysing the genic regions of barley. Using flow sorting, the DNA of barley chromosome 1H was isolated and subsequently shotgun sequenced on the 454 platform to a coverage of 1.3-fold. Known regions of synteny between rice and barley were confirmed. By syntenic integration of the sequenced genomes of rice and sorghum with the data of barley chromosome 1H, a virtual gene order was derived for all barley genes on 1H that lie in a syntenic context. An assembly of deep whole genome shotgun sequencing data of barley resulted in an assembly size of less than half the actual genome size, implying the collapse of repetitive regions. Nevertheless, comparisons to full-length cDNAs show that most genes are represented and correctly reconstructed by the assembly. These results substantiate the hybrid approach: whole genome and whole chromosome shotgun approaches are capable of analysing genic regions, but only a hierarchical shotgun approach is serviceable on the way towards a reference genome sequence.

