Sequencing the genome of Qfly

TryNeo

Males of Q-fly (left) and the closely related Lesser Q-fly. The genomes of both were analysed in our genome paper.

We have just published, in BMC Genomics, the paper describing the first de novo genome assembly of a tephritid fruit fly [full text]:

Gilchrist AS, Shearman DC, Frommer M, Raphael KA, Deshpande NP, Wilkins MR, Sherwin WB, Sved JA (2014) The draft genome of the pest tephritid fruit fly Bactrocera tryoni: resources for the genomic analysis of hybridising species. BMC Genomics 15:1153. doi:10.1186/1471-2164-15-1153

The genome sequence of Q-fly has been submitted to DDBJ/EMBL/GenBank (accession number JHQJ00000000, currently version JHQJ01000000). It turns out that Q-fly has a reasonably large genome for a fruit fly (~700 million base pairs, about a quarter the size of the human genome). We put a lot of effort into producing a reliable set of “repeated sequences”. These few hundred short sequences are each repeated many times, and together they make up about one third of the entire genome.
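As a rough check on these proportions, the short Python sketch below recomputes them from the figures quoted in the paper's summary (a 701 Mb assembly, about 150 Mb of interspersed repeats and 60 Mb of satellite DNA). The human genome size of roughly 3,100 Mb is an assumption added here for illustration and does not come from the paper; the variable and function names are likewise illustrative.

```python
# Figures quoted in the paper's summary, in megabases (Mb).
GENOME_MB = 701
INTERSPERSED_REPEATS_MB = 150
SATELLITE_MB = 60

# Assumption: approximate human genome size, not taken from the paper.
HUMAN_GENOME_MB = 3100


def fraction(part_mb, whole_mb):
    """Return part/whole as a plain fraction."""
    return part_mb / whole_mb


# Share of the assembly made up of repetitive DNA (interspersed + satellite).
repeat_fraction = fraction(INTERSPERSED_REPEATS_MB + SATELLITE_MB, GENOME_MB)

# Size of the Q-fly genome relative to the assumed human genome size.
human_ratio = fraction(GENOME_MB, HUMAN_GENOME_MB)

print(f"Repetitive DNA: {repeat_fraction:.0%} of the Q-fly genome")
print(f"Q-fly genome: {human_ratio:.0%} of the human genome")
```

With these inputs the repetitive fraction comes out at about 30% and the size ratio at about 23%, consistent with "about one third" and "about a quarter" as round figures.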

The summary of the paper is set out below:
Background
The tephritid fruit flies include a number of economically important pests of horticulture, with a large accumulated body of research on their biology and control. Amongst the Tephritidae, the genus Bactrocera, containing over 400 species, presents various species groups of potential utility for genetic studies of speciation, behaviour or pest control. In Australia, there exists a triad of closely-related, sympatric Bactrocera species which do not mate in the wild but which, despite distinct morphologies and behaviours, can be force-mated in the laboratory to produce fertile hybrid offspring. To exploit the opportunities offered by genomics, such as the efficient identification of genetic loci central to pest behaviour and to the earliest stages of speciation, investigators require genomic resources for future investigations.
Results
We produced a draft de novo genome assembly of Australia’s major tephritid pest species, Bactrocera tryoni. The 701 Mb male genome includes approximately 150 Mb of interspersed repetitive DNA sequences and 60 Mb of satellite DNA. Assessment using conserved core eukaryotic sequences indicated 98% completeness. Over 16,000 MAKER-derived gene models showed a large degree of overlap with other dipteran reference genomes. The sequence of the ribosomal RNA transcribed unit was also determined. Unscaffolded assemblies of B. neohumeralis and B. jarvisi were then produced; comparison with B. tryoni showed that the species are more closely related than any Drosophila species pair. The similarity of the genomes was exploited to identify 4924 potentially diagnostic indels between the species, all of which occur in non-coding regions.
Conclusions
This first draft B. tryoni genome resembles other dipteran genomes in terms of size and putative coding sequences. For all three species included in this study, we have identified a comprehensive set of non-redundant repetitive sequences, including the ribosomal RNA unit, and have quantified the major satellite DNA families. These genetic resources will facilitate the further investigations of genetic mechanisms responsible for the behavioural and morphological differences between these three species and other tephritids. We have also shown how whole genome sequence data can be used to generate simple diagnostic tests between very closely-related species where only one of the species is scaffolded.

A. Stuart Gilchrist, Deborah C. A. Shearman, Marianne Frommer, Kathryn A. Raphael, Nandan P. Deshpande [2], Marc R. Wilkins [2,3,4], William B. Sherwin [1] and John A. Sved
[1] Evolution and Ecology Research Centre, School of Biological, Earth and Environmental Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
[2] Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
[3] School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
[4] Ramaciotti Centre for Gene Function Analysis, The University of New South Wales, Sydney, NSW 2052, Australia