10 May 2017 Helsinki - Big Data is a buzzword often used in connection with customer behaviour analysis or social network modelling. However, Big Data is also being created in biology, where modern high-throughput DNA sequencing can easily produce large volumes of data. The exponential decrease in the cost of sequencing during this decade has made it possible to analyze whole populations of organisms instead of a small set or a single selected individual. Population-level data opens up completely new ways of studying organisms. For example, the history of a species can be inferred from patterns imprinted in its genomes by long-term evolution and more recent natural selection. Such analyses have previously been carried out for humans, for whom plenty of data already exists, but the low cost of DNA sequencing now makes it possible to study altogether different species, such as ones of importance to forestry and Finnish culture.
For this study, researchers collected birch samples from twelve sites spanning a range from Ireland to the heart of Siberia and from Loppi in southern Finland up to Kittilä in northern Finland.
"The project produced a total of more than 700 gigabases of genome sequence, resulting in more than 20 terabytes of data from various analyses", reported Research Director Petri Auvinen from the DNA sequencing and genomics lab.
Computational analyses revealed population bottlenecks, periods with extremely low numbers of individuals, at times of known climatic upheaval. The first bottleneck occurred around 66 million years ago (Mya), at the time when the dinosaurs became extinct, followed by bottlenecks at 34 Mya, 14.5 Mya, and 1 Mya.
"This may be connected with speciation events in birches, since fossil evidence shows that the alders and birches had already split around 60 Mya, and the white-barked birches had appeared by around 10 Mya", stated Professor Victor Albert.
Since the last bottleneck, the birch population has been steadily increasing. The last ice age split the birches into two populations, a European and a Siberian one, which have been mixing in Finland since the continental ice sheet melted.
In addition to silver birch, the genomes of six other birch species were sequenced, as well as those of two closely related alders, grey alder and black alder. Distinguishing diploid silver birches from tetraploid downy birches, which have doubled their genomes compared to diploids, proved more difficult than expected, since some of the sampled silver birches themselves turned out to have four sets of chromosomes.
"This illustrates that there has been, and most likely still is, some gene flow between the two species", stated researcher Jarkko Salojärvi from the Department of Biosciences.
In addition to population genomic analyses, the project assembled a reference genome for silver birch and predicted its genes.
"This hybrid assembly combined data from four different next-generation sequencing platforms", stated researcher Olli-Pekka Smolander.
Population genomic analyses identified 900 genes under natural selection, which have shaped birch into its current state as a cold-tolerant, fast-growing pioneer species. Genes under selection play key roles in the development of birch phenotypes, which is why breeding could focus on these genes when developing new birch lines for biotechnology purposes.
"Once the candidate genes have been identified, further breeding is rather rapid, since birch is the only tree species that, under special growth conditions, can be made to flower in less than one year. This makes it possible to grow one breeding population in one year", stated Professor Jaakko Kangasjärvi.
"A unique trait in a single birch line can result from a mutation in a single gene. For example, the weeping birch cultivar known from gardens, Betula pendula 'Youngii', has a truncated LAZY gene", stated Professor Yrjö Helariutta.
A mutation in this gene is also known to produce a relaxed phenotype in maize and thale cress.
The research was carried out by D.Sc. Jarkko Salojärvi and Professor Jaakko Kangasjärvi, Dept. of Biosciences, University of Helsinki, Finland; D.Sc. Olli-Pekka Smolander and Petri Auvinen, Institute of Biotechnology, University of Helsinki, Finland; Professor Yrjö Helariutta, Dept. of Biosciences and Institute of Biotechnology, University of Helsinki, and Sainsbury Laboratory, Cambridge, UK; and Professor Victor Albert, University at Buffalo, USA. The gene models were curated by researchers from the University of Helsinki, the University of Turku, the University of Eastern Finland, the Estonian University of Life Sciences, Umeå University, and the Natural Resources Institute Finland.