13 August 2013

Y chromosome analysis moves Adam closer to Eve. Interview with Riccardo Berutti

“Mitochondrial DNA and Human Evolution” was the title of a scientific article that appeared in the January 1, 1987 issue of Nature, authored by Rebecca Cann, Mark Stoneking, and Allan C. Wilson. These three scientists announced that all modern human beings can trace their ancestry back to a single woman (the so-called "Mitochondrial Eve"), who lived about 190,000 to 200,000 years ago in Africa.
Scientists estimated that "Y-Chromosomal Adam", the most recent common ancestor of men, lived much more recently, between 50,000 and 115,000 years ago.
The disparity between the ages of our most recent common ancestors (MRCA) might have resulted from limitations in gene sequencing: up until about five years ago, researchers could sequence only a few regions of the genome.
According to two papers published in Science on 2 August 2013, there is a good chance that Adam and Eve lived at about the same time, evolutionarily speaking. Using the complete strand of DNA that determines male sex, researchers have determined that Y-Chromosomal Adam lived 120,000 to 156,000 years ago, overlapping with Mitochondrial Eve, who lived 99,000 to 148,000 years ago.

Rebecca L. Cann (University of Hawaii at Manoa, Department of Cell and Molecular Biology) in her perspective for Science magazine (Y Weigh In Again on Modern Humans) wrote: «So now it seems that a population giving rise to the strictly maternal and strictly paternal portions of our genomes could have produced individuals who found each other in the same space and time».

Analyzing the Y chromosome sequences of 69 men from 9 populations, scientists from Stanford University and their colleagues have found that Y-Chromosomal Adam did not live much more recently than Mitochondrial Eve [Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females].

Likewise, by studying the genomes of 1,204 Sardinian men with a focus on the Y chromosome, a team based in Italy (University of Sassari, CNR, CRS4) calculated an age for the most recent common male ancestor that is consistent with previous estimates of the female ancestor based on mitochondrial DNA [Low-Pass DNA Sequencing of 1200 Sardinians Reconstructs European Y-Chromosome Phylogeny].

Riccardo Berutti, one of the authors of the second article, explains in this interview the importance of the Y chromosome analysis that moves "Adam" closer to "Eve".

What surprised you most about this study?
«In recent years, alongside the other studies we carried out within the CRS4-CNR collaboration, we did a lot of work on the data for this paper, and personally I learned many things that I found surprising along the way. There are three points worth mentioning. The first is about population dynamics in Sardinia. You know that our study gives a snapshot of the general population. Although it may appear an obvious result given the dynamics of an isolated population like the Sardinians, I still found it surprising that the whole paternal lineage of an important part of the population in haplogroup I, the one showing private Sardinian variability, never moved away from Sardinia after it arrived in the Neolithic, and the same happened with other lineages. We arrived, we loved this island and we never left. Fantastic; maybe we are still like that. Second, I would point to the really surprising precision we achieved on the individual samples, which far exceeded the raw-data precision we normally have with sequencing data. We reached such accuracy by applying and tuning appropriate filters both on the raw data and on the resulting genotypes, and by using the extremely powerful phylogenetic criteria, which settled the matter, leading to almost perfect results compared with other, intrinsically more accurate genotyping methods. To conclude, the third thing I really enjoyed is the extreme versatility of genetic data. All the Sardinian DNA we used in our study came from two studies targeting topics quite different from anthropology and phylogenetics, just like the data we used as an outgroup (non-Sardinians), which came mainly from the public archive of the 1000 Genomes Project. This is to say that such data represent a true gold mine for future studies and, seen from this point of view, the huge costs of producing these data instantly turn into a brilliant investment in the future of research».

What has surprised you most about this study?
«At least half of the samples were sequenced at CRS4, the other half at the "Sequencing Core" facility of the University of Michigan in Ann Arbor. With my colleagues in the AGCT (Advanced Genomics Computing Technology) group, we took the raw sequencing data and processed them to obtain aligned data. The alignment process consists of rebuilding the complete genome of each individual sample from short sequences of 100 DNA bases each, which must be identified and matched to their place in the reference genome, which consists of around 3 billion DNA bases. The same results were collected from Ann Arbor. After the alignment step, the data have a huge error rate, in the range of 1-10%, depending on the version of the chemistry used, on the sequencing machine, and on several other factors. We used sequences generated between 2008 and 2013 with huge differences in accuracy; while the latest have lower error rates (even less than 1%), they are just a small part of the whole cohort. All this means one wrongly read DNA base in every 100, or even one in every 10. And for every error you can spot a new mutation that doesn't actually exist, or conversely you can miss a real mutation. All that is quite normal with Illumina data, and it is fundamental, at this stage, to create and apply appropriate filtering, depending on the dataset and on what you want to get from it».
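
The error rates quoted above can be turned into rough numbers. The following is a minimal sketch, assuming independent errors and a simple strict-majority vote among the reads covering one position; this is an illustrative model, not the pipeline used in the study, and the function names are hypothetical.

```python
from math import comb


def expected_miscalled_bases(read_length: int, error_rate: float) -> float:
    """Expected number of wrongly read bases in a single read."""
    return read_length * error_rate


def majority_vote_error(depth: int, error_rate: float) -> float:
    """Probability that a strict majority of `depth` reads is wrong at one site.

    Assumption (pessimistic): errors are independent and all wrong reads
    agree on the same wrong base, so they can outvote the true base.
    """
    threshold = depth // 2 + 1  # smallest number of wrong reads forming a strict majority
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


if __name__ == "__main__":
    for rate in (0.01, 0.10):
        print(f"error rate {rate:.0%}:",
              f"{expected_miscalled_bases(100, rate):.1f} wrong bases per 100-base read,",
              f"P(wrong call at depth 3) = {majority_vote_error(3, rate):.4f}")
```

With a single read covering a site (depth 1), the model gives back the raw per-base error rate, which is one way to see why low-pass data lean so heavily on downstream filtering.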

What was your role in this research?
«My personal role, which followed the raw data alignment, was first to quality-check every single sample, then to figure out which filters best suit the Y chromosome and apply them. Such filters are usually different from the ones commonly applied to the other chromosomes, since the Y is carried in only one copy per male sample, which means that you have less statistical support than for the other chromosomes and are therefore more prone to errors. On the other hand, you will never have heterozygous genotypes (i.e. when you read at a certain position A on the first copy of chromosome 1 and C on the second copy, you have an A/C heterozygous base call), and this is a good basis for filtering. New software tools were necessary and I took care of developing them, both for quality control and for filtering on the Y chromosome. The downstream workflow was approximately as follows: run a quality control test, apply quality thresholds to the single samples, create statistical filtering on the whole panel. And it's a somewhat iterative process: you design your filters, you look at the distribution of your results and then you tune the filter chain. All this was done working continuously side by side with Paolo Francalacci, the first author, who worked on the phylogenetic method and whose feedback was important for improving the filtering. It's worth saying that after all this work the precision of the final data supplied to Paolo improved to a maximum error rate of 0.1%. After this processing, Paolo applied the phylogenetic criteria to build the trees and to improve the dataset even more; as far as we could measure, the results are nearly error-free. Another part of my contribution was the analysis that allowed us to infer the ancestral state of several alleles, which means comparing human mutations with chimpanzee ones on the Y chromosome.
This was particularly useful when establishing which mutations carried by haplogroup A, which is the African and most ancient one, are ancestral or A-specific. In fact, the reference genome has been built from modern humans of more recent haplogroups than A, so some of the reference genome is mutated with respect to the "original" Homo sapiens. If you do not take that into account, you get inconsistent results».
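
Two ideas from this answer can be sketched in a few lines: dropping sites that look heterozygous on a haploid chromosome, and using the chimpanzee base as an outgroup to decide whether an allele is ancestral. This is only an illustration under stated assumptions; the function names and thresholds (`min_depth`, `min_fraction`) are hypothetical and are not the tools or settings developed for the study.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def call_haploid_site(bases, min_depth=2, min_fraction=0.9):
    """Return the consensus base at one Y-chromosome site, or None if filtered out.

    `bases` holds the read bases observed at the site, e.g. "AAAA".
    The Y is present in a single copy, so a site where reads disagree
    substantially cannot be a true heterozygote and is dropped as unreliable.
    Thresholds are illustrative assumptions.
    """
    observed = [b.upper() for b in bases if b.upper() in VALID_BASES]
    if len(observed) < min_depth:
        return None  # too few reads to trust the call
    base, count = Counter(observed).most_common(1)[0]
    if count / len(observed) < min_fraction:
        return None  # apparent heterozygosity on a haploid chromosome
    return base


def polarize(reference_base, sample_base, chimp_base):
    """Classify the sample allele as 'ancestral' or 'derived' using chimp as outgroup.

    Returns None when the chimp base matches neither human allele,
    in which case the ancestral state is left undetermined.
    """
    if chimp_base not in (reference_base, sample_base):
        return None
    return "ancestral" if sample_base == chimp_base else "derived"


if __name__ == "__main__":
    print(call_haploid_site("AAAA"))   # consistent reads
    print(call_haploid_site("AACC"))   # conflicting reads
    print(polarize("T", "C", "C"))     # sample differs from reference but matches chimp
```

The last call shows the situation described above: a sample that differs from the reference genome but agrees with the chimpanzee carries the ancestral allele, and it is the reference that is mutated.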

Next step?
«We still have to go deep into the analysis of the mitochondrial DNA of the complementary part of the source datasets we used: the females. The so-called mtDNA is a small fragment of genetic code, just 16.6 thousand bases long, really tiny compared to the 8 million bases of the Y chromosome (out of a length of 50 million) that we used for our study. This fragment of DNA is the 'source code' for the mitochondria, which are a sort of power plant for the cells. It's a good counterpart to the Y chromosome since it's maternally inherited, so it can tell us the other side of the story. We can reasonably expect that, while we can improve the previous estimates of the timing, for the demographic history it's not a given that we'll obtain the same tree as we did for males; we could discover a partially different path that, once compared to the Y chromosome history, may improve our ability to do population genetics using the other chromosomes. Other steps will include sequencing ancient bones from various sites in Sardinia; such information might also help us better understand the specific Sardinian variability leading to our peculiar phenotype and to the higher rate in Europe of some autoimmune diseases. More plans are being made to continue this collaboration with Francalacci and Cucca. They are two really visionary men and have fantastic ideas about the future of this research. In the meantime, sequencing and computing technologies are continuously being enhanced and a lot of new things will become possible. There's a really exciting path ahead to dig into the secrets of life».

Riccardo Berutti, 29, is a nuclear physicist who moved his interests to genetics. He is currently pursuing a PhD in Genetics at the University of Sassari and was a member of the Advanced Genomics Computing Technology group at CRS4, working side by side with the specialists of the sequencing platform. Riccardo worked for a long time as a web developer, graduated in 2007 with a thesis on the Muon Detector of the LHCb experiment at CERN, and later spent a year at Akhela as an embedded software consultant on automotive applications for major Italian automakers.

Andrea Mameli www.linguaggiomacchina.it 13 August 2013
