Showing posts with label Mutation Rates. Show all posts

Tuesday, March 17, 2015

New NGS study of the Y DNA

A new Y-DNA study using Next Generation Sequencing has appeared, in which ~9 Mb of the Y chromosome was sequenced for 456 samples (299 of which were new). Some preliminary observations are outlined below:

(1) Mutation Rate:

This is the second published study to calibrate the substitution mutation rate for the YDNA based on fossil evidence. To do this, the authors combined the mutation rates derived from 2 separate fossils: the 12.6 KY old Anzick fossil from Montana, belonging to haplogroup Q1b, and the 4 KY old Saqqaq fossil from Greenland, belonging to haplogroup Q2b. The first study, Fu (2014), used the 45 KY old Ust-Ishim fossil from Siberia, belonging to haplogroup K(xLT). Interestingly, despite the big difference in the ages of these fossils (~36 KY, taking the average age of the two younger fossils), the derived mutation rates were quite close to each other, with the current study's central estimate only ~8% slower than the rate derived from the Ust-Ishim fossil. The 95% CI bounds for this study were, however, wider than those of Fu (2014). I have already incorporated these new rates into the TMRCA calculator under Karmin (2015).

(2) Coalescence of Non-African YDNA chromosomes:

The authors report:
"... a cluster of major non-African founder haplogroups in a narrow time interval at 47–52 kya, consistent with a rapid initial colonization model of Eurasia and Oceania after the out-of-Africa bottleneck"
This aligns almost perfectly with the recent find in Manot, Israel, of the 49.2 - 60.2 KY old non-African AMH fossil believed to be closely related to the ancestors of all extant non-Africans, i.e. the first OOA migrants.

(3) A "New" E1b1b (E-M215) topology:

The "new" topology of E-M215 they outline below is in fact over 3 years old. Actually, we knew more back then than what they show in this paper today (see here).
E-M215 Karmin (2015)
Compared with what we knew 3 years ago (note: CTS8288 above is equivalent to E-Z830 below):


The unanswered questions with respect to the major topology of E-M215 remain:
  • What is the relationship, if any, of E-V92 with respect to E-Z827, E-Z830 or E-V68?
  • What is the relationship, if any, of E-V6 with respect to E-Z827, E-Z830 or E-V68?

A recent bottleneck of Y chromosome diversity coincides with a global change in culture
 

Abstract

It is commonly thought that human genetic diversity in non-African populations was shaped primarily by an out-of-Africa dispersal 50–100 thousand yr ago (kya). Here, we present a study of 456 geographically diverse high-coverage Y chromosome sequences, including 299 newly reported samples. Applying ancient DNA calibration, we date the Y-chromosomal most recent common ancestor (MRCA) in Africa at 254 (95% CI 192–307) kya and detect a cluster of major non-African founder haplogroups in a narrow time interval at 47–52 kya, consistent with a rapid initial colonization model of Eurasia and Oceania after the out-of-Africa bottleneck. In contrast to demographic reconstructions based on mtDNA, we infer a second strong bottleneck in Y-chromosome lineages dating to the last 10 ky. We hypothesize that this bottleneck is caused by cultural changes affecting variance of reproductive success among males.


Link (Closed Access)

Monday, December 8, 2014

SNP vs. STR YDNA TMRCA Estimation

An interesting comparison of YDNA TMRCA estimates using the SNP counting method and STRs (with both pedigree and Zhivotovsky rates as well as rho and ASD methods) can be found in a recently published study.


The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs covering the majority of known clades

Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51 ×, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analysing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of non-synonymous variants in 15 MSY single-copy genes. 

Link (Open Access)

(iii) The evolutionary STR mutation rate consistently overestimates, and the pedigree rate underestimates, the TMRCAs of nodes (Figure 4a). As expected, the pedigree mutation rate performs better for young nodes (<10 KYA; Table S6), while the evolutionary rate performs better for older nodes.

Of course, "overestimation" and "underestimation" in this case are both relative to the particular mutation rate the authors used for the SNP counting method in the first place, namely the Xue (2009) estimate of 1 × 10^-9/bp/year. A slower mutation rate choice (such as those from Poznik (2013) or Francalacci (2013)) would therefore reduce the apparent "overestimation" by the evolutionary STR mutation rate and, conversely, a faster choice would reduce the apparent "underestimation" by the pedigree mutation rate. It is also important to note that there is quite a bit of variance within the pedigree rates themselves; the authors chose to use a mean pedigree rate from YHRD (see the YTMRCA Calculator to see how pedigree rates from different sources impact TMRCA estimation). All in all, however, this was an interesting exercise, and I hope we get to see more of these types of comparisons, especially with fossil-calibrated mutation rate estimates used for the SNP counting method.
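To make the dependence on the chosen rate concrete, here is a minimal sketch of the SNP counting arithmetic. The 3.7 Mb region and the 1 × 10^-9/bp/year rate come from the text above; the SNP count and the "slower" rate are hypothetical values picked only for illustration, and the function name is my own.

```python
def snp_tmrca_years(mean_snps_to_node, callable_bp, rate_per_bp_per_year):
    """Age of a node implied by the mean number of SNPs between the node and its tips."""
    return mean_snps_to_node / (callable_bp * rate_per_bp_per_year)

CALLABLE_BP = 3.7e6        # MSY region sequenced per sample (from the abstract above)
XUE_2009_RATE = 1.0e-9     # per bp per year, the rate the authors used
SLOWER_RATE = 0.8e-9       # hypothetical slower rate, for illustration only
MEAN_SNPS_TO_NODE = 37     # hypothetical SNP count, not taken from the paper

t_xue = snp_tmrca_years(MEAN_SNPS_TO_NODE, CALLABLE_BP, XUE_2009_RATE)
t_slow = snp_tmrca_years(MEAN_SNPS_TO_NODE, CALLABLE_BP, SLOWER_RATE)

# The SNP-based age scales inversely with the rate, so every node shifts by the
# same factor and the STR-based estimates look less "overestimated" against it.
print(round(t_xue), round(t_slow), round(t_slow / t_xue, 2))
```

Since the SNP-based age is inversely proportional to the rate, a rate that is 20% slower makes every SNP-based node age 25% older, which is the sense in which the "overestimation" is relative.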

Figure 4: Relationship between SNP- and STR-based TMRCA estimates. SNP-based node estimates are plotted against STR-based estimates for (a) 21 STRs, (b) 17 STRs and (c) 13 STRs, here using ASD with the ‘ancestral haplotype’ root specification. The black dashed line in each case indicates x=y. Underlying data and correlation coefficients are given in Tables S6 and S7.

UPDATE:
For further insight into the current understanding of substitution rates used for the SNP counting method, I direct readers to the Wang (2014) article, which enumerates the 4 primary methods that have been used to calculate the substitution rate:
  1. Human - Chimp Comparisons: Thomson (2000), Kuroki (2006)
  2. Deep Rooting Pedigree: Xue (2009)
  3. Autosomal Mutation Rate Adjustment: Mendez (2013)
  4. Founding Migrations Based Inference: Poznik (2013), Francalacci (2013)
In terms of inferences based on the Y Chromosome TMRCA and the Out Of Africa migrations, the authors suggest that Xue (2009) and Poznik (2013) give the most reasonable estimates.

Comparison of different Y chromosomal substitution rates in time estimation using Y chromosome dataset of 1000 Genome dataset. Time estimations are performed in BEAST. (a) TMRCA of 526 Y chromosomes (including haplogroup A1b1b2b-M219 to T). (b) Time of Out-of-Africa migration, the age of macro-haplogroup CT. HCR- Thomson and HCR-Kuroki: Y chromosome base-substitution rate measured from human-chimpanzee comparison by Thomson et al. [6] and Kuroki et al. [7], respectively. Pedigree rate: Y chromosome base-substitution rate measured in a deep-rooting pedigree by Xue et al. [8]. Autosomal Rate Adjusted: Y chromosome substitution rate adjusted from autosomal mutation rates by Mendez et al. [9]. AEFM-America and AEFM-Sardinian: Y chromosome base-substitution rate based on archaeological evidence of founding migrations using initial peopling of Americas [10] and initial Sardinian expansion [11], respectively. Different reported mutation rates are given at the log scale. Confidence intervals for some of the mutation rates are very wide, and time calculations here use only the point estimate. The times would overlap more if all the uncertainties were taken into account. Figure was drawn using boxplot in R 3.0.2.

However, a fifth method, sequencing entire Y chromosomes from verifiable ancient individuals, which is still in its infancy but gaining momentum, should refine the substitution rate to a level of precision that has not been available so far. It remains to be seen whether it will corroborate the rates from the front runners (Xue (2009), Poznik (2013)) or maybe even yield unforeseen results.

Monday, June 9, 2014

Mutation rate of the nuclear genome is getting a fossil calibration

The SMBE 2014 conference is showcasing a presentation where a 45,000 year old genome is being fully sequenced by Fu et al. and where the sequence will be used to calibrate, to my knowledge for the first time, the mutation rate of the nuclear genome.

Previously, Fu et al. (2013) had calibrated the Mitochondrial genome's mutation rate by using some 10 non-African fossils as a reference, with results by and large in agreement with previously established mutation rate estimates.

The rate of accumulation of SNPs on the YDNA will thus be the last remaining thing to get a fossil calibration. Once we get that, temporal analyses using these calibrated mutation rates should gain a much more solid basis.

O-15
The complete genome sequence of a 45,000-year-old modern human from Eurasia
Qiaomei Fu1,2, Bence Viola1,3, Heng Li5,6, Priya Moorjani6, Flora Jay4, Aximu Ayinuer-Petri1, Susan Keates8, Yaroslav V. Kuzmin7, Montgomery Slatkin4, David Reich5,6, Janet Kelso1, Svante Pääbo1
1Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 2Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, Beijing, China, 3Department of Human Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 4Department of Integrative Biology, University of California, Berkeley, USA, 5Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA, 6Department of Genetics, Harvard Medical School, Boston, USA, 7Institute of Geology & Mineralogy, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia, 8University Village, Columbia, USA

We have sequenced to high coverage the genome of a femur recently discovered near Ust-Ishim in Siberia. The bone was directly carbon-dated to 45,000 years before present. Analyses of the relationship of the Ust-Ishim individual to present-day humans show that he is closely related to the ancestral population shared between present-day Europeans and present-day Asians. The over-all amount of genomic admixture from Neandertals is similar to that in present-day non-Africans and there is no evidence for admixture from Denisovans. However, the size of the genomic segments of Neandertal ancestry in the Ust-Ishim individual is substantially larger than in present-day individuals. From the size distribution of these segments we estimated that this individual lived about 200-400 generations after the admixture with Neandertals occurred. The age of this genome allows us to directly assess the mutation rate in the different compartments of the human genome. These results will be presented and discussed.

Link

Tuesday, May 7, 2013

Analyzing YDNA A-M13 lineages in Ethiopian linguistic groups

Similar to the previous analysis of J lineages found in Ethiopia from the Plaster paper, the other prevalent lineage in Ethiopia, A-M13 (formerly known also as A3b2), is also analyzed below. A total of 616 A-M13 lineages were reported in the study, of which ~32% were classified as Semitic speakers, ~40% as Cushitic speakers, ~17% as Omotic speakers and the remainder within the Nilo-Saharan speaking macro-phylum.

Wednesday, May 1, 2013

Analyzing YDNA J lineages in Ethiopian linguistic groups

The extensive YDNA dataset found in the Plaster paper has a total of 691 YDNA lineages that belong to haplogroup J. Although no more detailed SNP resolution is reported for most of these lineages, it is safe to assume, from previous data on Ethiopia, that the vast majority of them belong to J1-M267. A limited set of STR data accompanies these lineages as well, namely for the markers 19, 388, 390, 391, 392 and 393.

According to the report, J lineages are proportionally most frequent among Semitic speakers in Ethiopia (~21%), followed by Omotic speakers at ~12% and Cushitic speakers at ~8%. Out of the 691 YDNA J lineages reported, 259 were Semitic speakers, 266 spoke some type of Omotic language and most of the remainder spoke Cushitic languages.

Thursday, February 21, 2013

The Zhivotovsky Multiplier


It is reported that Zhivotovsky's effective mutation rate [1] has the effect of increasing the TMRCA of a lineage, as computed by the use of Microsatellite Genetic Distances [2], by a factor of 3-4 relative to TMRCAs computed via mutation rates observed in pedigree and family studies [3].

By utilizing my TMRCA calculating program, I want to explore:
  1. What effect do different marker combinations have on this multiplier?
  2. What effect does marker size have on this multiplier?
  3. Is there variation in this multiplier across different data-sets?
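As a rough sketch of what "the multiplier" means in practice, the snippet below compares an ASD-style estimate under a single effective rate with the same estimate under marker-specific pedigree rates. The effective rate of 0.00069 is the Zhivotovsky figure; the ASD values, the pedigree rates and the function names are hypothetical stand-ins, not values from my database or from any publication.

```python
ZHIVOTOVSKY_RATE = 0.00069  # effective rate per marker per generation

def asd_generations(asd_by_marker, rate_by_marker):
    """Mean over markers of (average squared distance / mutation rate)."""
    ratios = [asd_by_marker[m] / rate_by_marker[m] for m in asd_by_marker]
    return sum(ratios) / len(ratios)

def zhivotovsky_multiplier(asd_by_marker, pedigree_rates):
    """Ratio of the effective-rate estimate to the pedigree-rate estimate."""
    effective_rates = {m: ZHIVOTOVSKY_RATE for m in asd_by_marker}
    return (asd_generations(asd_by_marker, effective_rates)
            / asd_generations(asd_by_marker, pedigree_rates))

# Hypothetical per-marker ASD values and pedigree rates, for illustration only.
asd = {"DYS19": 0.30, "DYS390": 0.45, "DYS391": 0.20}
pedigree = {"DYS19": 0.0023, "DYS390": 0.0031, "DYS391": 0.0026}

multiplier = zhivotovsky_multiplier(asd, pedigree)
print(round(multiplier, 2))
```

If every marker had the same pedigree rate r, the multiplier would simply be r / 0.00069. With marker-specific rates it also depends on which markers carry the most variance, which is why the marker combination and the data-set can be expected to move it.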

First, to ensure that my program correctly calculates the TMRCA when the Zhivotovsky mutation rate of 0.00069 is applied to all the markers in my database consistently (versus only the marker specific Pedigree mutation rates I have thus far been utilizing), I attempted to replicate the TMRCA computations of the following publication;


Saturday, January 5, 2013

TMRCA calculations from Plaster NRY data : Correcting an Error


Previously, I had computed TMRCAs for the YDNA STR data from the additional material provided along with Dr. Chris Plaster's thesis. However, after a brief communication with the author, I found out that the marker order of the STRs in the Excel file was reported wrongly. The correct order for the markers is as follows:

DYS19 DYS388 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439 DYS448 DYS456 DYS635 Y GATA H4

This changes my TMRCA calculations because I am not computing the coalescent using a generic mutation rate that is equivalent for all the markers, but rather each marker has its own mutation rate attributed to it.
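A small sketch of why the column order matters: with marker-specific rates, relabelling the columns pairs each column's variance with a different rate, whereas a uniform rate is indifferent to the labels. All numbers and names below are hypothetical and serve only to show the effect.

```python
def generations(asd_columns, rates):
    """Mean over columns of (ASD / rate); rates[i] is the rate assumed for column i."""
    return sum(a / r for a, r in zip(asd_columns, rates)) / len(asd_columns)

# Hypothetical ASD per data column, in the order the columns appear in the file.
asd_columns = [0.30, 0.45, 0.20]

# Hypothetical marker-specific rates, first as implied by the wrong header,
# then as implied by the corrected header (same rates, different columns).
rates_wrong_header = [0.0010, 0.0030, 0.0050]
rates_right_header = [0.0050, 0.0010, 0.0030]

before = generations(asd_columns, rates_wrong_header)
after = generations(asd_columns, rates_right_header)

# With one generic rate the header order cannot change the answer.
uniform = [0.0025] * len(asd_columns)
generic = generations(asd_columns, uniform)

print(round(before, 1), round(after, 1), round(generic, 1))
```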

When I rerun my program using the newly corrected order above I get the following:


As can be seen, using the new order of markers generally reduces the number of generations to coalescent for the Plaster data-set. The previous observation of a relatively lower TMRCA for the haplozone data of E-M123 versus that of the E-M34 Plaster data-set largely disappears. 

To check whether the high number of samples (129) in the E-M123 Haplozone data-set was skewing the results, I took 23 random samples (the same number of samples available in the Plaster E-M34 data-set) from the larger E-M123 Haplozone data-set and re-ran the TMRCA calculations on just those samples. I repeated this process 300 times, and only 28% of the runs yielded a mean TMRCA less than that of the E-M34 Plaster data-set. If sample size were skewing the results, I would expect >50% of the runs to have a mean TMRCA less than that of the E-M34 Plaster data-set.
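The resampling check can be sketched roughly as follows. The subsample size (23), the number of runs (300) and the size of the larger data-set (129) come from the text; the haplotypes are synthetic, and `toy_tmrca` is a hypothetical stand-in for my TMRCA program, so the fraction it reports means nothing beyond demonstrating the procedure.

```python
import random
from statistics import median

def toy_tmrca(haplotypes, rate=0.0025):
    """Stand-in TMRCA in generations: ASD from the median haplotype, one assumed rate."""
    per_marker = []
    for column in zip(*haplotypes):
        root = median(column)
        asd = sum((x - root) ** 2 for x in column) / len(column)
        per_marker.append(asd / rate)
    return sum(per_marker) / len(per_marker)

def fraction_of_runs_below(samples, reference_tmrca, tmrca_fn, k=23, runs=300, seed=0):
    """Share of random k-sample subsets whose TMRCA falls below the reference value."""
    rng = random.Random(seed)
    below = 0
    for _ in range(runs):
        subset = rng.sample(samples, k)  # draw k haplotypes without replacement
        if tmrca_fn(subset) < reference_tmrca:
            below += 1
    return below / runs

# Synthetic stand-in for the 129 Haplozone haplotypes (6 markers each).
data_rng = random.Random(1)
haplozone = [tuple(14 + data_rng.choice((-1, 0, 0, 1)) for _ in range(6))
             for _ in range(129)]

fraction = fraction_of_runs_below(haplozone, reference_tmrca=150.0, tmrca_fn=toy_tmrca)
print(fraction)
```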

That said, the E-M34 Plaster data-set still had a relatively higher number of generations to coalescent than the E-M84 Haplozone data-set. E-M84 is a subclade of E-M34, and a large majority of haplotypes that belong to E-M34 also test positive for the E-M84 SNP (at least for the non-African E-M34 haplotypes that we know of).

Other than that, the new, and corrected, ordering of the markers did not have much impact in relative TMRCA terms between the Plaster and Haplozone/FTDNA data for the other lineages I had tested.

Tuesday, June 19, 2012

Finding the TMRCA of Ethiopian YDNA lineages using an ASD method.


I have lately been working on computing TMRCAs using an ASD, or average square difference, method on publicly available Y-STR haplotypes. The premise of the ASD method is quite straightforward and easy to understand. A putative ancestral haplotype is calculated for a given dataset, and at each marker the repeat count of each sample is subtracted from that of the ancestral haplotype (and, as the name of the method implies, the difference is squared). These results are summed and divided by the number of samples and by the marker-specific mutation rate. The process is repeated for every marker in the dataset, and the mean across markers is then multiplied by an assumed generation length in years. The formula below articulates this method:
TMRCA formula (ASD method)
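Written out from the description above, and assuming the per-sample differences are squared (as the name ASD implies), the formula takes roughly this form. The indices i (samples) and j (markers) are added here for clarity and are not part of the symbol list:

```latex
\mathrm{TMRCA} \;=\; \frac{G}{Z} \sum_{j=1}^{Z} \frac{1}{N \, m_j} \sum_{i=1}^{N} \left( L_{ij} - L_{0j} \right)^{2}
```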
 
Where;
N= Total number of Samples
Z= Total number of Markers
L0= Putative Ancestral Haplotype (Median or Modal repeats)
L= Individual sample haplotype repeats
m= Marker Specific Mutation Rate
G= Years / Generation
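A minimal sketch of this calculation, using the symbols defined above. It assumes the median as the ancestral haplotype and squared differences, and it is not my actual program; the haplotypes, the rates and the 30 years per generation are hypothetical values chosen for illustration.

```python
from statistics import median

def asd_tmrca_years(haplotypes, rates, years_per_generation=30):
    """ASD-based TMRCA in years.

    haplotypes: list of N tuples, each holding Z repeat counts (L)
    rates: Z marker-specific mutation rates (m), per generation
    years_per_generation: G, an assumed value
    """
    n = len(haplotypes)
    per_marker = []
    for j, m in enumerate(rates):
        column = [h[j] for h in haplotypes]
        l0 = median(column)                           # putative ancestral repeat count (L0)
        asd = sum((x - l0) ** 2 for x in column) / n  # average squared difference
        per_marker.append(asd / m)                    # generations implied by this marker
    return years_per_generation * sum(per_marker) / len(per_marker)

# Hypothetical two-marker example with four samples.
samples = [(14, 12), (14, 13), (15, 12), (14, 12)]
example = asd_tmrca_years(samples, rates=(0.002, 0.004))
print(round(example, 1))
```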

The biggest variable here, other than the sampling strategy of a given dataset, is the choice among the several sets of marker-specific mutation rates that are available. The selection of a correct mutation rate is an unsettled issue. I have therefore utilized 4 sets of mutation rates that were compiled by Paul Newlin, a collaborator at the E3b Project. These rates come from several different publications, and you can read about them here for more detail:
 
  1. The Chandler Mutation Rates:
  2. Stafford Bayesian Mutation Rates:
    Essentially a compilation of other mutation rates
  3. Burgarella & Navascués Mutation Rates:
  4. Ballantyne Mutation Rates:
In order to compare TMRCAs on a like-for-like basis across the different rate sets, I had to intersect the markers available in the sets above with the markers found in the public domain. This left me with the following 46 markers, which are covered by all 4 of the above sets of rates and also belong to the 66 markers that are widely used:
406s1 , 19 , 388 , 389-1 , 389-2 , 390 , 391 , 392 , 393 , 426 , 436 , 437 , 438 , 439 , 442 , 444 , 446 , 447 , 448 , 450 , 454 , 455 , 456 , 458 , 460 , 472 , 481 , 487 , 490 , 492 , 511 , 520 , 531 , 534 , 537 , 557 , 565 , 568 , 572 , 578 , 590 , 594 , 617 , 640 , 641 and gatah4.

In addition, since the Chandler mutation rates had a complete intersection with the 66 widely used markers, an additional 66 marker Chandler set was independently used that included the following markers in addition to the 46 listed above:
385a , 385b , 459a , 459b , 449 , 464a , 464b , 464c , 464d , ycaiia , ycaiib , 607 , 576 , 570 , cdya , cdyb , 395s1a , 395s1b , 413a and 413b.
  
Haplogroups A, E and J cover well over 90% of the YDNA lineages found in Ethiopia. Within these haplogroups, I was most interested in finding the TMRCA for A-M13, E-M35 and J1-M267, as these lineages account for between 70% and 80% of said lineages, whereas the remaining 20-30% of lineages found in Ethiopia belong to E1b1*(x E1b1b,E1b1a1), other types of E lineages like E2 and E*, and some specific clades that belong to haplogroups B, T and J2.