Showing posts with label Y SNP. Show all posts

Tuesday, March 17, 2015

New NGS study of the Y DNA

A new Y-DNA study has appeared using Next Generation Sequencing, in which ~9 Mb of the Y chromosome was sequenced for 456 samples (299 of which were new). Some preliminary observations are outlined below:

(1) Mutation Rate:

This is the second published study to calibrate the substitution mutation rate for the YDNA based on fossil evidence. To do this, the authors combined mutation rates derived from 2 separate fossils: the 12.6 KY old Anzick fossil from Montana, belonging to haplogroup Q1b, and the 4 KY old Saqqaq fossil from Greenland, belonging to haplogroup Q2b. The first study, Fu (2014), used the 45 KY old Ust-Ishim fossil from Siberia, belonging to haplogroup K(xLT). Interestingly, despite the large difference in age between these fossils (~36 KY on average), the derived mutation rates were quite close to each other, with the current study's central estimate only ~8% slower than the rate derived from the Ust-Ishim fossil. The 95% CI bounds for this study were, however, less tight than those of Fu (2014). I have already incorporated these new rates into the TMRCA calculator under Karmin (2015).

(2) Coalescence of Non-African YDNA chromosomes:

The authors report:
....... a cluster of major non-African founder haplogroups in a narrow time interval at 47–52 kya, consistent with a rapid initial colonization model of Eurasia and Oceania after the out-of-Africa bottleneck
This aligns almost perfectly with the recent find in Manot, Israel, of the 49.2 - 60.2 KY old non-African AMH fossil, believed to be closely related to the ancestors of all extant non-Africans, i.e. the first OOA migrants.

(3) A "New" E1b1b (E-M215) topology:

The "new" topology of E-M215 that they outline below is in fact over 3 years old. Indeed, we knew more back then than what they show in this paper today (see here).
E-M215 Karmin (2015)
Compared with what we knew 3 years ago (note: CTS8288 above is equivalent to E-Z830 below):


The unanswered questions with respect to the major topology of E-M215 remain:
  • What is the relationship, if any, of E-V92 with respect to E-Z827, E-Z830 or E-V68?
  • What is the relationship, if any, of E-V6 with respect to E-Z827, E-Z830 or E-V68?

A recent bottleneck of Y chromosome diversity coincides with a global change in culture
 

Abstract

It is commonly thought that human genetic diversity in non-African populations was shaped primarily by an out-of-Africa dispersal 50–100 thousand yr ago (kya). Here, we present a study of 456 geographically diverse high-coverage Y chromosome sequences, including 299 newly reported samples. Applying ancient DNA calibration, we date the Y-chromosomal most recent common ancestor (MRCA) in Africa at 254 (95% CI 192–307) kya and detect a cluster of major non-African founder haplogroups in a narrow time interval at 47–52 kya, consistent with a rapid initial colonization model of Eurasia and Oceania after the out-of-Africa bottleneck. In contrast to demographic reconstructions based on mtDNA, we infer a second strong bottleneck in Y-chromosome lineages dating to the last 10 ky. We hypothesize that this bottleneck is caused by cultural changes affecting variance of reproductive success among males.


Link (Closed Access)

Tuesday, February 3, 2015

SNP based module added to the Y TMRCA calculator

The Y TMRCA calculator, previously solely STR based, can now also accept SNP based input to compute the TMRCA of a node. Instructions and methodology can be found within the app at the link below:
https://ehelix.pythonanywhere.com/init/default/index

For now, it uses 7 separate mutation rates that all come from different publications, though not all of them were necessarily derived using different methods. I will look to expand these as more substitution mutation rates become available.

Below I have run some quick verifications for 3 separate mutation rate sources:

Poznick (2013) rates via Underhill (2014)

The following is stated in Underhill (2014):
A consensus has not yet been reached on the rate at which Y-chromosome SNPs accumulate within this 9.99Mb sequence. Recent estimates include one SNP per: ~100 years,⁵⁸ 122 years,⁴ 151 years⁵ (deep sequencing reanalysis rate), and 162 years.⁵⁹ Using a rate of one SNP per 122 years, and based on an average branch length of 206 SNPs from the common ancestor of the 13 sequences, we estimate the bifurcation of R1 into R1a and R1b to have occurred ~25,100 ago (95% CI: 21,300–29,000). Using the 8 R1a lineages, with an average length of 48 SNPs accumulated since the common ancestor, we estimate the splintering of R1a-M417 to have occurred rather recently, ~5800 years ago (95% CI: 4800–6800). The slowest mutation rate estimate would inflate these time estimates by one third, and the fastest would deflate them by 17%.
Putting the variables for the R1 node from above into the calculator, we get an output of:

 R1 - Underhill (2014)
For the mutation rate they used, i.e. Poznick (2013), the calculator gives 25.15 KYA, close enough to their estimate of 25.1 KYA.
Similarly, for the R1a-M417 node, we get:

R1a-M417 - Underhill (2014)
Again, looking at the calculator's Poznick TMRCA of 5.86 KYA, we can see it is close enough to their estimate of 5.8 KYA.
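
For readers who want to check the arithmetic by hand, here is a minimal sketch of the SNP counting calculation using only the figures quoted from Underhill (2014) above. It is not the calculator's actual code; the function name is my own, and I am assuming the plain "average SNPs times years per SNP" form of the method.

```python
# Sketch of SNP counting arithmetic (illustrative; not the calculator's code).

def snp_tmrca_years(avg_snps: float, years_per_snp: float) -> float:
    """TMRCA in years = average SNPs accumulated since the node * years per SNP."""
    return avg_snps * years_per_snp

# Figures quoted from Underhill (2014) for the 9.99 Mb region.
YEARS_PER_SNP = 122  # the rate the authors adopted

r1 = snp_tmrca_years(206, YEARS_PER_SNP)       # R1 node, 206 SNPs on average
r1a_m417 = snp_tmrca_years(48, YEARS_PER_SNP)  # R1a-M417 node, 48 SNPs on average

print(f"R1:       {r1 / 1000:.2f} KYA")
print(f"R1a-M417: {r1a_m417 / 1000:.2f} KYA")

# How the other quoted rates would scale these dates relative to 122 years/SNP.
for alt in (100, 151, 162):
    print(f"1 SNP per {alt} years -> dates scale by {alt / YEARS_PER_SNP:.2f}")
```

This gives 25.13 KYA for R1 and 5.86 KYA for R1a-M417. The small gap between 25.13 and the calculator's 25.15 KYA presumably comes from rounding in the quoted 122-year figure, though that is my assumption. The scaling factor for the slowest rate (162 years per SNP) comes out at about 1.33, in line with the "inflate by one third" remark in the quote.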

Monday, December 8, 2014

SNP vs. STR YDNA TMRCA Estimation

An interesting comparison of YDNA TMRCA estimates using the SNP counting method and STRs (with both pedigree and Zhivotovsky rates as well as rho and ASD methods) can be found in a recently published study.


The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs covering the majority of known clades

Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51 ×, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analysing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of non-synonymous variants in 15 MSY single-copy genes. 

Link (Open Access)

(iii) The evolutionary STR mutation rate consistently overestimates, and the pedigree rate underestimates, the TMRCAs of nodes (Figure 4a). As expected, the pedigree mutation rate performs better for young nodes (<10 KYA; Table S6), while the evolutionary rate performs better for older nodes.

Of course, "overestimation" and "underestimation" in this case are both relative to the particular mutation rate the authors used for the SNP counting method in the first place, namely the Xue (2009) estimate of 1 × 10^-9/bp/year. A slower mutation rate choice (from Poznick (2013) or Francalacci (2013), for instance) would therefore reduce the "overestimation" by the evolutionary STR mutation rate, and conversely, a faster mutation rate choice would reduce the "underestimation" by the pedigree mutation rate. It is also important to note that there is quite a bit of variance within the pedigree rates themselves; the authors chose to use a mean pedigree rate from YHRD (see the YTMRCA Calculator for how pedigree rates from different sources impact TMRCA estimation). All in all, however, this was an interesting exercise. I hope we get to see more of these types of comparisons, especially with fossil calibrated mutation rate estimates used for the SNP counting method.
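
The dependence on the chosen SNP rate can be made concrete with a small sketch. It converts a per-bp, per-year rate into "years per SNP" for a sequenced region and shows how a SNP-based date shifts when the rate changes. The 1 × 10^-9 rate and the 3.7 Mb region come from the study discussed here; the "20% slower" rate and the 10,000 year node are made-up values for illustration only, and the function names are my own.

```python
# Illustrative sketch: how the SNP-based date depends on the substitution rate.

def years_per_snp(rate_per_bp_per_year: float, region_bp: float) -> float:
    """Expected waiting time, in years, for one SNP anywhere in the region."""
    return 1.0 / (rate_per_bp_per_year * region_bp)

def rescale_tmrca(tmrca_years: float, old_rate: float, new_rate: float) -> float:
    """SNP-counting dates are inversely proportional to the rate used."""
    return tmrca_years * old_rate / new_rate

XUE_RATE = 1e-9      # per bp per year, the rate the authors used
REGION_BP = 3.7e6    # 3.7 Mb of MSY sequenced in the study

print(f"about {years_per_snp(XUE_RATE, REGION_BP):.0f} years per SNP")

# Hypothetical: a rate 20% slower than Xue (2009), NOT a published value.
slower_rate = 0.8 * XUE_RATE
node_age = 10_000    # hypothetical SNP-based node age in years under XUE_RATE
print(f"{rescale_tmrca(node_age, XUE_RATE, slower_rate):.0f} years under the slower rate")
```

With the slower rate the SNP-based date for the same node gets older, which is why the evolutionary STR estimate would look like less of an overestimate by comparison.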

Figure 4: Relationship between SNP- and STR-based TMRCA estimates. SNP-based node estimates are plotted against STR-based estimates for (a) 21 STRs (b) 17 STRs and (c) 13 STRs, here using ASD with the ‘ancestral haplotype’ root specification. The black dashed line in each case indicates x=y. Underlying data and correlation coefficients are given in Tables S6 and S7.
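
For anyone unfamiliar with the ASD method named in the caption, below is a rough sketch of the textbook form: the average squared difference in repeat counts between each sampled haplotype and an ancestral haplotype, divided by the per-locus mutation rate, gives the age in generations. This is not the authors' pipeline. The haplotypes are toy data, the ancestral haplotype is approximated by the per-locus median, and both rates and the 25 year generation time are placeholder values I picked to contrast a faster "pedigree-like" rate with a slower "evolutionary-like" one.

```python
from statistics import median

def asd_tmrca_years(haplotypes, rate_per_locus_per_gen, years_per_gen=25):
    """ASD-based TMRCA: mean squared repeat difference from the root / rate."""
    n_loci = len(haplotypes[0])
    # Assumption: take the per-locus median as the 'ancestral haplotype'.
    ancestral = [median(h[i] for h in haplotypes) for i in range(n_loci)]
    total = sum((h[i] - ancestral[i]) ** 2
                for h in haplotypes for i in range(n_loci))
    asd = total / (len(haplotypes) * n_loci)
    generations = asd / rate_per_locus_per_gen
    return generations * years_per_gen

# Toy data: 4 haplotypes typed at 3 STR loci (repeat counts are made up).
toy = [
    [14, 12, 23],
    [14, 13, 23],
    [15, 12, 23],
    [14, 12, 24],
]

PEDIGREE_LIKE = 2.0e-3      # placeholder rate per locus per generation
EVOLUTIONARY_LIKE = 6.9e-4  # placeholder, roughly a third of the above

print(f"pedigree-like rate:     {asd_tmrca_years(toy, PEDIGREE_LIKE):.0f} years")
print(f"evolutionary-like rate: {asd_tmrca_years(toy, EVOLUTIONARY_LIKE):.0f} years")
```

The same ASD value yields a much older date under the slower rate, which is the mechanical reason the two STR rates bracket the SNP-based estimates from opposite sides.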

UPDATE:
For further insight into the current understanding of substitution rates used for the SNP counting method, I direct readers to the Wang (2014) article, which enumerates the 4 primary methods that have been used to calculate the substitution rate:
  1. Human - Chimp Comparisons: Thomson (2000), Kuroki (2006)
  2. Deep Rooting Pedigree: Xue (2009)
  3. Autosomal Mutation Rate Adjustment: Mendez (2013)
  4. Founding Migrations Based Inference:  Poznick (2013), Francalacci (2013)  
In terms of inferences based on the Y Chromosome TMRCA and the Out Of Africa migrations, the authors suggest that Xue (2009) and Poznick (2013) give the most reasonable estimates.

Comparison of different Y chromosomal substitution rates in time estimation using Y chromosome dataset of 1000 Genome dataset. Time estimations are performed in BEAST. (a) TMRCA of 526 Y chromosomes (including haplogroup A1b1b2b-M219 to T). (b) Time of Out-of-Africa migration, the age of macro-haplogroup CT. HCR- Thomson and HCR-Kuroki: Y chromosome base-substitution rate measured from human-chimpanzee comparison by Thomson et al. [6] and Kuroki et al. [7], respectively. Pedigree rate: Y chromosome base-substitution rate measured in a deep-rooting pedigree by Xue et al. [8]. Autosomal Rate Adjusted: Y chromosome substitution rate adjusted from autosomal mutation rates by Mendez et al. [9]. AEFM-America and AEFM-Sardinian: Y chromosome base-substitution rate based on archaeological evidence of founding migrations using initial peopling of Americas [10] and initial Sardinian expansion [11], respectively. Different reported mutation rates are given at the log scale. Confidence intervals for some of the mutation rates are very wide, and time calculations here use only the point estimate. The times would overlap more if all the uncertainties were taken into account. Figure was drawn using boxplot in R 3.0.2.

However, a fifth method, sequencing entire Y chromosomes from verifiable ancient individuals, is still in its infancy but gaining momentum, and should refine the substitution rate to a level of precision that has not been available so far. It remains to be seen whether it will corroborate the rates from the front runners (Xue (2009), Poznick (2013)) or maybe even yield unforeseen results.