Showing posts with label YDNA. Show all posts

Wednesday, June 24, 2015

Improved resolution of E-M215 (aka E3b / E1b1b)

A new paper has appeared with a focus on Haplogroup E, mostly on E-M215 and E-M35, offering a moderate improvement in resolution over what we knew before.

At first glance, the major novelty with respect to E-M215 is that all E-Z830 (x M123) lineages are united under a new mutation dubbed V1515, and that the formerly solitary lineages of E-M35, i.e. E-V92 and E-V6, now have a home within this unification. In addition, V1515 itself apparently has a bifurcated structure: a younger branch found only in the southern parts of Ethiopia and further south, and a more diverse (hence more ancient) branch found in the northern parts of Ethiopia and further north.

New basal haplogroup E mutations were also apparently found.

The paper is open access, and I will analyze it further in the coming days, but for now I just wanted to plot the Eastern African E-M215 variant frequencies.

 
UPDATE (6/26/15) - Added North African E-M215 frequencies
UPDATE (6/26/15) - Added new mutation rate
The new fossil-calibrated mutation rate has been added to the TMRCA Calculator. Unfortunately, 95% CI values have not been given (or at least I could not find them). In any event, this new rate is a bit slower than the rates derived from the other ancient-DNA-calibrated sources, specifically ~4% and ~12% slower than Karmin (2015) and Fu (2014) respectively, so its central TMRCA estimates come out correspondingly older.
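
Because SNP-counting TMRCA estimates scale inversely with the mutation rate, the effect of a slower rate can be sketched in a few lines. This is only a rough illustration, assuming that "x% slower" means the new rate equals the reference rate multiplied by (1 - x). The function name is my own and is not part of the TMRCA Calculator.

```python
def tmrca_scale_factor(fraction_slower):
    """Factor by which a TMRCA grows when the rate is `fraction_slower` slower.

    Assumes TMRCA is proportional to 1 / rate, and that "slower by f"
    means new_rate = reference_rate * (1 - f).
    """
    if not 0 <= fraction_slower < 1:
        raise ValueError("fraction_slower must be in [0, 1)")
    return 1.0 / (1.0 - fraction_slower)


# Approximate differences quoted in the post (reference source: fraction slower).
for source, slower in {"Karmin (2015)": 0.04, "Fu (2014)": 0.12}.items():
    factor = tmrca_scale_factor(slower)
    print(f"vs {source}: TMRCA older by about {100 * (factor - 1):.1f}%")
```

Under that assumption, a rate that is a given percentage slower produces a slightly larger percentage increase in the estimated age, since the scaling is 1 / (1 - x) rather than 1 + x.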

UPDATE (6/27/15) - Comparison with YFull TMRCAs
I have created a table of TMRCAs for the major nodes in E-M215 to compare with YFull's estimates, so that we can 'fill in the gaps' for the nodes that have not been given estimates in Trombetta (2015). YFull uses a mutation rate almost identical to Fu (2014)'s Ust-Ishim calibrated rate, so naturally some of its TMRCAs are closer to the present than the Trombetta estimates, as pointed out above.




TMRCA (KYA)   Trombetta   YFull
E-M215        39          35
E-M35         25          24
E-V68         20          20
E-M78         15          13
E-Z827        ?           24
E-V257        ?           14
E-Z830        20          19
E-M34         ?           15
E-V1515       12          ?
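
To put numbers on how far the two sets of estimates diverge, here is a small sketch that computes the percent difference for the nodes where both sources give a value. The figures are copied from the table above; the dictionary layout and function name are just my own illustration.

```python
# TMRCA estimates in thousands of years (KYA), copied from the table above.
# None marks a node for which that source gives no estimate ("?").
TMRCA_KYA = {
    "E-M215": (39, 35),
    "E-M35": (25, 24),
    "E-V68": (20, 20),
    "E-M78": (15, 13),
    "E-Z827": (None, 24),
    "E-V257": (None, 14),
    "E-Z830": (20, 19),
    "E-M34": (None, 15),
    "E-V1515": (12, None),
}


def compare(estimates):
    """Return {node: percent by which Trombetta exceeds YFull} where both exist."""
    out = {}
    for node, (trombetta, yfull) in estimates.items():
        if trombetta is None or yfull is None:
            continue  # skip nodes missing from either source
        out[node] = 100.0 * (trombetta - yfull) / yfull
    return out


for node, pct in compare(TMRCA_KYA).items():
    print(f"{node}: Trombetta is {pct:+.1f}% relative to YFull")
```

Since the table values are rounded to whole thousands of years, the percentages are only indicative.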

Saturday, June 6, 2015

More Ethiopian Uniparental Data (More resolution.. less clarity)

A new paper attempting to decipher the out-of-Africa exit route by focusing on Ethiopian and Egyptian autosomal genetics was published a couple of weeks ago. Putting aside the 'hocus pocus' autosomal analysis for a moment, I was quite intrigued by the more concrete uniparental relative frequency images published in the supplemental material. These images do not offer much clarity, however, as the actual numbers are not given.


Note that the phylogeny they reference for the results here is from Phylotree Y.

Below I have attempted to translate some of the colors from the image into numerical approximations. Note that these are only approximations and not a substitute for the real data, to which I am not privy.


Haplogroup   Amhara   Eth Somali   Gumuz   Oromo   Wolayta
A-M13        27%      0%           55%     19%     48%
B-M150       0%       0%           4%      0%      0%
B-M8495      0%       0%           35%     0%      0%
E-M96        3%       4%           0%      6%      12%
E-M215       3%       0%           0%      0%      0%
E-V22        9%       0%           0%      5%      3%
E-Z1902      8%       80%          4%      20%     0%
E-Z830       0%       0%           0%      0%      3%
E-M34        3%       0%           0%      5%      13%
EM4145       17%      0%           0%      25%     20%
J            25%      11%          0%      19%     0%
T            3%       4%           0%      0%      0%

A-M13 :

The prevalence of this haplogroup in Ethiopia has long been known, but the extremely high frequency in the Wolayta is quite a surprise. This could be due to the relatively small sample size, however, as the much larger Wolayta sample in the Plaster thesis showed only 13% A-M13.

B-M150 and  B-M8495 :

Found only in the Gumuz. We have known for a while that B is not at all prevalent in the wider Ethiopian population; rather, it is a continuation of the much larger B frequencies found in Nilotic Sudan. Still, it is good to see a finer resolution of B, and that the majority of B in Ethiopia belongs to the small B-M8495 branch.

E-M96:

This could potentially be a wide variety of things, but my money would be on E-M329, a sister clade to E-M2 and a child clade of E-V38, which in turn is a sister clade to E-M215, the most prevalent YDNA lineage in Ethiopia.

E-M215

As this shows up only in Northern Ethiopia, I would think it may be E-V92; it could still, however, be a basal "E3b" lineage.

E-V22

A variant of E-M78, this lineage has always been found in low amounts in Ethiopia, with moderate amounts in Sudan and Egypt.

E-Z1902

This lineage is found downstream of E-M78 and unites E-V12 with E-V65, which means the results would include E-V32, a sublineage of E-V12 and the most frequent YDNA lineage in Somalis. I would wager that all of the E-Z1902 is actually E-V32, since E-V65 has never been found in Ethiopia thus far. There is a chance that some E-V12* could be in the mix as well.

E-Z830

This lineage has been discussed before; it unites many lineages in Ethiopia, including E-M34, E-M293 and E-V42. From the image it looks like they did not test for E-V42, however, so this could be E-V42.

E-M34

The prevalence of this lineage in southern Ethiopia in the image above could be further confirmation of the high frequency of E-M34 found in the Omotic-speaking Maale in the Plaster thesis.

EM4145

This is a tricky one; I am not sure what it is. I have searched for SNPs with this name and come back empty-handed. To complicate things further, it is shaded a similar color to E-M293, but I discounted that lineage because the one they report here is found at relatively high frequency in Ethiopia, whereas previous data show that E-M293 is found only at low to moderate frequencies there. My best guess for this SNP would be something equivalent to E-V6 or, failing that, E-P2(x E-M215). I have less confidence in the latter, since in that case I would expect them to have given it a more basal position in the hierarchy of YDNA lineages in the image above.

J and T

Both of these lineages, which belong to F, look to be in line with what we already know of their frequency distribution throughout Ethiopia.

refs:
http://ethiohelix.blogspot.com/2010_12_01_archive.html
http://ethiohelix.blogspot.com/2012/01/e1b1b-update.html 
http://ethiohelix.blogspot.com/2012/11/extensive-doctoral-thesis-on-ethiopian.html
http://ethiohelix.blogspot.com/2013/05/another-extensive-thesis-on-east.html

Update 06/07/2015 - MTDNA

Monday, December 8, 2014

SNP vs. STR YDNA TMRCA Estimation

An interesting comparison of YDNA TMRCA estimates using the SNP counting method and STRs (with both pedigree and Zhivotovsky rates as well as rho and ASD methods) can be found in a recently published study.


The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs covering the majority of known clades

Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51 ×, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analysing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of non-synonymous variants in 15 MSY single-copy genes. 

Link (Open Access)

(iii) The evolutionary STR mutation rate consistently overestimates, and the pedigree rate underestimates, the TMRCAs of nodes (Figure 4a). As expected, the pedigree mutation rate performs better for young nodes (<10 KYA; Table S6), while the evolutionary rate performs better for older nodes.

Of course, "overestimation" and "underestimation" in this case are both relative to the particular mutation rate the authors used for the SNP counting method in the first place, namely the Xue (2009) estimate of 1 x 10^-9/bp/year. A slower mutation rate choice (from Poznik (2013) or Francalacci (2013), for instance) would therefore reduce the "overestimation" by the evolutionary STR mutation rate and, conversely, a faster choice would reduce the "underestimation" by the pedigree mutation rate. It is also important to note that there is quite a bit of variance within the pedigree rates themselves; the authors chose to use a mean pedigree rate from YHRD (see the YTMRCA Calculator to see how pedigree rates from different sources impact TMRCA estimation). All in all, however, this was an interesting exercise, and I hope we get to see more of these types of comparisons, especially with fossil-calibrated mutation rate estimates used for the SNP counting method.
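
To make the dependence on the rate choice concrete, here is a minimal sketch of the SNP counting arithmetic, assuming the simple form TMRCA = (mean SNPs from node to tips) / (rate x sequenced length). The 3.7 Mb length and the 1 x 10^-9 rate come from the text above; the SNP count and the slower rate are hypothetical values picked purely for illustration, not figures from any of the cited papers.

```python
def snp_tmrca_years(mean_snps_to_tips, rate_per_bp_per_year, sequenced_bp):
    """SNP-counting TMRCA in years: mutations / (rate * sequence length)."""
    if rate_per_bp_per_year <= 0 or sequenced_bp <= 0:
        raise ValueError("rate and sequence length must be positive")
    return mean_snps_to_tips / (rate_per_bp_per_year * sequenced_bp)


SEQUENCED_BP = 3.7e6   # 3.7 Mb of MSY per male, per the abstract above
XUE_RATE = 1.0e-9      # per bp per year, the rate the authors used
SLOWER_RATE = 0.8e-9   # HYPOTHETICAL slower rate, for illustration only
MEAN_SNPS = 50         # HYPOTHETICAL mean SNP count from a node to its tips

with_xue = snp_tmrca_years(MEAN_SNPS, XUE_RATE, SEQUENCED_BP)
with_slower = snp_tmrca_years(MEAN_SNPS, SLOWER_RATE, SEQUENCED_BP)
print(f"Xue (2009) rate: {with_xue / 1000:.1f} KYA")
print(f"slower rate:     {with_slower / 1000:.1f} KYA")
```

The slower rate pushes the SNP-based date older, which is why an STR estimate that looked like an "overestimate" against the Xue (2009) rate would look less so against a slower one.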

Figure 4: Relationship between SNP- and STR-based TMRCA estimates. SNP-based node estimates are plotted against STR-based estimates for (a) 21 STRs, (b) 17 STRs and (c) 13 STRs, here using ASD with the 'ancestral haplotype' root specification. The black dashed line in each case indicates x=y. Underlying data and correlation coefficients are given in Tables S6 and S7.
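
For readers unfamiliar with the ASD approach mentioned in the caption, here is a rough sketch of the idea, assuming a simple stepwise mutation model in which the average squared difference from the ancestral haplotype grows as (mutation rate x generations). The haplotypes, the ancestral haplotype and both rates below are illustrative values of my own choosing, not data or parameters from the study.

```python
def asd_tmrca_years(haplotypes, ancestral, rate_per_locus_per_gen, years_per_gen=25):
    """Estimate TMRCA from the average squared difference (ASD) to an ancestral haplotype.

    haplotypes: list of equal-length lists of STR repeat counts.
    ancestral:  assumed ancestral (root) haplotype, same length.
    Assumes ASD = rate * generations (simple stepwise mutation model).
    """
    n_loci = len(ancestral)
    if not haplotypes or any(len(h) != n_loci for h in haplotypes):
        raise ValueError("need at least one haplotype, all matching the ancestral length")
    total = sum(
        (allele - anc) ** 2
        for hap in haplotypes
        for allele, anc in zip(hap, ancestral)
    )
    asd = total / (len(haplotypes) * n_loci)
    generations = asd / rate_per_locus_per_gen
    return generations * years_per_gen


# HYPOTHETICAL three-locus haplotypes and ancestral haplotype.
sample = [[13, 24, 10], [14, 24, 10], [13, 25, 11], [13, 24, 10]]
root = [13, 24, 10]

# Illustrative rates per locus per generation (assumed values, not from the paper).
EVOLUTIONARY_RATE = 6.9e-4
PEDIGREE_RATE = 2.5e-3

print(f"evolutionary rate: {asd_tmrca_years(sample, root, EVOLUTIONARY_RATE):.0f} years")
print(f"pedigree rate:     {asd_tmrca_years(sample, root, PEDIGREE_RATE):.0f} years")
```

With the same haplotypes, the slower evolutionary rate always gives the older date, which is the direction of the over- and underestimation described in the quoted passage.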

UPDATE:
For further insight into the current understanding of substitution rates used for the SNP counting method, I direct readers to the Wang (2014) article, which enumerates the 4 primary methods that have been used to calculate the substitution rate:
  1. Human - Chimp Comparisons: Thomson (2000), Kuroki (2006)
  2. Deep Rooting Pedigree: Xue (2009)
  3. Autosomal Mutation Rate Adjustment: Mendez (2013)
  4. Founding Migrations Based Inference: Poznik (2013), Francalacci (2013)
In terms of inferences based on the Y chromosome TMRCA and the Out of Africa migrations, the authors suggest that Xue (2009) and Poznik (2013) give the most reasonable estimates.

Comparison of different Y chromosomal substitution rates in time estimation using Y chromosome dataset of 1000 Genome dataset. Time estimations are performed in BEAST. (a) TMRCA of 526 Y chromosomes (including haplogroup A1b1b2b-M219 to T). (b) Time of Out-of-Africa migration, the age of macro-haplogroup CT. HCR-Thomson and HCR-Kuroki: Y chromosome base-substitution rate measured from human-chimpanzee comparison by Thomson et al. [6] and Kuroki et al. [7], respectively. Pedigree rate: Y chromosome base-substitution rate measured in a deep-rooting pedigree by Xue et al. [8]. Autosomal Rate Adjusted: Y chromosome substitution rate adjusted from autosomal mutation rates by Mendez et al. [9]. AEFM-America and AEFM-Sardinian: Y chromosome base-substitution rate based on archaeological evidence of founding migrations using initial peopling of Americas [10] and initial Sardinian expansion [11], respectively. Different reported mutation rates are given at the log scale. Confidence intervals for some of the mutation rates are very wide, and time calculations here use only the point estimate. The times would overlap more if all the uncertainties were taken into account. Figure was drawn using boxplot in R 3.0.2.

However, a fifth method, sequencing entire Y chromosomes from verifiable ancient individuals, which is still in its infancy but gaining momentum, should refine the substitution rate to a level of precision that has not been available so far. It remains to be seen whether it will corroborate the rates from the front runners (Xue (2009), Poznik (2013)) or maybe even yield unforeseen results.

Friday, April 4, 2014

Median Joining Networks

This post will be dedicated to the YSTR median joining networks I will be creating using the Fluxus Network Software©.

The Y TMRCA calculator now also has the capability to create the input file needed to build median joining networks with the Fluxus Network Software; see here for more details.

This blog post will be updated routinely with network diagrams as I make more of them. Since I do not have access to the Network Publisher add-on, it takes a considerable amount of time to properly format the diagrams.

I will start with Ethiopian E-M34 data from the Plaster thesis; see also the following post for more detail on Ethiopian E-M34: YDNA E-M123; A closer look