Showing posts with label Haplogroup E. Show all posts

Wednesday, June 24, 2015

Improved resolution of E-M215 (aka E3b / E1b1b)

A new paper has appeared with a focus on Haplogroup E, mostly on E-M215 and E-M35, offering a moderate improvement in resolution over what was previously known.

Basically, at first glance, the major novelty with respect to E-M215 is that all E-Z830 (x M123) lineages are united under a new mutation dubbed V1515, and that the former solo lineages of E-M35, i.e. E-V92 and E-V6, now have a home and are included within this unification. In addition, the above named unifying mutation, V1515, apparently has a bifurcated structure itself, with one younger branch having the sole representation in the Southern parts of Ethiopia and further South, and the more diverse (hence ancient) branch being represented in the Northern parts of Ethiopia and further North.

New basal haplogroup E mutations were also apparently found.

The paper is open access, and I will analyze it further in the coming days, but for now I just wanted to plot the Eastern African E-M215 variant frequencies.

 
UPDATE (6/26/15) - Added NAfrica E-M215 frequencies
UPDATE (6/26/15) - Added new mutation rate
The new fossil-calibrated mutation rate has been added to the TMRCA Calculator. Unfortunately, 95% CI values have not been given (or at least I could not find where they have been given). In any event, this new mutation rate is a bit slower than the rates derived from the other ancient-DNA-calibrated sources, specifically ~4% and ~12% slower than Karmin (2015) and Fu (2014) respectively, so its central TMRCA estimates are correspondingly older.

UPDATE (6/27/15) - Comparison with YFull TMRCAs
I have created a table of the TMRCAs of the major nodes in E-M215 to compare with YFull’s estimates, so that we can ‘fill in the gaps’ for the nodes that were not given estimates in Trombetta (2015). YFull uses a mutation rate that is almost identical to Fu (2014)’s Ust-Ishim calibrated rate, so naturally some of its TMRCAs are closer to the present than the Trombetta estimates, as pointed out above.




TMRCA (KYA)

Node       Trombetta   YFull
E-M215     39          35
E-M35      25          24
E-V68      20          20
E-M78      15          13
E-Z827     ?           24
E-V257     ?           14
E-Z830     20          19
E-M34      ?           15
E-V1515    12          ?

Friday, February 21, 2014

YDNA E-M123; A closer look

E-M123 (as well as E-M34) was first discovered by Underhill (2000). It is found at low to medium frequencies in East Africa and the Middle East, and at low frequencies in North Africa and Europe.

Phylogeny:
Figure 1 - Current and previous E-M215 phylogenetic structure 

Figure 1 shows a comparison of the basic phylogeny of E-M215/M35 as was known before 2011 (a) and after (b), with a 'who and when' key for the Discovery of the UEPs. Notice the impact the rearrangement has on the phylogenetic placement of E-M123, specifically the fact that E-M123 is shown to have a more recent common ancestor with the East and Southern African variants of E-M35, i.e. E-V42 and E-M293, before it does with any of the other variants of E-M35.

Previous publications:

While it is unfortunate that all of the research previously published on E-M123 was done under the older (and rather out of date) configuration of the basic structure of E-M35, it is still worthwhile to look at articles that have tried to untangle the origins and history of this lineage. Of these, 3 come to mind:

Friday, February 14, 2014

Comprehensive Ethiopian YDNA TMRCA Estimates

Find below a comprehensive list for all central TMRCA estimates calculated from the Plaster thesis for 6 UEPs (look at this post under Interactive Chart of Figure 3.2 for the frequencies of the UEPs). P*(x R1a) & Y*(x BT,A3b2)  are not included due to their minimal frequency and very sporadic distribution. 

A total of 5,756 haplotypes were reported with the paper for the markers DYS19, DYS388, DYS390, DYS391, DYS392 and DYS393. Of those, 30 belonged to P*(x R1a) & Y*(x BT,A3b2), leaving 5,726 haplotypes. The remaining haplotypes were then grouped by the criteria Cultural ID + Generic Language Group* + UEP. Any group of haplotypes that met these criteria with N > 1 and with a coalescent not equal to 0 (meaning non-identical haplotypes) was processed for its TMRCA and reported, accounting for 5,668, or 98%, of the total haplotypes reported for the paper.
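The grouping and filtering step described above can be sketched roughly as follows. This is only an illustration under my own assumptions: the record layout, the function name `groups_eligible_for_tmrca` and the example rows are all made up, and "coalescent not equal to 0" is approximated as "the group contains at least two distinct haplotypes".

```python
from collections import defaultdict


def groups_eligible_for_tmrca(records):
    """Group haplotypes by (cultural ID, language group, UEP) and keep only
    the groups with N > 1 that contain at least two distinct haplotypes."""
    groups = defaultdict(list)
    for cultural_id, language_group, uep, haplotype in records:
        groups[(cultural_id, language_group, uep)].append(tuple(haplotype))
    return {
        key: haplotypes
        for key, haplotypes in groups.items()
        if len(haplotypes) > 1 and len(set(haplotypes)) > 1
    }


# Made-up rows: (cultural ID, language group, UEP, six STR repeat counts).
example_records = [
    ("GroupA", "Lang1", "J", (14, 12, 23, 10, 11, 12)),
    ("GroupA", "Lang1", "J", (14, 12, 24, 10, 11, 12)),
    ("GroupA", "Lang1", "A3b2", (15, 12, 21, 10, 11, 13)),  # N = 1, dropped
    ("GroupB", "Lang2", "J", (14, 12, 23, 10, 11, 12)),
    ("GroupB", "Lang2", "J", (14, 12, 23, 10, 11, 12)),  # identical, dropped
]

eligible = groups_eligible_for_tmrca(example_records)
print(sorted(eligible))
```

Only the first group survives in this toy input, since the second has a single haplotype and the third consists of identical haplotypes.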

The tables are ordered according to the frequencies of the tested UEPs in Ethiopia, i.e. E*(x E1b1a), 3985 haplotypes > J, 689 haplotypes > A3b2, 601 haplotypes > K*(xL,N1c,O2b,P), 154 haplotypes > BT*(xDE,JT), 193 haplotypes and E1b1a7, 46 haplotypes.

Note that the mean TMRCAs for both the Zhivotovsky rate (Z-TMRCA) and the pedigree rates (P-TMRCA), sometimes also known as germline rates, are in units of generations. A suitable generation length for the Z-TMRCA is 25 years, while for the P-TMRCA it may range from 28 to 33 years.
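As a small worked example of the conversion from generations to years, here is a sketch that uses the generation lengths mentioned above. The function names and the input of 200 generations are hypothetical and chosen only for illustration.

```python
Z_YEARS_PER_GENERATION = 25        # generation length used with Z-TMRCA
P_YEARS_PER_GENERATION = (28, 33)  # range used with P-TMRCA


def z_tmrca_years(generations):
    """Convert a Z-TMRCA given in generations to years."""
    return generations * Z_YEARS_PER_GENERATION


def p_tmrca_year_range(generations):
    """Convert a P-TMRCA given in generations to a (low, high) range in years."""
    low, high = P_YEARS_PER_GENERATION
    return generations * low, generations * high


example_generations = 200  # hypothetical value, not taken from the tables
print(z_tmrca_years(example_generations))       # 5000
print(p_tmrca_year_range(example_generations))  # (5600, 6600)
```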

If details of the TMRCA analysis for any of the populations listed below are required, go to the table here, upload the necessary file into the Y TMRCA calculator, and filter for the specific population in question.

Tuesday, October 22, 2013

New paper sheds light on the F-series YDNA SNPs

The F-series YDNA SNPs appeared at the end of last year with results from Geno 2.0. Now an electronic pre-print at arXiv.org sheds some light on the discovery of these SNPs.

The paper, entitled "Y Chromosomes of 40% Chinese Are Descendants of Three Neolithic Super-grandfathers", is freely available for download.

Some interesting quotations (relevant to this blog) from the paper follow (in blue):

To identify major population expansions related to male lineages, we sequenced 78 East Asian Y chromosomes at 3.9 Mbp of the non-recombining region (NRY), discovered >4,000 new SNPs, and identified many new clades.

Nearly all the Y chromosomes outside Africa are derivative at the SNP M168 and belong to any of its three descendent super-haplogroups – DE, C, and F 9,10,15, strongly supporting the out-of-Africa theory. The time of the anatomically modern human’s exodus from Africa has yielded inconsistent results ranging from 39 kya 16, 44 kya 10, 59 kya 17, 68.5 kya 18 to 57.0 – 74.6 kya 19.


The passage below explains why the F-series SNPs are for the most part found below CT-M168.

we selected 110 males, encompassing the haplogroups O, C, D, N, and Q which are common in East Eurasians, as well as haplogroups J, G, and R which are common in West Eurasians (see Table S1), and sequenced their non-repetitive segments of NRY using a pooling-and-capturing strategy.


Overall ~4,500 base substitutions were identified in all the samples from the whole Y chromosome, in which >4,300 SNPs that has not been publicly named before 2012 (ISOGG etc.). We designated each of these SNP a name beginning with ‘F’ (for Fudan University) (see Table S2). We obtained ~3.90 Mbp of sequences with appropriate quality (at least 1x coverage on >100 out of 110 samples), and identified ~3,600 SNPs in this region.


Table S2 is not available in the PDF file. The link says that all the tables are in a 'separate ancillary file', but that file is also not available, at least not at the time of publishing this post; it may become available when the paper is officially published. Without seeing the actual locations of these SNPs on the Y chromosome, it is hard to say how many of them are redundant relative to the PF and CTS SNPs, and how many of them are truly 'novel'.

Considering that 3.9 Mbp range constitutes only less than half of 10 Mbp non-repetitive region in Y chromosome 7, the time resolution of east Asian Y chromosome phylogeny is expected to be doubled in the near future.


To overcome the factors for uncertainty of mutation rate, a calibration with series of samples of comparable time scales might be used. For the case of mitochondrion, a recent study, in which several C-14 calibrated ancient complete sequences (4 – 40 kya) were incorporated into the tree, made the absolute dates much more convincing 41, and we expect a parallel calibration for the Y chromosome in the near future.


The authors conclude the paper with this paragraph:

Despite of the mutation rate uncertainty, we evaluate our calculation of absolute divergence time as acceptable. Firstly, our out-of-Africa date (54.1 kya) is still within the range of previous estimations (39 – 74.6 kya). Secondly, the out-of-Africa date is similar to the recent estimation of two great mitochondrial expansions outside Africa – M (49.6 kya) and N (58.9 kya) 42. Thirdly, it is not contradictory to the emergence of earliest modern human fossil out of Africa (e.g. ~ 50 kya in Australia) 43.

In the Supplementary Materials/Additional Discussions section they also mention this:

It remained mysterious that how many times the anatomically modern human migrated out of Africa, since that among the three superhaplogrous C, DE and F, Haplogroup F distributes in whole Eurasia, C in Asia and Austronesia, D exclusively in Asia, while D’s brother clade E distribute mainly in Africa 62, so there are two hypotheses, 1) haplogroups D and CF migrated out of Africa separately; 2) the single common ancestor of CF and DE migrated out of Africa followed by a back-migration of E to Africa. From this study, the short interval between CF/DE and C/F divergences weakens the possibility of multiple independent migrations (CF, D, and DE*) out of Africa, and thus supports the latter hypothesis 63 (Fig. S2 a).


Perhaps the only new material from this study that may strengthen the hypothesis of an extra-African origin of haplogroup E is, as they mention, the 'short interval' between the common ancestor of CF and DE and the C/F divergence. However, this 'short interval' is relative to which branch length? They did not compute the interval between the BT common ancestor and the CF/DE divergence. In addition, what length of time would be considered short enough to disqualify the possibility of multiple independent migrations, and how would this length of time be evaluated? Next, what about the cases of DE* found in Nigeria and Guinea-Bissau that they failed to mention here, that is to say, cases that are neither D nor E but are downstream from the YAP+ insertion? How exactly are they to be explained?

Either way, putting all these questions aside, let us assume that their proposal is correct. How then would this be reconciled with the last paragraph of the actual paper, where they associate mtDNA haplogroups M and N with the out-of-Africa expansion? This would mean that if E back-migrated, it would have done so alongside lineages downstream from mtDNA haplogroups M and N. However, many areas in Africa where E dominates (except for East and North Africa) have close to zero, if not zero, mtDNA haplogroups M and N. Wouldn't we expect to see at least some traces of the mtDNA counterpart of this supposed ancient back-migration in the YDNA haplogroup E dominant areas of Africa other than the East and the North? In an otherwise good and all-around informative paper, I think the authors may have jumped the gun with this particular speculation. Perhaps that is why they put it in the supplementary section rather than in the paper itself, as a testament to the highly speculative nature of their supposition.

Wednesday, May 8, 2013

Another Extensive thesis on East African DNA


It was brought to my attention last week, thanks to a comment on this blog made by the user 'Umi', that another thesis on East African DNA variation was publicly available online:

Complex Genetic History of East African Human Populations

This is also an extensive thesis with a wealth of information akin to Plaster's thesis. The primary differences are that this one is more focused on parts of East Africa further to the south of Ethiopia and that, in addition to uni-parental analysis, it also includes some autosomal model-based inference, albeit of quite low resolution by today's standards: 848 microsatellites and 479 indels (refer to Tishkoff et al. 2009 for marker details).

Due to the extensive nature of the report I haven't had a chance to cover its entire scope, instead, for starters, I have first focused on the YDNA data by creating a relative frequency chart from the results reported in Fig. 3.3.2. 

Several things to point out initially:

  • The report outlines the discovery of 4 new SNPs, TL1-4. The first two were found in Haplogroup B, downstream from B-M150 and B-M112 respectively. The last two, TL3 and TL4, were found in haplogroup E, downstream from E-U174 and E-V32 respectively. Incidentally, the fourth SNP, TL4, which is under E-V32, could potentially be the same as Z808/Z809 as identified recently by the genealogical community. However, as the report does not give the Y-chromosome location of the SNP in an NCBI Build 36/37 format, this cannot be verified, at least by me, at the moment.
  • A couple of the frequency results in Fig. 3.3.2 do not add up, in particular those for the Boni and the Baggara, but also to a lesser extent those for the Kanuri and Teita. I have labeled the missing frequency results with a “?” in the relative charts for those specific populations.
  • The Burji and Konso are labeled as being only from Kenya throughout the report. However, most Burji are from Ethiopia, and the Konso are exclusively found in Ethiopia; I have reflected this in the charts.
  • STR data is not readily available to perform TMRCA estimates on. Some TMRCA results are reported using Zhivotovsky's rates in Table 3.3.1, but these are estimates only for the different lineages across all the samples in the dataset and do not necessarily compare TMRCAs between the different populations under study.
  • J-M62, while a subclade of J-M267, is not the main subclade of J-M267 found in East Africa; that would be J-P58. Therefore, the results reported for J-12f2.1 (x M62, M172) may after all be, or largely include, J-P58 lineages. Of course, those results could also include variants of J-M267 other than J-P58 and J-M62, since the SNP was not directly tested.
  • E-P2* lineages are abundantly found (> 30%) in the Konso, Burji and Mbugwe. However, on closer examination and correlation with current data, these could be E-M329, E-V38* or even E-M215*, as none of these SNPs were directly tested. Genuine E-P2* lineages would be positive for E-P2 and negative for V38 and M215 (see Trombetta et al. 2011).
  • Similarly, the E-M35* lineages reported could be members of the relatively newly discovered lineage E-Z830* (see this post for details), or of some of the untested variants of E-M35, i.e. E-V42, V92 and maybe even E-V68 (x M78).

Thursday, March 7, 2013

African Sahel YDNA


Multiple and differentiated contributions to the male gene pool of pastoral and farmer populations of the African Sahel


ABSTRACT

The African Sahel is conducive to studies of divergence/admixture genetic events as a result of its population history being so closely related with past climatic changes. Today, it is a place of the co-existence of two differing food-producing subsistence systems, i.e., that of sedentary farmers and nomadic pastoralists, whose populations have likely been formed from several dispersed indigenous hunter-gatherer groups. Using new methodology, we show here that the male gene pool of the extant populations of the African Sahel harbors signatures of multiple and differentiated contributions from different genetic sources. We also show that even if the Fulani pastoralists and their neighboring farmers share high frequencies of four Y chromosome subhaplogroups of E, they have drawn on molecularly differentiated subgroups at different times. These findings, based on combinations of SNP and STR polymorphisms, add to our previous knowledge and highlight the role of differences in the demographic history and displacements of the Sahelian populations as a major factor in the segregation of the Y chromosome lineages in Africa. Interestingly, within the Fulani pastoralist population as a whole, a differentiation of the groups from Niger is characterized by their high presence of R1b-M343 and E1b1b1-M35. Moreover, the R1b-M343 is represented in our dataset exclusively in the Fulani group and our analyses infer a north-to-south African migration route during a recent past.

Closed Access



Y(x CF)  Phylogeny, Red = SNPs Tested, Blue =Presumed Tested 
CF Phylogeny, Red = SNPs Tested, Blue =Presumed Tested

Thursday, February 21, 2013

The Zhivotovsky Multiplier


It is reported that Zhivotovsky's effective mutation rate [1] has the effect of increasing the TMRCA of a lineage, as computed by the use of Microsatellite Genetic Distances [2], by a factor of 3-4 relative to TMRCAs computed via mutation rates observed in pedigree and family studies [3].

By utilizing my TMRCA calculating program, I want to explore:
  1. What effect do different marker combinations have on this multiplier?
  2. What effect does marker size have on this multiplier?
  3. Is there a variation in this multiplier for different data-sets?
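To make the idea of the multiplier concrete, here is a minimal sketch that computes the ratio between a TMRCA obtained with the flat Zhivotovsky rate and one obtained with marker-specific pedigree rates. It is not the actual program. The per-marker variance values and pedigree rates below are invented purely for illustration, and the estimator is assumed to be the mean over markers of (average squared difference / mutation rate).

```python
ZHIVOTOVSKY_RATE = 0.00069  # effective rate per marker per generation


def tmrca_generations(asd_per_marker, rate_per_marker):
    """Mean over markers of (average squared difference / mutation rate)."""
    ratios = [asd_per_marker[m] / rate_per_marker[m] for m in asd_per_marker]
    return sum(ratios) / len(ratios)


def zhivotovsky_multiplier(asd_per_marker, pedigree_rates):
    """Ratio of the Zhivotovsky-rate TMRCA to the pedigree-rate TMRCA."""
    flat_rates = {marker: ZHIVOTOVSKY_RATE for marker in asd_per_marker}
    z_tmrca = tmrca_generations(asd_per_marker, flat_rates)
    p_tmrca = tmrca_generations(asd_per_marker, pedigree_rates)
    return z_tmrca / p_tmrca


# Invented inputs: average squared differences and pedigree rates per marker.
example_asd = {"DYS19": 0.30, "DYS390": 0.45, "DYS391": 0.25}
example_pedigree_rates = {"DYS19": 0.0020, "DYS390": 0.0030, "DYS391": 0.0025}

multiplier = zhivotovsky_multiplier(example_asd, example_pedigree_rates)
print(round(multiplier, 2))
```

Because the pedigree rates differ from marker to marker while the Zhivotovsky rate is flat, the multiplier in this sketch depends on which markers are included, which is the first question in the list above.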

First, to ensure that my program correctly calculates the TMRCA when the Zhivotovsky mutation rate of 0.00069 is applied consistently to all the markers in my database (versus the marker-specific pedigree mutation rates I have been utilizing thus far), I attempted to replicate the TMRCA computations of the following publication:


Friday, February 8, 2013

Sudan YDNA

This is from a relatively old study, but it seems that it is the most comprehensive YDNA breakdown we have of North and South Sudan to date.

Y-chromosome variation among Sudanese: restricted gene flow, concordance with language, geography, and history. Hassan (2008)

Here is a map of the populations tested from Fig.1 of the Study
Populations Studied

Here below is the phylogeny (as known back in 2008) of the SNPs tested, note that those in bold; E-M75, E-P2, G-M201 and T-M70 were NOT tested in the study.

SNPs tested (except those in bold)
The E-M78+ cases from above were also tested for Cruciani's V-Series SNPs for further resolution:


Cruciani's V-Series SNPs (2007)

Some notes:


  • The high level (38%) of E-M215 (x M78) in the Borgu is quite intriguing. I wonder which variant(s) of E-M215 it is?
  • Almost all the J-12f2(x M172) should be J-M267.
  • B-M60 is found in Southern Nilo-Saharan speakers and not the North Western ones, while A-M13 is found in both.
  • The F-M89 (x M52, M170, 12f2, M9) found in the north is also interesting, although at least part of it could possibly be G-M201.
  • E-V22 has a relatively high presence in these samples, even when compared to the Egyptian samples from Cruciani '07, and most certainly higher than its presence in Ethiopia.
  • The high presence of E-V12 (x V32) is also concordant with its putative area of origin; all the E-M78 found in the Nuer and the Copts is of this variety.
  • The presence of E-M78* in the Masalit and the Nuba is notable.
  • Of course, the strangest result is the 54% R-M173 (x P25) in the Fulani. This could be some R1b* (R-M343) or some type of R1a. The latter would be very out of place for the region, while the former could be reconciled with the presence of more downstream R1b variants in Africa.


Saturday, January 5, 2013

TMRCA calculations from Plaster NRY data : Correcting an Error


Previously, I had computed TMRCAs for the YDNA STR data from the additional material that was provided along with Dr. Chris Plaster's thesis. However, after a brief communication with the author, I found out that the marker order of the STRs in the Excel file was reported wrongly. The correct order of the markers is as follows:

DYS19 DYS388 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439 DYS448 DYS456 DYS635 Y GATA H4

This changes my TMRCA calculations because I am not computing the coalescent using a generic mutation rate that is equivalent for all the markers, but rather each marker has its own mutation rate attributed to it.

When I reran my program using the newly corrected order above, I got the following:


As can be seen, using the new order of markers generally reduces the number of generations to coalescent for the Plaster data-set. The previous observation of a relatively lower TMRCA for the haplozone data of E-M123 versus that of the E-M34 Plaster data-set largely disappears. 

To check whether the high number of samples (129) in the E-M123 Haplozone data-set was skewing the results, I took 23 random samples (the number of samples available in the Plaster E-M34 data-set) from the larger E-M123 Haplozone data-set and reran the TMRCA calculations on just those samples. I repeated this process 300 times, and only 28% of the runs yielded a mean TMRCA less than that of the E-M34 Plaster data-set. If sample size were skewing the results, I would expect >50% of the runs to have a mean TMRCA less than that of the E-M34 Plaster data-set.
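The resampling check described above could be sketched along these lines. This is not the actual program. The haplotypes are randomly generated stand-ins, the three mutation rates and the reference TMRCA are placeholders, and the TMRCA estimator is a simplified average-squared-difference version that takes the median repeat as the ancestral value.

```python
import random
import statistics


def asd_tmrca_generations(haplotypes, rates):
    """Mean over markers of (average squared difference from the median / rate)."""
    per_marker = []
    for j, rate in enumerate(rates):
        repeats = [h[j] for h in haplotypes]
        ancestral = statistics.median(repeats)
        asd = sum((r - ancestral) ** 2 for r in repeats) / len(repeats)
        per_marker.append(asd / rate)
    return sum(per_marker) / len(per_marker)


def fraction_of_runs_below(haplotypes, rates, subsample_size, reference_tmrca,
                           runs=300, seed=0):
    """Fraction of random subsamples whose TMRCA is below the reference value."""
    rng = random.Random(seed)
    below = 0
    for _ in range(runs):
        subsample = rng.sample(haplotypes, subsample_size)
        if asd_tmrca_generations(subsample, rates) < reference_tmrca:
            below += 1
    return below / runs


# Stand-in data: 129 synthetic 3-marker haplotypes and placeholder rates.
data_rng = random.Random(1)
large_dataset = [
    tuple(base + data_rng.choice([-1, 0, 0, 1]) for base in (14, 12, 24))
    for _ in range(129)
]
placeholder_rates = [0.002, 0.0005, 0.003]
placeholder_reference = asd_tmrca_generations(large_dataset, placeholder_rates)

share = fraction_of_runs_below(large_dataset, placeholder_rates, 23,
                               placeholder_reference)
print(f"{share:.0%} of 300 runs fell below the reference TMRCA")
```

Fixing the seed keeps the run repeatable. With real data, the reference value would be the TMRCA of the smaller data-set rather than that of the large one.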

That said, the E-M34 Plaster data-set still had a relatively higher number of generations to coalescent than the E-M84 Haplozone data-set. E-M84 is a subclade of E-M34, and a high majority of haplotypes that belong to E-M34 also test positive for the E-M84 SNP (at least for the non-African E-M34 haplotypes that we know of).

Other than that, the new, and corrected, ordering of the markers did not have much impact in relative TMRCA terms between the Plaster and Haplozone/FTDNA data for the other lineages I had tested.

Tuesday, December 11, 2012

National Geographic fesses up on the origin of E-M35

In the second phase of their massive global-scale genetic testing project, Geno 2.0: The Greatest Journey Ever Told, National Geographic has finally fessed up to the most parsimonious explanation for the origin of YDNA haplogroup E1b1b1. This is good news, even if it took about 8 years after the publication of the first detailed paper on E-M35 to do so.

In the first phase of the Genographic project, launched in 2005, E-M35's origin was explicitly stated as follows:

"The man who gave rise to marker M35 was born around 20,000 years ago in the Middle East. His descendants were among the first farmers and helped spread agriculture from the Middle East into the Mediterranean region."
Original E-M35 National Geographic Description


You can see what it reads today in the screenshot below:
Current E-M35 Nat. Geographic Description

There is also the sentence, "Today, in keeping with its place of origin, this line is common among Afro-Asiatic speakers". Could the part 'in keeping with its place of origin' also be a 'nudge' at the very distinct, and in my opinion strong, possibility that Afroasiatic may have originated in East Africa as well?
If so, this would be a first for a major outlet like Nat Geo, even though renowned Afroasiatic experts like Greenberg, Ehret, Blench et al. have said this for decades.

Update: Another odd point in their new phylogeny seen above is the ordering of some of the NRY SNPs leading up to V12. The SNPs leading up to P147 are in the standard sequence, i.e. M42 > M168 > M203 > M96 > P147, which is common knowledge. However, P177 is listed as downstream of P2, where common knowledge says it is the reverse, i.e. P147 > P177 > P2 instead of P147 > P2 > P177. Similarly, M215 is not known to be a subclade of M35.1, but rather the reverse. So overall, their sequence should read as follows: M42 > M168 > M203 > M96 > P147 > P177 > P2 > M215 > M35.1 > M78 > V12, unless of course they have found some samples that upset the standard NRY SNP sequence leading up to E-V12 that we do not know about yet.

Monday, November 26, 2012

Extensive Doctoral Thesis on Ethiopian Y and mtDNA

I was contacted earlier by Dr. Chris Plaster about a doctoral thesis on Ethiopian Y & mtDNA that was completed 2 years ago but had been embargoed until only about two months ago. As this is the first time I am coming across it, and since it is 204 pages long, I have not had a chance to go through it thoroughly. Suffice it to say that this is the most extensive work on Ethiopian NRY & mtDNA that I have seen to date, although the resolution leaves a lot to be desired. I will update this post as I read it more thoroughly over the next few days/weeks...


Variation in Y chromosome, mitochondrial DNA and labels of identity on Ethiopia


Some numbers and figures that caught my attention at first glance:





The Discussion section also has some interesting things to say, especially with respects to haplogroups A3b2 and J, but also the remaining ones found in Ethiopia as well.

Monday, October 8, 2012

YDNA from Southern Africa

Naidoo et al. (2010) report YDNA from 3 different groups in Southern Africa with a fair amount of resolution.
Electropherogram and phylogeny

Here below are the frequencies found:

Tuesday, September 18, 2012

Berber YDNA

Decent-resolution composite Berber YDNA from The Berber and the Berbers, Genetic and linguistic diversities, Jean-Michel Dugoujon et al. (2009)

Phylogeny of the 29 biallelic MSY markers (in bold) tested


Update: With respect to R-P25 (x M269) found in the Siwa and Mozabite Berbers, there is an even more exact breakdown of the lineage in this table from another publication using the same samples as above. It shows for the Siwa Berbers, the 26.9% of R-P25 (x M269) being further resolved to 23.7 % R-V88* (x M18, V8, V35, V69) plus 3.2% R-V69 (a branch of R-V88), similarly for the Mozabite Berbers, the 3% of R-P25 (x M269) is all resolved to R-V88* (x M18, V8, V35, V69).

Tuesday, June 19, 2012

Finding the TMRCA of Ethiopian YDNA lineages using an ASD method.


I have lately been working on computing TMRCAs using an ASD, or average square difference, method on publicly available Y-STR haplotypes. The premise of the ASD method is quite straightforward and easy to understand. A putative ancestral haplotype is calculated for a given dataset, and at each marker the repeat count of each sample is subtracted from that of the ancestral haplotype. These differences are squared, summed, and divided by the number of samples and the marker-specific mutation rate. The process is repeated for every marker in the dataset, and the mean across markers is then multiplied by an assumed number of years per generation. The formula below articulates this method:
TMRCA formula (ASD method)
 
Where;
N= Total number of Samples
Z= Total number of Markers
L0= Putative Ancestral Haplotype (Median or Modal repeats)
L= Individual sample haplotype repeats
m= Marker Specific Mutation Rate
G= Years / Generation
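Using the symbols defined above, the method can be sketched in a few lines. This is a minimal illustration and not the actual program. It assumes that the differences are squared (as the name ASD suggests) and that the median repeat is taken as the ancestral value L0; the haplotypes and mutation rates in the example are invented.

```python
import statistics


def asd_tmrca_years(haplotypes, mutation_rates, years_per_generation):
    """ASD-style TMRCA in years.

    haplotypes: list of N sequences, each with Z repeat counts (L)
    mutation_rates: list of Z marker-specific rates (m)
    years_per_generation: assumed generation length (G)
    """
    n = len(haplotypes)
    z = len(mutation_rates)
    total = 0.0
    for j in range(z):
        repeats = [h[j] for h in haplotypes]
        ancestral = statistics.median(repeats)  # L0 for this marker
        squared_sum = sum((repeat - ancestral) ** 2 for repeat in repeats)
        total += squared_sum / (n * mutation_rates[j])
    return years_per_generation * total / z


# Invented example: 4 samples, 2 markers.
example_haplotypes = [(14, 12), (15, 12), (13, 12), (14, 13)]
example_rates = [0.002, 0.001]
example_years = asd_tmrca_years(example_haplotypes, example_rates, 25)
print(round(example_years))  # about 6250 years (250 generations x 25)
```

In the example, each marker contributes 250 generations (2 / (4 x 0.002) and 1 / (4 x 0.001)), so the mean is 250 generations.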

The biggest variable here, other than the sampling strategy of a given dataset, is the choice among the several marker-specific mutation rates that are available. The selection of a correct mutation rate is an unsettled issue. I have therefore utilized 4 sets of mutation rates that were compiled by Paul Newlin, a collaborator at the E3b Project. These rates come from several different publications, and you can read about them here for more detail:
 
  1. The Chandler Mutation Rates:
  2. Stafford Bayesian Mutation Rates:
    Essentially a compilation of other mutation rates
  3. Burgarella & Navascués Mutation Rates:
  4. Ballantyne Mutation Rates:
In order to have a like-for-like comparison of the TMRCAs between the different publications, I had to intersect the available markers from the above with markers that are found in the public domain. This left me with the following 46 markers, which are present in all 4 of the above sets of rates as well as in the 66 markers that are widely used:
406s1 , 19 , 388 , 389-1 , 389-2 , 390 , 391 , 392 , 393 , 426 , 436 , 437 , 438 , 439 , 442 , 444 , 446 , 447 , 448 , 450 , 454 , 455 , 456 , 458 , 460 , 472 , 481 , 487 , 490 , 492 , 511 , 520 , 531 , 534 , 537 , 557 , 565 , 568 , 572 , 578 , 590 , 594 , 617 , 640 , 641 and gatah4.

In addition, since the Chandler mutation rates had a complete intersection with the 66 widely used markers, an additional 66 marker Chandler set was independently used that included the following markers in addition to the 46 listed above:
385a , 385b , 459a , 459b , 449 , 464a , 464b , 464c , 464d , ycaiia , ycaiib , 607 , 576 , 570 , cdya , cdyb , 395s1a , 395s1b , 413a and 413b.
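The marker intersection described above is essentially a set operation. The sketch below shows the idea with tiny made-up rate tables; the table contents, the rate values and the function name `common_markers` are all hypothetical and are not the actual rate sets.

```python
def common_markers(rate_tables, available_markers):
    """Markers present in every rate table and in the available marker list."""
    shared = set(available_markers)
    for table in rate_tables.values():
        shared &= set(table)
    return sorted(shared)


# Made-up miniature rate tables keyed by marker name.
example_rate_tables = {
    "set_1": {"19": 0.0015, "388": 0.0002, "385a": 0.0023},
    "set_2": {"19": 0.0020, "388": 0.0004, "481": 0.0050},
}
example_available = ["19", "388", "385a", "481", "390"]

print(common_markers(example_rate_tables, example_available))  # ['19', '388']
```

Restricting one rate set to a wider marker list, as with the 66-marker Chandler set, amounts to calling the same function with only that one table.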
  
Haplogroups A, E and J, cover well over 90% of the YDNA lineages found in Ethiopia. More specifically within these haplogroups, I was more interested in finding the TMRCA for A-M13, E-M35 and J1-M267, as these lineages cover over 70% but under 80% of said lineages, whereas the remaining 20-30% of lineages found in Ethiopia belong to E1b1*(x E1b1b,E1b1a1), other types of E lineages like E2 and E*, and some specific clades that belong to haplogroups B,T and J2.

Sunday, November 22, 2009

Y-DNA Variation Maps For East Africa

The supplementary information of the Chiaroni et al. 2009 paper, "Y chromosome diversity, human expansion, drift, and cultural evolution", has some informative Y-DNA frequency distribution maps for the major globally distributed haplogroups. Here are the frequency distribution maps pertinent to East Africa (namely Ethiopia and periphery).

1) Estimated and calculated centroids location map (Fig. S4 -b)


Note: Centroids are not necessarily indicative of origin; the diversity of the lineages in question is a better indicator of origin.

2)  Haplogroup A frequency distribution (Fig. S2 -b)


"Haplogroups A and B are the deepest branches in the phylogeny and are essentially restricted to Africa, bolstering the evidence that modern humans first arose there (14, 15). Haplogroup A is mainly found in the Rift Valley from Ethiopia to Cape Town, mostly but not exclusively in some of the oldest hunter-gatherers who still survive and speak Khoikhoi and San languages, proposed by some to be the oldest languages. The interruption of its distribution in the middle of the Rift Valley is possibly the consequence of replacement by Bantu-speaking farmers who settled the region starting in the first millennium of the Christian era."

The "Max" of 11.5% A3b2 (M13) shown above seems a bit low at first glance for East Africa. Semino et al. 2002 and Cruciani et al. 2002 together found 24 A3b2 lineages out of a total of 148 sampled in Ethiopia: Amhara (48), Oromo (78), Beta Israel (22). However, if the results of the Hassan et al. 2008 study of Sudan and the Sanchez et al. 2005 study of Somalia are added to the studies above, the frequency of A3b2 drops from ~16% in just Ethiopia to ~11.5% (75 / 650) when Sudan and Somalia are included. Also important to note is that the Beta Israel, Dinka, Shilluk, Nuer and Nuba all carry anywhere between 33% and 62% of A3b2 lineages.
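The arithmetic behind these percentages can be checked quickly. The counts for Ethiopia (24 of 148) and for the pooled total (75 of 650) come from the paragraph above; the Sudan plus Somalia figure is simply the difference between the two and is implied rather than stated.

```python
def frequency_percent(carriers, sampled):
    """Frequency of a lineage as a percentage of the samples."""
    return 100.0 * carriers / sampled


ethiopia = (24, 148)  # A3b2 carriers, total sampled
pooled = (75, 650)    # Ethiopia + Sudan + Somalia

# Implied by subtraction, not stated directly in the text.
sudan_and_somalia = (pooled[0] - ethiopia[0], pooled[1] - ethiopia[1])

print(round(frequency_percent(*ethiopia), 1))  # 16.2
print(round(frequency_percent(*pooled), 1))    # 11.5
print(sudan_and_somalia)                       # (51, 502)
```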

3) Haplogroup B frequency distribution (Fig. S1 -a)
 
"Haplogroup B is found mainly among African Pygmies, who live in the central African forest and are still predominantly hunters-gatherers but speak Bantu languages borrowed from farmers who arrived in the area between 2,000 and 3,000 years ago."

Haplogroup B is found at low levels in Ethiopia, with frequencies varying anywhere between 0 and 2% (Cruciani et al. 2002, Semino et al. 2002, Moran et al. 2004). The haplogroup is however much more common in Sudan, with frequencies reaching as high as 50% in the Nuer. Generally, frequencies of Haplogroup B are found anywhere between 8% and 27% in the Sudan among the Nubians, Nuba, Copts, Hausa, Dinka and Shilluk (Hassan et al. 2008).

4) Haplogroup E frequency distribution (Fig. S1 -b)


"The third predominantly African haplogroup, E, diversified some time afterward, probably descending from the East African population that generated the Out of Africa expansion. The geographic distributions of the major branches of this haplogroup, given in Fig. S1b, suggest that most of the settlement outside of Africa by haplogroup E members involves the later mutant E-M35 varieties like M78, M81, and M123 that extended to Arabia and the northern Mediterranean coast."

See the following threads for specifics on haplogroup E distribution in East Africa: E1b1b, E1b1b1a

5) Haplogroup J frequency distribution (Fig. S5 -a)


Further information on Haplogroup J in Ethiopia can be found in: Semino et al. 2002, Moran et al. 2004, Tofanelli et al. 2009, Chiaroni et al. 2009.

6) Genetic Diversity as a function of Distance from Addis Abeba (Fig. S6 -a)