Tuesday, October 22, 2013

New paper sheds light on the F-series YDNA SNPs

The F-series YDNA SNPs appeared at the end of last year with results from Geno 2.0. Now an electronic pre-print at arXiv.org sheds some light on the discovery of these SNPs.

The paper, entitled "Y Chromosomes of 40% Chinese Are Descendants of Three Neolithic Super-grandfathers", is freely available for download.

Some interesting (relevant to this blog) quotations from the paper follow (in blue):

To identify major population expansions related to male lineages, we sequenced 78 East Asian Y chromosomes at 3.9 Mbp of the non-recombining region (NRY), discovered >4,000 new SNPs, and identified many new clades.

Nearly all the Y chromosomes outside Africa are derivative at the SNP M168 and belong to any of its three descendent super-haplogroups – DE, C, and F 9,10,15, strongly supporting the out-of-Africa theory. The time of the anatomically modern human’s exodus from Africa has yielded inconsistent results ranging from 39 kya 16, 44 kya 10, 59 kya 17, 68.5 kya 18 to 57.0 – 74.6 kya 19.


The passage below explains why the F-series SNPs are, for the most part, found below CT-M168.

we selected 110 males, encompassing the haplogroups O, C, D, N, and Q which are common in East Eurasians, as well as haplogroups J, G, and R which are common in West Eurasians (see Table S1), and sequenced their non-repetitive segments of NRY using a pooling-and-capturing strategy.


Overall ~4,500 base substitutions were identified in all the samples from the whole Y chromosome, in which >4,300 SNPs that has not been publicly named before 2012 (ISOGG etc.). We designated each of these SNP a name beginning with ‘F’ (for Fudan University) (see Table S2). We obtained ~3.90 Mbp of sequences with appropriate quality (at least 1x coverage on >100 out of 110 samples), and identified ~3,600 SNPs in this region.


Table S2 is not available in the PDF file. The link says that all the tables are in a 'separate ancillary file', but no such file is available either, at least not at the time of publishing this post; it may become available when the paper is officially published. Without seeing the actual locations on the Y chromosome where these SNPs are found, it is hard to say how many of them are redundant relative to the PF and CTS SNPs, and how many are truly 'novel'.

Considering that 3.9 Mbp range constitutes only less than half of 10 Mbp non-repetitive region in Y chromosome 7, the time resolution of east Asian Y chromosome phylogeny is expected to be doubled in the near future.


To overcome the factors for uncertainty of mutation rate, a calibration with series of samples of comparable time scales might be used. For the case of mitochondrion, a recent study, in which several C-14 calibrated ancient complete sequences (4 – 40 kya) were incorporated into the tree, made the absolute dates much more convincing 41, and we expect a parallel calibration for the Y chromosome in the near future.


The authors conclude the paper with this paragraph:

Despite of the mutation rate uncertainty, we evaluate our calculation of absolute divergence time as acceptable. Firstly, our out-of-Africa date (54.1 kya) is still within the range of previous estimations (39 – 74.6 kya). Secondly, the out-of-Africa date is similar to the recent estimation of two great mitochondrial expansions outside Africa – M (49.6 kya) and N (58.9 kya) 42. Thirdly, it is not contradictory to the emergence of earliest modern human fossil out of Africa (e.g. ~ 50 kya in Australia) 43.

In the Supplementary Materials/Additional Discussions section they also mention this:

It remained mysterious that how many times the anatomically modern human migrated out of Africa, since that among the three superhaplogrous C, DE and F, Haplogroup F distributes in whole Eurasia, C in Asia and Austronesia, D exclusively in Asia, while D’s brother clade E distribute mainly in Africa 62, so there are two hypotheses, 1) haplogroups D and CF migrated out of Africa separately; 2) the single common ancestor of CF and DE migrated out of Africa followed by a back-migration of E to Africa. From this study, the short interval between CF/DE and C/F divergences weakens the possibility of multiple independent migrations (CF, D, and DE*) out of Africa, and thus supports the latter hypothesis 63 (Fig. S2 a).


Perhaps the only new material from this study that may strengthen the hypothesis of an extra-African origin of haplogroup E is, as they mention, the 'short interval' between the common ancestor of CF and DE and the C/F divergence. However, this interval is 'short' relative to which branch length? They did not compute the interval between the BT common ancestor and the CF/DE divergence. In addition, how short would the interval have to be to rule out multiple independent migrations, and how would that length of time be evaluated? Next, what about the cases of DE* found in Nigeria and Guinea-Bissau that they failed to mention here, that is to say, cases that are neither D nor E but are downstream from the YAP+ insertion? How exactly are they to be explained?

Either way, putting all these questions aside, let us assume that their proposal is correct. How then would this be reconciled with the last paragraph of the actual paper, where they associate mtDNA haplogroups M and N with the out-of-Africa expansion? If E back-migrated, it would have done so alongside lineages downstream from mtDNA haplogroups M and N. However, many areas in Africa where E dominates (except for East and North Africa) have zero, or close to zero, mtDNA haplogroups M and N. Wouldn't we expect to see at least some traces of the mtDNA counterpart of this supposed ancient back-migration in the YDNA haplogroup E dominant areas of Africa other than the East and the North? In an otherwise good and all-around informative paper, I think the authors may have jumped the gun with this particular speculation. Perhaps that is why they put it in the supplementary section and not in the paper itself, as a testament to the highly speculative nature of their supposition.

15 comments:

  1. I very much agree that the explosion of Y-DNA E in Africa does not seem linked to any back-migration from Eurasia. The basal diversity of E (and most of it) is clearly anchored in Africa, and, trying to be more precise but still very roughly, the Middle and Upper Nile area. The expansion of E is only indirectly related to the OoA, being the successful CDEF clade in the African ancestral homeland, while CF and pre-D are offshoots established first in Asia (C and F should not be considered separately re. the OoA).

    The back-flow of Eurasian mtDNA lineages to parts of Africa is surely linked instead to other Y-DNA haplogroups, such as J, T and R1b.

    Replies
    1. While I agree with the first part of your comment, I don't think your latter proposal:

      "The back-flow of Eurasian mtDNA lineages to parts of Africa is surely linked instead to other Y-DNA haplogroups, such as J, T and R1b."

      is really sustainable. Consider, for example, the TMRCA estimate I have for J in Ethiopia: using all the rates, including Zhivotovsky's, it comes out to a range of 466 - 897 generations, or ~ 14 - 27 KYA. That includes the Chandler rates, which for some reason give an unusually high estimate for J; if you exclude the Chandler rates, all the other rates give ~ 500 generations = ~ 15 KYA, while the TMRCA estimate for M1 is in the range 17.9 – 34.7 KYA.

      I am not sure about the age estimates for the other YDNA lineages that you mention, but surely you can use a similar logic as above to disprove it.
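The generation-to-years conversion in this reply can be reproduced with a short sketch. The 30 years per generation used here is an assumption inferred from the quoted figures (466 generations ≈ 14 KYA, 500 generations ≈ 15 KYA); the reply does not state the generation length explicitly, and the function names are purely illustrative.

```python
# Assumption: 30 years per generation, inferred from the figures in the reply
# (e.g. 500 generations = ~15 KYA). It is not stated explicitly in the text.
YEARS_PER_GENERATION = 30


def generations_to_kya(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a TMRCA in generations to thousands of years (KYA)."""
    return generations * years_per_generation / 1000


def within(value, interval):
    """True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


# Figures quoted in the reply.
j_range_kya = (generations_to_kya(466), generations_to_kya(897))
j_without_chandler_kya = generations_to_kya(500)
m1_range_kya = (17.9, 34.7)

print("J in Ethiopia, all rates (KYA):", j_range_kya)
print("J in Ethiopia, excluding Chandler rates (KYA):", j_without_chandler_kya)
print("Falls inside the M1 range:", within(j_without_chandler_kya, m1_range_kya))
```

Under that assumed generation length, the full J range partly overlaps the M1 range, while the estimate that excludes the Chandler rates falls below the lower M1 bound of 17.9 KYA, which is the comparison the reply relies on.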

  2. Since E is a subclade within CDEF and an African-specific subclade, it makes more sense that it back migrated into Africa, rather than the other way around. For the African origin of the CDEF clade to be true, one would need to find C*, D* and F* in Africa, but they are clearly not there. On the contrary, there are E* outside of Africa. Autosomal data seems to support Eurasian contribution to the SS African gene pool. You're right, though, that currently there's no mtDNA parallel to the Y-DNA E or the autosomal "back migration." This, however, could be a problem of the mtDNA phylogeny. Now that we have the Y-DNA tree topology to compare, the mtDNA phylogeny should have L0 and L1 as correlates to Y-DNA A and B, while the L2, L3, L4, L5 and L6 lineages should be correlates of Y-DNA E. L6 is a small divergent L lineage and it's found in West Asia, exactly where the "Y Chromosomes of 40% Chinese Are Descendants of Three Neolithic Super-grandfathers" paper places the origin of Y-DNA hg E. If we reverse a few mutations defining L2, L3, L4, L5, and L6 and make them derived from hg M, we'll get a match between the Y-DNA tree topology and the mtDNA tree topology. mtDNA is highly variable, so how do we know that this nexus doesn't have any hot spots?

    Replies
    1. “For the African origin of the CDEF clade to be true, one would need to find C*, D* and F* in Africa “

      That is not necessarily true. The common ancestor of 'CDEF', or more correctly CF-DE, is known as CT-M168; to my knowledge, this SNP has never been found in any living male who lacks both of its two currently known downstream markers, i.e. CF and DE. However, we do know where the sibling of this SNP (M168) is found, and that is most frequently in African rainforests, more specifically amongst Pygmies, but also amongst the Hadza and Nuer. As a matter of fact, the sibling of CT-M168, more precisely lineages that belong to B-M60, are not even found outside of Africa. Similarly, the 'parent' or any upstream SNPs of M168 have only been observed in Africa. So we know that both the 'parent' and 'sibling' SNPs of CT-M168 have only been observed in Africa. Next we can take a look at its 'children' SNPs: as far as CF-P143 is concerned, devoid of the C and F SNPs, it has never been observed to my knowledge, whereas DE, devoid of SNPs D and E, has been observed in both Africa and Asia, although more numerously in Africa than in Asia. Furthermore, E is most diverse and frequent in Africa, whereas D is most diverse and frequent in Asia. So while it is possible that CT-M168 may have been born outside of Africa, it most certainly is not probable by any parsimonious measuring stick.
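The 'parent', 'sibling' and 'children' relationships used in the paragraph above can be written down as a small tree. This is only a sketch of the branching order named in this thread (BT, B-M60, CT-M168, CF-P143, DE, C, D, E, F); the node labels and helper functions are illustrative and not taken from the paper.

```python
# Child -> parent links for the part of the Y-DNA tree discussed in this thread.
PARENT = {
    "B-M60": "BT",
    "CT-M168": "BT",
    "CF-P143": "CT-M168",
    "DE": "CT-M168",
    "C": "CF-P143",
    "F": "CF-P143",
    "D": "DE",
    "E": "DE",
}


def children(node):
    """Return the direct descendants of a node, sorted by name."""
    return sorted(child for child, parent in PARENT.items() if parent == node)


def siblings(node):
    """Return the other branches that share this node's parent."""
    parent = PARENT.get(node)
    if parent is None:
        return []
    return [child for child in children(parent) if child != node]


print("parent of CT-M168:", PARENT["CT-M168"])
print("sibling(s) of CT-M168:", siblings("CT-M168"))
print("children of CT-M168:", children("CT-M168"))
print("sibling(s) of E:", siblings("E"))
```

The structure only encodes ancestry; where each branch has actually been observed is a separate question and is exactly what the comments in this thread dispute.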

      “On the contrary, there are E* outside of Africa “

      Ah, Abu-Amero's over-cited E* lineages from Saudi Arabia. Well, I hope you will forgive me if I re-quote myself from what I wrote on Maju's blog at the beginning of the year, as I feel that it is still applicable and relevant here:

      Anyway, first let me mention something with respect to haplogroup E, since it is once again being dubiously implicated as having non-African origins, the Plaster thesis  that I had posted on my blog a few months back found two cases (out of 69) where E-M96 (x P147, M75) were found in Ethiopia, specifically amongst the Amhara dataset. This is significant as it proves, again, the antiquity of E* in East Africa, previously, only one find in Southern Africa was found by Karafet (2008), an additional 2 finds were reported by Abu-Amero (2009)  in their Saudi Arabian dataset, but these could have very easily been transported from nearby Africa in recent times. In addition, the database of private genetic testing company FTDNA reports one additional case of E-M96 (x P147, M75) from Ethiopia. So this is in addition to all the subclades of E-M96 that are found in Africa, most of which are exclusively found in Africa. Therefore, people who implicate E-M96 having non-African origins do so, on not only very flimsy data, but also on grounds that are questionably parsimonious.

      “ L6 is a small divergent L lineage and it's found in West Asia”

      This is what I would refer to as a 'stretch'. Consider the almost 16% of L6 found in the Ongota in Boattini 2013 and see how that stacks up against the sporadic L6 cases found in West Asia; let's not make a mountain out of a molehill.

    2. "as far as CF-P143 is concerned, devoid of the C and F SNPs, it has never been observed to my knowledge, where as DE, devoid of SNPs D and E has been observed in both Africa and Asia, although more numerously in Africa than in Asia. "

      DE is attested in so very few individuals that it doesn't make sense to hold it as a standard of attestation that CF* has presumably failed.

      "more precisely lineages that belong to B-M60, are not even found outside of Africa."

      So, how come they weren't carried out of Africa with the founding migration? A00 is well and alive among African Americans and it was part of an uncontroversial migration of West African slaves to the New World post-1492. How come the out-of-Africa "darlings," hgs A and B are not found along the putative migration routes, for instance in the Sahul? This is the obverse side of the problem with C, F and D - they are not found in Africa but they define the variation outside of Africa.

      "Abu-Amero's over-cited E* lineages from Saudi Arabia"

      Not only. E* was also found in India, among Dungri Bhils. http://www.ncbi.nlm.nih.gov/pubmed/17786594

      "So while it is possible that CT-M168 may have been born outside of Africa, it most certainly is not probable by any parsimonious measuring stick."

      CT is a clade of which African E is only one-fourth. The rest is in Asia. C, F, DE and E are all attested in Asia, while only DE and E are attested in Africa. By all the logic inherent in out-of-Africa thinking itself, it's a phylogenetic pattern most consistent with a back migration into Africa. Plus you're confusing parsimony with circular argument. If you base your conclusions about the origins of hg E on the fact that its "brother" and "father" are in Africa, then you deny back-migrations in principle, as if human populations are driven forward by Manifest Destiny. You can always argue yourself out of accepting that, say, mtDNA M1 and U6 are signs of a back-migration into Africa because their "great-grandparents" are in Africa. You've exposed one of the logical flaws of the phylogenetic thinking when it comes to human prehistory. And most recently the tide has turned away from it towards more of an "isolation and admixture" thinking.

      Since hgs A and B are African-specific they may have come from ancient hominins. Hg E is found in every remote corner of Africa, including San, Hadza and Pygmies. Maybe this is the true signature of the "modern human" colonization of Africa? It will be consistent with very shallow linguistic and cultural diversity on this continent. Also, remember that Pygmies speak Niger-Congo languages, which are of Holocene age, and there are no skeletal remains of Pygmies older than a thousand years. The antiquity or the modern human origin of hgs A and B (as well as mtDNA L0 and L1) are hanging by a thin phylogenetic thread that we'll never be able to test through ancient DNA.

    3. Also, remember that there's growing autosomal evidence for a back-migration into East Africa carrying Neandertal genes (http://anthropogenesis.kinshipstudies.org/2013/10/neandertal-admixture-in-africans-a-back-migration-confirmed/). Your pseudo-parsimonious logic would result in an idea that Neandertals evolved in Africa because there are lineages in Africa that show "ancestral" states compared to the "derived" Neandertal ones.

      "This is what I would refer to as a 'stretch', consider the almost 16% of L6 found in the Ongota in Boattini 2013 and see how that stacks up with the sporadic L6 cases found in West Asia, lets not make a mountain out of a molehill."

      I need to read Boattini. If I have any thoughts once I read it, I'll post them here. Thanks for the reference.

    4. While I'm still digging for Boattini, here's a quick and rough writeup of my thoughts on the origin of some of the basic features of the mtDNA and Y-DNA phylogenies, so that you have the full background. You're welcome to comment on it. I don't know your credentials, but I see you have a good knowledge base.

      http://anthropogenesis.kinshipstudies.org/2013/11/the-end-of-out-of-africa-a-copernican-reassessment-of-the-patterns-of-genetic-variation-in-the-old-world/

  3. Good analysis on the D E split

  4. More cases of Y-Dna haplogroup E* in Africa:
    http://mbe.oxfordjournals.org/content/26/7/1581.long
    http://www.nature.com/ejhg/journal/v13/n7/full/5201408a.html

    Y-Dna haplogroup E clearly originated in Africa. The lack of diversity in Y-Dna haplogroup E outside of Africa is proof!
    The "Y-Dna haplogroup E is Eurasian" crowd always fall silent when it comes to Y-Dna haplogroup DE* being found amongst West African men!

  5. Thanks for the link, but the specific cases of E* we are looking for here are cases that are positive for M96 but negative for both P147 and M75. These are the most basal cases of E* known to date, and they can only be found (if tested for) in publications after 2007/08, which was when the discovery of the branch of the E lineage defined by the P147 SNP was documented (Karafet et al.). The same goes for German's link above.
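The definition of basal E* used in this thread, E-M96 (x P147, M75), can be expressed as a small check. This is a minimal sketch: the dictionary format for SNP calls and the function name are hypothetical, and only the three SNPs named in the comment are considered.

```python
def classify_basal_e(calls):
    """Classify a sample against the E-M96 (x P147, M75) definition.

    `calls` maps a SNP name to True (derived/positive) or False
    (ancestral/negative). A SNP missing from the dict is treated as untested.
    This call format is a hypothetical one chosen for illustration.
    """
    m96 = calls.get("M96")
    if m96 is False:
        return "not E-M96"
    if m96 is None:
        return "undetermined"

    downstream = [calls.get(snp) for snp in ("P147", "M75")]
    if any(result is True for result in downstream):
        return "downstream E subclade"
    if all(result is False for result in downstream):
        return "E-M96*(xP147,M75)"
    # M96 positive, but at least one downstream SNP was never tested.
    return "undetermined"


print(classify_basal_e({"M96": True, "P147": False, "M75": False}))
print(classify_basal_e({"M96": True, "M75": False}))
print(classify_basal_e({"M96": True, "P147": True}))
```

A sample typed before P147 was documented lands in the "undetermined" branch, which is why older reports of E* cannot be counted as basal under this definition.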

  6. Why don't you tell the full story instead of mentioning only the African DE* cases? (Weale et al. 2003 show that the five Nigerians were divided into two clusters with identical haplotypes in common, and this means that they expanded recently from a single common ancestor). You should also mention the two Tibetan cases reported by Shi et al. (2008) and the Syrian case in the FTDNA project, along with the two YAP+ (could be DE* as well as D*) individuals from South India reported by Cordaux et al. (2004). Regarding E* cases outside of Africa, Abu-Amero et al. (2009) is not the only study who reports it, there are 5 Lebanese individuals and 1 Syrian that are E*(xE1,E2,E3a,E3b) reported by Zalloua et al. (2008ab), a Turkish E*(xE1,E3a,E3b) from Antalya reported by Serdar et al. (2009) and a Bulgarian E*(xM35,M2) reported by Karachanak et al. (2013). I can extend the list by mentioning the uniquely Eurasian and only real cases of: 1 E1b1b-M215* found in Khorasan (North-Eastern Iran) reported by Di Cristofaro et al. (2013), two E1b1-P2*(xM35,M2) ethnic Russians from Vladivostok reported by Lessig et al. (2008), 4 E1b2-P75 cases in Palestine, England and Canary Islands, 3 Sardinian and 2 Dutch V68* reported by Trombetta et al. (2011) and the E-M35 Phylogeny Project, respectively, L618* (parent of V13) cases only in Europe. Apart from these uniquely Eurasian ancient sub-clades, cases of M35*, V257*, Z830*, M123* are continually reported everywhere in Europe and Asia.

    http://www.genetics.org/cgi/content/full/165/1/229
    http://www.biomedcentral.com/1741-7007/6/45
    https://www.familytreedna.com/public/Y-DNA-Tree-Early-Branches/default.aspx?section=ysnp (look at the surname Al-Bitar)
    http://rcordaux.voila.net/pdfs/12.pdf
    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2427286/
    http://www.ncbi.nlm.nih.gov/m/pubmed/18976729/
    http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0056779
    http://www.rjlm.ro/doc/08-y-snphaplogroupsintheantalyapopulation.pdf
    http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0076748
    http://volgagermanbrit.us/documents/Y_SNP_Three_Populations.pdf
    http://www.anthrogenica.com/showthread.php?99-E1b2-(P75)
    http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016073
    http://www.haplozone.net/e3b/project

    Replies
    1. “You should also mention the two Tibetan cases reported by Shi et al. (2008) and the Syrian case in the FTDNA project”

      I am well aware that DE* has been found in Tibet; I also mentioned in my comments above that it has been found outside of Africa. Frequency-wise, however, more cases have been found in West Africa, including in Guinea-Bissau, than anywhere else.
      With respect to cases of DE* found by Private Genetic Testing Companies (PGTC), there are quite a few African American cases from 23andMe as well which I didn't mention, so the Syrian DE* case is not as meaningful as you think.
      PGTC samples are heavily skewed towards non-African samples. For instance, the Haplozone site you link to shows that E-V32 cases are most frequently found in the Arabian Peninsula, whereas in reality we know that this is not the case at all, since E-V32 is most heavily concentrated in the Horn of Africa. Frequency inferences from PGTC results are therefore most meaningful for non-African and most especially for European samples.

      “Regarding E* cases outside of Africa, Abu-Amero et al. (2009) is not the only study who reports it,”

      If you read what I wrote above, my definition of E* has strictly been E-M96 (x P147, M75) , none of the E* cases that you mention meet that definition, which is the most basal definition of E* that we know of today.

      “ I can extend the list by mentioning the uniquely Eurasian and only real cases of: 1 E1b1b-M215* found in Khorasan (North-Eastern Iran) reported by Di Cristofaro et al. (2013), two E1b1-P2*(xM35,M2) ethnic Russians from Vladivostok reported by Lessig et al. (2008), 4 E1b2-P75 cases in Palestine, England and Canary Islands, 3 Sardinian and 2 Dutch V68* reported by Trombetta et al. (2011) and the E-M35 Phylogeny Project, respectively, L618* (parent of V13) cases only in Europe. Apart from these uniquely Eurasian ancient sub-clades, cases of M35*, V257*, Z830*, M123* are continually reported everywhere in Europe and Asia.”

      Pardon me, but are you implying that E-P2*, E-M215*, E-M35* all originated in Europe or Asia ??
      If so, then that is a ridiculous implication that I do not wish to engage in, if not, I do not understand the relevance of that comment.

    2. "If you read what I wrote above, my definition of E* has strictly been E-M96 (x P147, M75) , none of the E* cases that you mention meet that definition, which is the most basal definition of E* that we know of today."

      You are right, P147 was not tested so the probability of them being E-M96* is not 100%, though it is very probable that this is the case for most or all of them. But one fact is established: there's more E-M96* in Saudi Arabia than in Ethiopia, so I really don't think you are right when you attribute Saudi Arabian E-M96* to recent introgression from Africa.

      "Pardon me, but are you implying that E-P2*, E-M215*, E-M35* all originated in Europe or Asia ??
      If so, then that is a ridiculous implication that I do not wish to engage in, if not, I do not understand the relevance of that comment."

      Of course I'm implying that. You have to add E-M96* to that list, though. I had no doubt about your reluctance to discuss this. If it is ridiculous as you say, you should at least try to explain why. But I need facts to be convinced by you about my implications being wrong, not the usual "this is impossible and crazy, E and its sub-clades originated in Africa, period!".

  7. This comment has been removed by the author.

    Replies
    1. The fact that African Americans exhibit DE* means nothing. As I explained earlier, Weale et al. showed that West African DE* is reported at a higher frequency because related people were sampled (the use of surnames in Nigeria is recent). The haplotypes formed two clusters, each gathering individuals with identical haplotypes. These two clusters were separated by a mere GD of 1. The real frequency of DE* in West Africa is therefore much lower than that in Asia.
