Monday, March 12, 2012

TreeMix analysis on the African Dataset


Thanks to a commenter going by the moniker 'Eze', who notified me the other day of a new program called TreeMix, which infers “patterns of population splitting and mixing from genome-wide allele frequency data”, I had a chance to try it on the Intra-African Dataset that I have described previously.

After converting the input file into the required format, I tried several of its features to become familiar with it:
 
1) Default Maximum Likelihood (ML) Tree,

  

2) Default ML graph with 4 assumed migrations,


3) ML graph rooted with the San-nb,

  
4) ML graph with 4 migrations and rooted with the San-nb.

One option of the software that I have not yet tried is the one that groups SNPs into blocks to account for linkage disequilibrium.
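The conversion step itself is not shown in this post. As a hedged sketch, assuming the input layout described in the TreeMix documentation (a gzipped text file whose first line lists population names, followed by one line per SNP giving, for each population, the counts of the two alleles separated by a comma), the last stage of a converter could look like this. The function name, file name and toy counts are hypothetical. As I understand it, the SNP-grouping option mentioned above works on runs of consecutive SNP lines, so the lines should be kept in genomic order.

```python
import gzip

def write_treemix_input(path, populations, counts):
    """Write allele counts in the (assumed) TreeMix layout: a header of
    space-separated population names, then one line per SNP with
    'count_allele1,count_allele2' for each population."""
    with gzip.open(path, "wt") as handle:
        handle.write(" ".join(populations) + "\n")
        for snp in counts:
            if len(snp) != len(populations):
                raise ValueError("each SNP needs one (a1, a2) pair per population")
            handle.write(" ".join(f"{a1},{a2}" for a1, a2 in snp) + "\n")

# Hypothetical toy data: three populations, two SNPs (not from this dataset).
pops = ["pop1", "pop2", "pop3"]
snps = [[(5, 1), (3, 3), (0, 6)],
        [(2, 4), (6, 0), (1, 5)]]
write_treemix_input("toy.treemix.gz", pops, snps)
```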

Other than that, the results are largely as expected. In both the default and the rooted trees, and especially in the tree rooted with the San-nb, the North Africans appear as a branch of the East Africans, who in turn appear as a branch of the other Africans. This is consistent with evidence from uniparental markers, as well as with published papers, for an East African origin of Eurasians, for whom North Africans can serve as a proxy in this particular dataset.

The four inferred migrations, in order of decreasing edge weight, were:

-(Biaka Pygmy, Ancestral Sotho/Tswana) → Sandawe, migration edge weight: 0.457032; likely an old hunter-gatherer link. This was noted by Tishkoff (2009): “These results suggest the possibility that the SAK, Hadza, Sandawe, and Pygmy populations are remnants of an historically more widespread proto-Khoesan-Pygmy population of hunter-gatherers.”

-(!Kung, Ancestral to Biaka and Mbuti Pygmies) → Hadza, migration edge weight: 0.44087; potentially another early hunter-gatherer link.

-Ethiopian Jews → San, migration edge weight: 0.188914; this could be a relic of early hunter-gatherer connections with Ethiopia (see: "Ethiopians and Khoisan share the deepest clades of the human Y-chromosome phylogeny"). Another possible explanation is the migration of Y-DNA E1b1b1b2b (E-M293) carriers from East Africa to Southern Africa within the past few millennia.

-Mbuti Pygmy → Alur, migration edge weight: 0.140627; this was also picked up by the ADMIXTURE analysis, in which the Alur had significant amounts of the Mbuti and Biaka Pygmy components.
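To make the edge weights above concrete: as I read the TreeMix model, a migration edge of weight w means the recipient population draws a fraction w of its ancestry from the source branch and 1 − w from its own branch of the tree, so at the allele-frequency level the recipient is a simple mixture. This minimal sketch ignores the drift TreeMix models along each branch; the function name and the two allele frequencies are hypothetical, and only the weight comes from the list above.

```python
def admixed_frequency(f_parent, f_source, w):
    """Expected allele frequency of a recipient population that receives a
    fraction w of its ancestry through a migration edge and 1 - w through
    its own branch of the tree (drift along branches is ignored here)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("migration weight must lie between 0 and 1")
    return w * f_source + (1.0 - w) * f_parent

# Hypothetical frequencies; w is the (Biaka Pygmy, Sotho/Tswana) -> Sandawe weight.
f_tree_branch = 0.30   # frequency along the recipient's own tree branch (made up)
f_edge_source = 0.70   # frequency at the migration source (made up)
print(admixed_frequency(f_tree_branch, f_edge_source, 0.457032))
```

With w close to 0.5, as for the first two edges, the recipient sits roughly halfway between its tree position and the source, which is why such large weights deserve a cautious reading.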

More details on TreeMix, the software featured in this post, can be found here: http://hdl.handle.net/10101/npre.2012.6956.1.


UPDATE: I ran another analysis, again rooted with the San from Namibia but assuming 10 migrations, and got the following results (the left column is the migration edge weight):

0.586693 luhya → hema,hadza
0.508001 egyptans → EtA
0.504407 egyptans → EtT
0.442291 egyptans → Ethiopian-jews
0.432858 moroccans → fulani
0.27746 mbutipygmy,pygmy → sandawe
0.203223 mbutipygmy,pygmy → hadza
0.156929 egyptans → maasai
0.154406 moroccans → san
0.129901 pygmy → alur
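Grouping the ten edges above by source population makes the North-to-East African pattern discussed in the next paragraph easier to see: every "egyptans" edge and its target can be read off at once, and EtO never appears as a target. The edge data are copied from the list above; the function name is hypothetical.

```python
from collections import defaultdict

# Migration edges from the 10-migration run above: (weight, source, target).
EDGES = [
    (0.586693, "luhya", "hema,hadza"),
    (0.508001, "egyptans", "EtA"),
    (0.504407, "egyptans", "EtT"),
    (0.442291, "egyptans", "Ethiopian-jews"),
    (0.432858, "moroccans", "fulani"),
    (0.27746, "mbutipygmy,pygmy", "sandawe"),
    (0.203223, "mbutipygmy,pygmy", "hadza"),
    (0.156929, "egyptans", "maasai"),
    (0.154406, "moroccans", "san"),
    (0.129901, "pygmy", "alur"),
]

def targets_by_source(edges):
    """Map each source population to its (target, weight) pairs,
    keeping the targets in order of decreasing edge weight."""
    grouped = defaultdict(list)
    for weight, source, target in sorted(edges, reverse=True):
        grouped[source].append((target, weight))
    return dict(grouped)

for source, targets in targets_by_source(EDGES).items():
    print(source, "->", ", ".join(f"{t} ({w:.3f})" for t, w in targets))
```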


Some of the results from the previous run with 4 assumed migrations disappeared; it is not clear whether migrations inferred under a lower m are more statistically significant than those inferred under a higher m. In general, this newer run more closely resembles the K10 ADMIXTURE run, but there are some puzzling differences. For instance, while it picked up a North to East African migration in the EtA, EtT and EtJ samples, it skipped the EtO samples and then picked up the same migration pattern in the Maasai samples, who had a lower 'North African' component in the K10 ADMIXTURE run than the EtO samples. My take on this is that the program is not yet sophisticated enough to accommodate bidirectional migrations that have continued for thousands of years, like those between East and North Africa. Indeed, the authors of the software list the following pertinent point as one of their assumptions:

"We also have modeled migration between populations as occurring at single, instantaneous time points."

and

"This model will work best when gene flow between populations is restricted to a relatively short time period. The relevance of this assumption will depend on the species and the populations considered."

UPDATE2: Residual plot for 10 migrations rooted with the San-nb.
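The residual plot itself is not reproduced here. As I understand the TreeMix paper, the residuals are the observed minus the model-fitted covariance between each pair of populations, and positive residuals flag pairs that the graph does not bring close enough together, i.e. candidates for additional migration edges. This minimal numpy sketch assumes the two covariance matrices and their standard errors are already loaded as arrays; scaling by the mean standard error, the threshold of 3, the function names and the toy numbers are all assumptions for illustration.

```python
import numpy as np

def standardized_residuals(observed, fitted, se):
    """Observed minus fitted covariance, divided by the mean standard error
    (the scaling choice is an assumption of this sketch)."""
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return (observed - fitted) / np.mean(se)

def poorly_fit_pairs(resid, names, threshold=3.0):
    """Population pairs whose residual exceeds a hypothetical threshold,
    largest first; positive values mean the model underfits their relatedness."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if resid[i, j] > threshold:
                pairs.append((names[i], names[j], float(resid[i, j])))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

# Toy 3x3 example with hypothetical numbers (not from this dataset).
names = ["popA", "popB", "popC"]
fitted = np.zeros((3, 3))
observed = np.array([[0.0, 0.004, 0.0],
                     [0.004, 0.0, -0.001],
                     [0.0, -0.001, 0.0]])
se = np.full((3, 3), 0.001)
print(poorly_fit_pairs(standardized_residuals(observed, fitted, se), names))
```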

54 comments:

  1. That's very interesting, Etyopis. Thanks.

    I'm still learning to appreciate this "gadget" so I do not have much to say, other than that a West Eurasian control would have been interesting for comparison, especially as the African tree is probably also the basal tree of humankind.

    It's clear in any case that most West Africans and all North Africans form compact clusters, while the rest seem more "diffuse".

    ...

    Related to TreeMix but not directly to the content of this post:

    Also, I asked Dienekes this but did not get an answer: when a West Eurasian migration into North Africa appears with a weight of 73% (with zombies, from SW Asia) or 70% (with real populations, from Sardinia to Mozabites), does this mean that c. 70% of North African ancestry is West Eurasian? That is how I understand the explanation in the paper (fig. 1, where the root has a weight 1-w, in this case 1-0.7=0.3), but being a total 'noob' with this program, I have serious doubts anyhow. If so, I'd expect the algorithm to hang the NA population/zombie from the West Eurasian node and show the migration the other way around (as that weight would be smaller), but further down the paper it does seem that such alterations are possible (errors or limitations of the algorithm, it seems), and they illustrate this with some examples.

    Do you have an opinion on this?

    1. Maju, like you, I am just learning about this software, so unfortunately I can't give you much more insight. One thing I don't understand, however, is the point of using TreeMix to analyze ADMIXTURE components. The two types of analysis should be complementary, and I see no need to analyse the components generated by ADMIXTURE with this software, as the results would basically be an inference of an inference; in other words, you would be twice removed from the populations actually sampled, compared with simply using those populations to infer the ML trees. One could of course look for congruent patterns between the outputs of the two programs, but that is a whole different matter from what you are describing with these 'zombie' or other data that come out of ADMIXTURE.

  2. Interesting! Thanks for taking the time to do this.

  3. "... what is the point of using TreeMix to analyze ADMIXTURE components?"

    You know how Dienekes is... I generally prefer real populations over zombies, except perhaps when comparing individuals or in some other special exercise for which a fixed conceptual frame can be useful.

    I'm pretty sure that I saw another TreeMix run showing Sardinian flow into Mozabites at the same c. 70% level, but I can't find it anymore, even with the help of my browser history (has D. changed the content of his posts, or am I going senile early and fast?)

    Guess I'll have to learn and design my own exercises.

  4. Just updated for 10 assumed migrations.

  5. The point of using ADMIXTURE components over "real populations" is that real populations include admixed individuals, outliers, etc., which result in the inference of phantom migration edges.

    It's quite different if there are 2-3 admixed individuals in a dataset of N=20 vs. low-level admixture at similar levels in all 20. Since TreeMix works with allele frequencies rather than individuals, it would fit a migration edge in both cases. By using ADMIXTURE components, this layer of recent admixture events is removed.

    1. The TreeMix input file uses the pooled minor and major allele counts of groups of individuals, so the computations are done on these groups from the beginning, which essentially averages out any outliers in a group whose allele frequencies deviate strongly from the group's mean/median. ADMIXTURE, on the other hand, receives no group information before assigning the individual samples to its assumed (K) discrete populations. So by performing TreeMix computations on ADMIXTURE-generated components, you are not actually removing a layer of 'recent admixture' but rather adding an extra layer of assumption. Sample size of course directly affects the confidence of the results produced from a given dataset; however, this holds true for both TreeMix and ADMIXTURE.

    2. It's quite simple:

      If a population has e.g., 1 half-Somali/half-Arab and 9 Arabs

      or,

      It has 10 Arabs, each of whom received 5% of their genes from Somalia in a very old admixture event

      Then, TreeMix will infer a Somalia->Arab migration edge with a strength of about 5% in both cases (one half-Somali among ten individuals amounts to 5% of the pooled ancestry). Obviously, the two situations are extremely different.

      (Not saying there is such an event, it's just an example)

      By running ADMIXTURE first, we remove this layer of admixture.

    3. “By running ADMIXTURE first, we remove this layer of admixture.”
      Then you are cleaning your dataset to obtain more uniform/homogenised groups of samples, and that is fine. But that is not what you are doing, or what you said you were doing: you are using the output of ADMIXTURE as the input for TreeMix. This is the point of contention, and it seems as though you are moving the goalposts on it. You can 'clean' your dataset all day long if you want, as long as you then use the cleaned dataset (of individuals in ADMIXTURE's case, of groups of individuals in TreeMix's case) as the required input for both programs; the outputs stand on their own and should be analyzed on their own. Looking for congruence between the two outputs is a whole different ball of wax.

      Nor is ADMIXTURE the only way to 'clean' a dataset: you can run a PCA on the dataset, locate individual samples in genetic space within certain parameters and remove the individuals who fall outside the chosen region, or you can even run a simple Identity By State comparison of the individuals in the dataset and remove those that fall outside a certain distance cut-off.

  6. My take on this is that the program is not yet sophisticated enough to accommodate for bidirectional migrations that have happened for thousands of years, like the ones that have taken place between East and North Africa for instance.

    You would almost certainly see sub-Saharan migration into Egypt if you had included Eurasians. In your analysis, Egyptians are the closest thing to a Eurasian sample, so there is little need for the model to weigh in the influence of migrations from SSA.

    1. That comment was made in regard to the program's inconsistency: it missed an Egypt → EtO connection while it picked up Egypt → EtA, EtT, EtJ and even Maasai connections. That is, unless of course you think that the North African component in the aforementioned samples is different in nature from the one present in the EtO samples; I have seen no evidence that would support such a case.....

  7. 0.154406 moroccans → san
    The outer epicanthic fold connection that endogenous North Africans share with the San?
    The migration must be very ancient... I believe... unless it was an error.

    1. It looks like it is picking up, and amplifying, the minor Eurasian (likely European) admixture that the San samples from Henn may carry (see here and here), unlike the San from Namibia samples, which also come from Henn (2011). That is why I also rooted the tree with the San from Namibia.

  8. Probably so, but why the Moroccans, the most endogenous and "secluded" of the northern group, and not the Egyptians, the most Eurasian and located on the vital Nile migratory route to East Africa and beyond?
    Could the old basal alleles of the hybridized Northwest component be responsible for the strange results?

  9. Good question, but the Eurasian ancestry of Egyptians and Moroccans may also differ in type. I'm not fully discounting an old connection between Moroccans and the San; Cruciani found some of the most basal Y-DNA in North and Central-West Africa, so the connection could be there. But the fact that there was a difference in the autosomal profiles of the Namibian San and the South African San makes me suspicious.

  10. It seems that there are plenty of E Y-chromosomes in San groups...

    http://2.bp.blogspot.com/_Ish7688voT0/TO-pMuAmfNI/AAAAAAAAC6w/U3hRZNYQtnY/s1600/southafrica.png

    It seems that the Moroccans picked up the Eurasian input for the reason I stated earlier...
    I think if you introduce a West Eurasian sample and repeat the experiment, the migration edge scores will change somewhat...

  11. The E1b1b1 input in South Africa is mostly, if not entirely, E-M293 (E1b1b1b2b), with a likely origin in East Africa. The paper

    "Development of a single base extension method to resolve Y chromosome haplogroups in sub-Saharan African populations"

    from which you cited those Y-DNA frequencies did not test for E-M293:

    "Those Y chromosomes assigned to haplogroup E1b1b1 were screened further using the multiplex assay, Hg-E1b1b1, which consisted of the markers M78, M148, M81, M107, M165, M123, M34, M136 and M281",

    Of those, M81, M107 and M165 are the only ones really associated with NW Africa. So if you are speculating that the Morocco → San connection that TreeMix found runs through a node upstream of E-M81, most such nodes are found in the eastern part of Africa; but then why did the program not find an East Africa → San connection with 10 migrations, although it did so with 4 migrations (see the post of this thread), i.e. Ethiopia → San?

  12. Some South African San outliers have recent European ancestry from Afrikaners and Cape Coloureds who live in the Cape region. Therefore it is not surprising that TreeMix picked up a migration from Morocco (whose population carries European alleles via Iberian input) to the San.

  13. @ Eze

    I am not convinced that is the case; are you sure that the recent admixture is having an effect on the results?
    If so, the whole exercise is worthless..

    @ Etyopis

    "which most of those nodes are to be found in the Eastern Part of Africa, but then why did the program not find at 10 migrations an Eastern Africa --> San connection, although it did so at 4 migration assumptions (see post of this thread), i.e. Ethiopia --> San."
    You're asking me?! It is your run; you are the one who should come up with a convincing interpretation...
    I don't know what that Eastern Africa → San connection was about... "Lucy"? Omo? Or what does the whole African trunk represent, AMH or a much deeper evolutionary ancestry? You tell me..

  14. AMH, Dalouh, AMH.

    There are surely several elements in an Eastern-Southern Africa connection:

    1. It appears to my eyes that the L0 clan (to which Bushmen essentially belong) originated in East Africa (Lake Victoria area?) and left its legacy in East Africa and surely also in the Arabian Peninsula.

    2. There was a paper some time ago arguing for a secondary pastoralist migration from East Africa into the South (as you may know the Khoikhoi or Hottentots used to be pastoralists at the time of arrival of Europeans and Bantus). The amount of genetic flow would have been small (but detectable), in a clear case of Neolithic diffusion without replacement.

    As for the TreeMix debate in general, I strongly suspect that the algorithm is bugged. Otherwise Dienekes' results (c. 70% West Eurasian admixture into Yoruba!?) should not exist. So I'd take it all with a good pinch of salt until the method is well tested (and maybe fine-tuned).

    But I can't figure out what's up with Morocco: all the arrows I see heading to the San spawn from the root of the North African cluster, which, if not identical, should at least be close to that of West Eurasians in general, of whom North Africans are largely a subset (with aboriginal African blood, but not dominant). So I essentially agree with Eze that this may well be nothing but minor European admixture in one of the San samples; it's just that I think it reflects the dominant West Eurasian ancestry of North Africans and not only the Iberian immigration, which would be just part of it. That's why the arrow stems from the North African root (i.e., it is in fact some other related branch, or generic admixture from the whole cluster).

  15. @ Maju
    "AMH, Dalouh, AMH. "
    but at what evolutionary stage(s) ?

    "As for the Treemix debate in general, I strongly suspect that the algorithm is bugged. Otherwise Dienekes' results (c. 70% West Eurasian admixture into Yoruba!?) should not exist."
    may need some refinement, but bugged ?
    I am actually pleased with Dienekes's results regarding the composition of the Northwest component..
    and this is what I said a while ago:
    "anyway, I have the same reservations on the reconstruction of the Mechta Afalou man by Elisabeth Daynes....Because it was Assumed that Northwest Africa was not occupied by its natives....the Afalou man was most likely a mix and not a West Asian looking one...

    http://s1.zetaboards.com/anthroscape/topic/2264152/1/ "

    http://forwhattheywereweare.blogspot.com/2011/06/homo-sapiens-childs-remains-found-in.html
    That's right, a MIX!

    it was 64% for the Yoruba and 73% with the Mozabites.. and both populations are dominated by the E..where is the mystery part ?
    It will be interesting to know whether that 73% will hold or change with the South Moroccans/Atlasians... as your North African ADMIXTURE experiment showed, at K=11, 14.4% of a distant pre-Dabban component...

    http://4.bp.blogspot.com/-qpX29zXLVfc/TvyVB5WETPI/AAAAAAAAAxY/ZGJBT8BXevk/s1600/FstTable.png

    http://forwhattheywereweare.blogspot.com/2011/12/north-african-genetics-through-prism-of.html

    why are the Ethiopians the least distant ?

  16. "but at what evolutionary stage(s) ?"

    Not sure what you mean: fully evolved Homo sapiens (but before the OoA for phase 1).

    "may need some refinement, but bugged ?"

    When you have new software and extremely strange results, there is strong reason for distrust.

    Surely bugged, yes.

    "it was 64% for the Yoruba and 73% with the Mozabites.. and both populations are dominated by the E"...

    I can't even imagine how "the E" (Y-DNA E, I presume) can justify anything like that, much less when it's a minor lineage of clear African origin among Europeans and West Asians.

    What is clear is that Yoruba are NOT 68% Vasco-Sardinian or otherwise West Eurasian (nor vice-versa), so something very bad is happening to that software, because it is producing results that are not consistent with anything we know. Sadly Dienekes' wishful thinking has led him to believe and defend this bugged result but it's so obviously wrong that, even knowing how Dienekes tends to be biased and wishful thinking in so many things, I am astonished: he should be full of doubts and instead he does not even blink.

    "why are the Ethiopians the least distant ?" [in my Fst results with ADMIXTURE for North Africans-plus]

    IMO, and as I said in the text, both the Ethiopian and Fulani components show ancient WEA-African admixture, old enough to have settled into a single component, apparent only at certain K levels.

    Lower K levels show the admixture rather obviously: at K=9 Ethiopians still showed up as roughly 2/3 "Arab" and 1/3 "Mandenka", while at K=7 Fulani show up as a mixture of roughly 2/3 "Mandenka" and 1/3 "Sahrawi". At those levels, the Arab and Sahrawi components appear as West Eurasian (close per Fst to other WEA components, distant from Tropical African ones), while the Mandenka component appears clearly distant from WEA components, hence "aboriginal African" (predating the "Aurignacoid" back-migration).

    Both components appear mixed, but while the Fulani's closest component is the Mandenka one, the Ethiopians' closest component is the Moroccan one, followed by the Iberian and Arab ones, reflecting the different levels of apparent admixture or whatever it is (Etyopis said it might be proto-Eurasian/East African affinity rather than only Eurasian back-flow, neither have tested this yet).

    But at no point is there meaningful WEA ancestry apparent in the Mandenka, nor Mandenka ancestry in WEA populations outside Africa. And this result is consistent across many other analyses, for example Henn 2012 (who worked with different samples, including Yorubas and Basques, and got results very similar to mine).

    The problem is that extraordinary claims require extraordinary evidence, and so far Dienekes' "evidence" only casts doubt on the methods themselves, including the TreeMix software (which may be useful to know, but is bad for proving anything).

  17. "Etyopis said it might be proto-Eurasian/East African affinity rather than only Eurasian back-flow, neither have tested this yet)"

    Well, there is nothing to test, and it was not I who said it but geneticists; anyway, you already know where West Eurasians originated. Nobody has the tools yet to discern which autosomal genetic signature in East Africans is West Eurasian or aboriginal African; as of now, only uniparental markers can discern that. Ethiopians have on average ~ 80% aboriginal African Y and mtDNA markers, mtDNA haplogroup M, that is prevalent in East Africa is not really 'West Eurasian' by any stretch of the imagination, it is likely an OOA marker, that happened shortly after the initial migrations. The reason West Eurasians are closer to Ethiopians than the more distant OOA populations or the more distant African populations is partly that Africans never stopped breeding with the OOA populations once they left Africa. Li and Durbin concluded that Africans were breeding with the OOA population for 40,000 years after the initial migration before West and East Eurasians even split, and we have further YDNA marker evidence that originated in Ethiopia (E1b1b) 20000 years ago and spread out of Africa to the 'West Eurasian' areas. So I think it is a mistake to think that Africans stopped breeding with the OOA populations after they left, and some how these populations became 'pure' West Eurasians just because they left Africa, no, the breeding exchange happened and it was likely continuous and hardly discrete. The other part, of course, is that genetics is largely a function of geography: East and North Africa are closer to West Eurasia than other parts of Africa simply by virtue of geography.

  18. ''mtDNA haplogroup M, that is prevalent in East Africa is not really 'West Eurasian' by any stretch of the imagination, it is likely an OOA marker, that happened shortly after the initial migrations.''

    M1 is dated to about 27 kya, while the OOA event occurred 70-60 kya. Given that considerable time gap and the complete lack of other basal M lineages, I doubt it is simply an aboriginal OOA remnant. Also, researchers have linked the initial spread of M1 to that of U6 from the southern Levant.

  19. Etyopis: the coalescence of the West Eurasian (macro-)population must have come about through the following process (with emphasis on the possible interactions with Africa):

    1. OoA: the proto-Eurasian population parts ways with the East African one and migrates towards (tropical and subtropical) Asia
    2. Eurasian diversification: This early Eurasian population diverges in several groups, notably South-West Eurasians and East Asians (also the various Negrito, Melanesian and Australian pops.)
    3. Aurignacian era: The West Eurasian core diverges from the South Asian one and migrates Westward, eventually driving Neanderthals to extinction and probably also making some inroads into Arabia and Africa (mtDNA M1, U6, etc.), although the exact extent of these is not clear.
    4. Probably a second WEA flow affecting North Africa arrives from SW Europe during the LGM. Maybe also some back-flow into Iberia (Y-DNA E-M81, mtDNA U6).
    5. The Afroasiatic expansion from East/NE Africa also reaches parts of Eurasia (E-V13 and such). This probably took place in the late UP (Capsian culture, African influences in Harifian, etc.)
    6. Possibly further flows from West Asia into North and East Africa (and certainly Arabia peninsula) with the Neolithic. Possible flow from North Africa to Iberia (option B for the arrival of NW African lineages to West Iberia, and even as far as some localities of Wales).

    That's how I see it. It implies some important episodes of back-migration to Africa, not all of them well documented archaeologically but rather apparent in the genetics. The details are surely arguable, but in any case there must have been some back-flow since deep in the Paleolithic, co-influencing African genetics.

    "mtDNA haplogroup M, that is prevalent in East Africa is not really 'West Eurasian' by any stretch of the imagination, it is likely an OOA marker"

    That's not possible. M must have a single origin, and its diversity in Africa is extremely low. Not only that, M1 can easily be shown to have originated in Asia because its "sisters" M20 and M51 (together making M1'20'51) are from Asia (M51 from Indonesia in fact, Min Peng 2007; M20 I think is from India but may also be from SE Asia).

    Considering M1 to be a remnant of the OoA is simply wrong. And, as has been mentioned, there is a certain parallel between the scatter of mtDNA M1 and that of Y-DNA T (important in the Horn) along the Indian Ocean's coasts.

    There is some backflow for sure. We don't know exactly the timing (because I do not think any relevant archaeology is known yet) but there was a small back-migration of Eurasians into the Horn and other areas of East Africa long ago.

    (And, sincerely, I hate to discuss this with Africans because I end up looking like the white guy trying to push some sort of neocolonial notion, which is totally contrary to my intent: I'm just stating the facts as I see them, please understand that).

    [continues]

  20. [cont.]

    "Ethiopians have on average ~ 80% aboriginal African Y and mtDNA markers"

    For Y-DNA that's more than correct, but not for mtDNA, which shows something like 30-40% Eurasian lineages (M1, HV and others).

    Still, the affinity that Ethiopians appear to display towards West Eurasians is greater than that, so it is probably partly because they were not, from the beginning, more akin to West Africans than to West Eurasians.

    "Li and Durbin concluded that Africans were breeding with the OOA population for 40,000 years after the initial migration before West and East Eurasians even split"...

    Haven't read it, sorry.

    But with whom exactly, and how? Once the migrant population left Arabia and entered Asia proper (Pakistan, India and beyond), there is no way there could have been any more contact until the back-flow of West Eurasians, first of all into West Eurasia but also into parts of North Africa and whatever was left of the early OoA population in Arabia, etc.

    The OoA population soon branched out and was mostly far from Africa in any case, so this claim looks difficult to reconcile with the expanding and diversifying dynamics of the early Eurasian population. There would be contacts with specific sub-branches of the OoA population but not all. These specific sub-branches can be two: (1) the remnant OoA population of Arabia (or the Fertile Crescent if you prefer that model), which was surely very small and rather closer to Africans than to the bulk of Eurasians, who were being redefined by founder effects (= bottlenecks), and (2) the West Eurasian population, which surely migrated westwards from South and SE Asia c. 50 Ka ago and whose most distinctive trait appears to be Aurignacoid industries (early UP).

    "we have further YDNA marker evidence that originated in Ethiopia (E1b1b)"

    I do think that E, E1, E1b, E1b1b... all originated in Africa. The expansion of these lineages may have displaced some Eurasian Y-DNA (I think this is correct in North Africa and may also be the case in parts of East Africa, where there is more Eurasian mtDNA than Y-DNA).

    "So I think it is a mistake to think that Africans stopped breeding with the OOA populations after they left, and some how these populations became 'pure' West Eurasians just because they left Africa, no, the breeding exchange happened and it was likely continuous and hardly discrete".

    I think instead that it was more discrete, because there was first a migration to South Asia, well documented archaeologically and genetically, and only later did the Aurignacoid back-flow happen. In fact:

    - c. 125 Ka ago: First indications of OoA in Arabia and Palestine (but possibly not beyond)
    - c. 90 Ka ago: much more general OoA presence in Arabia
    - c. 80 Ka ago: African-derived MSA-like industries in South Asia, soon also stone blades (defining West Eurasian UP later on)
    - c. 55 Ka ago: Homo sapiens with Aurignacoid (UP) industries in Palestine, c. 48 Ka Aurignacoid industries in Central Europe, some time around those dates (c. 40 Ka or so) in Altai and the Pyrenees and Libya (Dabban industries)

    If you need references feel free to ask, but it's all documented in my blog: Petraglia, Armitage, Rose, etc.

    There was a separation and then a "reunion" (sounds nice but maybe they killed each other in part, I do not know).

  21. "Sadly Dienekes' wishful thinking has led him to believe and defend this bugged result but it's so obviously wrong that, even knowing how Dienekes tends to be biased and wishful thinking in so many things, I am astonished: he should be full of doubts and instead he does not even blink."

    This new tool just confirmed his theory... I don't think it is wishful thinking, as you are trying to describe it...
    I find his argument very convincing, like this one at Razib's blog:

    http://blogs.discovermagazine.com/gnxp/2012/03/we-are-all-sardinians/

    "Exceptional claims require exceptional evidence, and this is not it at all. "

    The Henn et al. paper about North Africans IS the evidence... the Afalou (27 ky) and the Ibero-Maurusian remains (16 ky) from Ifri n Ammar, seen in this video:
    http://www.youtube.com/watch?v=VSKaa1Uh-h8

    are the direct descendants of the Eurasian DE... and that is why North Africans cluster with other Eurasian groups... What other extraordinary evidence do you need, Maju?

    @ Etyopis

    Don't make big assumptions about who the proto-Eurasians were...

    argiedude said

    "The Batini study? I wish I could've been able to give them a few personal opinions before they started the tests. The most interesting thing I found of y-dna B is the existence of a cluster with a very distinctive haplotype, namely the presence of 392=13, almost unheard of outside of y-dna P (Q, R1a, R1b, etc.). And this cluster is located geographically almost on the northern half of the Sahara, including Morocco and Egypt. But incredibly, despite finding 2 dozen samples, not one of them had tested any downstream SNPs, just B. when the Batini study came out, I thought for sure the mystery would be settled, but they found just a single sample from this cluster... and they didn't test it (for downstream SNPs)!"

    http://forwhattheywereweare.blogspot.com/2011/05/major-upheaval-of-human-y-dna-phylogeny.html

    1. Henn 2012?! How can that back the spurious "finding" (bug or artifact) of Yoruba being 68% Vasco-Sardinian? Or how can the Ifri n'Amar findings support anything that has to do with Nigeria?

      The Treemix algorithm is only now being widely tested and is producing some results that clearly look spurious in light of all other known genetic data. Why? No idea, but there's obviously some problem with it.

      Also, Y-DNA B apparently has nothing to do with proto-Eurasians (there is not a single B lineage in Eurasia; all are CF'DE).

  22. The genetic profile of the North Africans supports the Eurasian origin of the DE, and that is good enough evidence for me.
    "The Treemix algorithm is only now being widely tested and is producing some results that clearly look spurious in light of all other known genetic data. Why?"
    Genetic data need to be supported by the archaeological data; it is a simple rule.
    While Northwest Africa has both, South Africa is lacking the archaeological evidence (no AMH remains), and there is no excuse here: both are wine-producing regions today because of their mild climate (unlike tropical Africa).
    We may see more surprises in the coming years; be ready...

  23. "the genetic profile of the North Africans supports the Eurasian origin of the DE ."

    Y-DNA DE? No way! The origin of DE can only be determined by DE itself, and in this equation North Africans are very much secondary: Tibetans and Japanese are in fact more intriguing. North African (and West Eurasian) DE (in fact E1b1b1 variants, nothing else) comes from the area of the Upper Nile or other parts of Tropical Africa, where most of the diversity is (DE* for example is found in West Africa; I always assume that the two DE* individuals once reported in Tibet are pre-D rather than truly branching from the common origin of both D and E, but the matter is not well researched).

    You are engaging in wishful thinking like Dienekes: daydreaming because of unspoken racial prejudices (I understand). You two cannot naturally embrace our (minor) "recent" African ancestry, so you'd like to make it all "Eurasian" somehow. That's not scientific but ideological, and of a quite bad ideology in fact.

    "genetic data need to be supported by the Archaeological data...it is a simple rule".

    True in the sense that genetic data should not contradict overwhelming archaeological data, but should be combined with it in order to provide the truest possible solution to the puzzle of human origins. Not true in the sense you claim:

    "South Africa is lacking the Archaeological evidence ( no AMH remains)"

    There's strong archaeological evidence of the presence of H. sapiens in Southern Africa since >100 Ka ago. No skulls? Maybe, but lack of evidence is not evidence of lack. The archaeological data do not contradict the genetic data in any case; they just fail to produce even stronger support.

    Mind you, the genetic data in North Africa do not support apparent continuity from Djebel Irhoud. I have found what seems to be a small remnant or three of Aterian age (at the most), but nothing from earlier times. Maybe there is something, but so thin that it's almost impossible to detect.

    In any case these remnants would link to Tropical Africa as their direct ancestor: NW Africa is not some other planet; its patterns are related to the rest of the World, very much so in fact. It is rather a place where many migrations have ended and where probably not a single major migration began. Prove me wrong in this if you can: minor flows have sprung from NW Africa northwards, eastwards and even southwards, but they did not reach very far, nor were they ever overwhelmingly dominant. That's what the genetics say: Jebel Irhoud is no more our direct ancestor than some random Neanderthal.

    "we may see more surprises in coming years , be ready"...

    We'll see what is to see in due time.

  24. “Considering M1 to be a remnant of the OoA is simply wrong.”
    Considering ANY haplogroup outside of Africa to be a remnant of OOA is not wrong at all; it is simply a fact. Even if you consider the Ethiopian-specific M1 to have coalesced outside of Africa 30 KYA, we have evidence, as I pointed out to you, from Li and Durbin's study of complete diploid genome sequences that West and East Eurasian populations had not even genetically diverged at that time. They may very well have diverged physically by then, but the signatures of what we identify today as West Eurasian and East Eurasian autosomal genetics may not yet have appeared, so using M1's presence in Ethiopia as evidence for the seemingly West Eurasian genetic affinity found in Ethiopians in ADMIXTURE runs is dubious at best.

    “In summary, the existence of long segments of low divergence between YRI1 and KOR supports the inference from PSMC that there was substantial genetic exchange between West African and non-African populations up until 20–40 kyr ago, and is not consistent with a simple separation approximately 60 kyr ago.”

    “Notably, a recent study using an orthogonal type of data (analysis of allele frequencies) also inferred that gene flow between Africans and non-Africans continued well after the initial out-of-Africa migration: in the case of that study, until 17–26 kyr ago [25].”

    ^ Li & Durbin (2011)

    “but there was a small back-migration of Eurasians into the Horn and other areas of East Africa long ago.”
    We have uniparental evidence for back migrations, but we do not have any evidence of what the autosomal affinity of the OOA populations that migrated back could have been; for instance, they could possibly have been more African-like than Eurasian-like autosomally back then.

    “For Y-DNA that's more than correct but not so for mtDNA, showing like 30-40% of Eurasian lineages (M1, HV and others).”
    Even with your estimates, that would make for Ethiopians possessing ~70-75% Aboriginal African haplogroups on average, which is only 5-10% off my ~80% average estimate.

    “c. 125 Ka ago: First indications of OoA in Arabia and Palestine (but possibly not beyond)
    - c. 90 Ka ago: much more general OoA presence in Arabia
    - c. 80 Ka ago: African-derived MSA-like industries in South Asia, soon also stone blades (defining West Eurasian UP later on)
    - c. 55 Ka ago: Homo sapiens with Aurignacoid (UP) industries in Palestine, c. 48 Ka Aurignacoid industries in Central Europe, some time around those dates (c. 40 Ka or so) in Altai and the Pyrenees and Libya (Dabban industries)”

    Most of those pre-70K populations could have very well died out; the most recent and up-to-date molecular clock based on mtDNA mutations has ruled out the possibility that the ancestors of all humans alive today exited Africa before the Toba eruption. I know you have a problem with these time estimates; your reasoning ultimately boils down to pushing back the human-chimp divergence time by millions of years, but this position of yours is contrary to the academic orthodoxy. I have neither the desire nor the knowledge to challenge it, but suffice it to say that I accept the Academic orthodoxy on this matter.

    “@ Etyopis
    don't make big assumptions on who were the proto-Eurasians... “

    By the way, did you see the supervised Global K10 run? North Africans had ~40% Ethiopian-like, ~50% Basque-like and ~10% Dogon-like affinities. Peninsular Arabs and even delta Egyptians had less Ethiopian-like affinity than far Northwest Africans. Why is that, when the Arabian peninsula is closer to Ethiopia than Northwest Africa is, and the Nile is a highway that connects Ethiopia to Egypt?

    1. I already replied to you by email (thanks again for the copy of Li & Durbin) that the paper seems inconclusive (and so extremely technical that it's hard to criticize or accept as valid without being a super-expert). My greatest issue in particular is the production of alleged molecular clock age estimates (without any sort of confidence interval), which are just too easy to get wrong. If their "20 Ka" is actually 80 Ka (and I see no reason why it would not be), then what they'd be describing would be the pre-OoA almost total genetic identity of all humans in East/Central Africa.

      Otherwise the model seems roughly correct: the Eur lines diverge from YRI later than the East Asian ones, which is consistent with late re-admixture in the Western Old World (I doubt the model would be able to pick up the difference with any certainty).

      "Even with your estimates that would make for Ethiopians possessing ~70-75% Aboriginal African haplogroups on average, which is only 5-10% off my ~80% average estimate".

      I usually give more importance to mtDNA. Years ago, when algorithms like ADMIXTURE were not so popular, I worked out a 'rule of thumb': in order to estimate the overall admixture, one should weight the mtDNA side double or even triple the Y-DNA side. It's not scientific, but it tends to be roughly correct.
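      The rule of thumb above amounts to a weighted average of the two uniparental shares. A minimal sketch, assuming each input is the fraction of lineages of one origin and that the weight (2 or 3, per the comment) applies to the mtDNA side; the helper name and the example inputs are hypothetical, and this is not a published method.

```python
def weighted_share(mtdna_share: float, ydna_share: float, mt_weight: float = 2.0) -> float:
    """Combine mtDNA and Y-DNA lineage shares, counting mtDNA `mt_weight` times as much.

    Hypothetical helper illustrating the informal rule of thumb; not a published method.
    """
    for share in (mtdna_share, ydna_share):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must be fractions between 0 and 1")
    if mt_weight <= 0:
        raise ValueError("mt_weight must be positive")
    return (mt_weight * mtdna_share + ydna_share) / (mt_weight + 1.0)


# Hypothetical inputs: 35% Eurasian mtDNA lineages, 10% Eurasian Y-DNA lineages.
for weight in (1.0, 2.0, 3.0):
    print(f"mtDNA weight {weight:.0f}: {weighted_share(0.35, 0.10, weight):.3f}")
```

      With a weight of 1 this reduces to the plain average; raising the weight pulls the combined figure toward the mtDNA value.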

      "Most of those pre-70K populations could have very well died out"...

      And replaced by what? Just because one random genetic modeling age estimate says T, it does not have to be right. I do not think it's so easy to replace populations (although it's surely possible to some degree in certain cases) and, anyhow, a model of replacement and a putative origin of the new populations would have to be ascertained and reasonably demonstrated, at least as plausible.

      I'm a bit tired of reading claims of mass replacement by populations that originated nowhere and of whose existence there is not even any evidence. We could imagine that those populations were replaced all through Eurasia and Australasia (sharing the same lineages, all populations must have been erased simultaneously from Europe to Japan and Australia), but then where is the evidence of the new population that replaced them? I mean other than in some esoteric modeling in a letter (not even a full-fledged paper) that was lying on some desk for two full years before someone allowed its publication.

      Allow me to be extremely skeptical. I think I have good reasons.

      "... but suffice it to say that I accept the Academic orthodoxy on this matter" [molecular clock].

      I do not think that there is an academic orthodoxy here, just academic complacency in many cases. There are a lot of papers that challenge the MCH, but many researchers still remain hopeful or simply attached to old methodologies.

      Whatever the "orthodoxy", the molecular clock hypothesis has never been proven empirically and therefore remains in the realm of unproven hypothesis, unlike radiocarbon and other dating methods of archaeological science. Using and abusing MCH estimates may be academically "orthodox" (scholastic: I cite you, you cite me and together we control the pseudoknowledge for a time), but it's also pseudoscientific, especially when one builds all conclusions only on that.

      "Considering ANY haplogroup outside of Africa to be a remnant of OOA is not wrong at all"...

      It's a matter of word choice: I meant a remnant of the process of OoA migration between East Africa (L3 node) and Southern Asia (M and N nodes). Of course all M and N are products of the OoA, but I would not use the word remnant, which means a residue or byproduct: M and N are the main product instead.

      But of course it's a matter of word choice.

  25. Also, I realized that we do have specific direct evidence showing that the descendants of N were inhabiting Europe some 30,000 years ago, which is consistent with the no-massive-replacement model that I espouse and contradicts the model of no differentiation until 20 Ka ago (or, in the case of Europeans, less than 20 Ka ago).

    We know, and nobody questions it, that an individual in Kostenki, Russia, carried mtDNA U2 some 30 Ka ago, which implies that U2 itself, but also its ancestors U2'3'4'7'8'9, U, R and N, had coalesced before that date. In about the same period (before 17 Ka ago), U5, R0 (reported as HV(xH)) and some other R-derived lineages (one of which I think is almost certainly an H subclade) are also directly spotted via HVS-I in Europe.

    The less-than-20-Ka hypothesis of Li and Durbin is not sustainable in light of the actual empirical data. That is the kind of confusion that trusting MCH age estimates can lead to.

    1. Thanks for the maps. I cannot comment on that page, so I'll ask here: the first map, the one that says 30-17 KBP, is based on carbon dating, right?

      Another thing: the Li & Durbin paper does provide a caveat about mutation rates; however, even allowing for an inaccurate mutation rate, they surmise that it is unlikely that the OOA and African populations simply parted ways, with no genetic exchange, after the former's exit from Africa ~60 KYA:

      "An important caveat to this conclusion is the uncertainty of the per-year mutation rate of 1E-09 (2.5E-08/25). Although this mutation rate agrees well with the rates estimated between primates averaged over millions of years (Supplementary Information, section 3.1), generation intervals as high as 29 years per generation over the last few thousand years (23), and present mutation rates lower than 2.5E-08 per generation (9), are possible in principle. These factors could make our recent date estimates too recent, although it seems unlikely that such inaccuracies would be consistent with a date of final genetic exchange as far back as 60 kyr ago."
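      The arithmetic behind that caveat can be sketched as follows, assuming the standard clock relation that, for a fixed observed divergence, an estimated date is inversely proportional to the per-year rate (t = d / 2μ). The 25- and 29-year generation intervals and the 2.5E-08 rate come from the quote; the halved per-generation rate is a hypothetical value for illustration only.

```python
def per_year_rate(per_generation_rate: float, years_per_generation: float) -> float:
    """Convert a per-generation mutation rate into a per-year rate."""
    return per_generation_rate / years_per_generation


def rescaled_date(date_kyr: float, assumed_rate: float, alternative_rate: float) -> float:
    """Re-express a clock date under another per-year rate.

    For a fixed observed divergence d, t = d / (2 * rate), so dates scale
    with assumed_rate / alternative_rate.
    """
    return date_kyr * assumed_rate / alternative_rate


baseline = per_year_rate(2.5e-8, 25)            # 1E-09 per year, as in the quoted caveat
longer_generations = per_year_rate(2.5e-8, 29)  # 29-year generations, same per-generation rate
lower_rate = per_year_rate(1.25e-8, 25)         # hypothetical: half the per-generation rate

print(round(rescaled_date(20.0, baseline, longer_generations), 1))  # 23.2
print(round(rescaled_date(20.0, baseline, lower_rate), 1))          # 40.0
```

      Both adjustments push a "20 kyr" date older; how far depends entirely on which rate is assumed.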

    2. "the first map that says 30-17KBP, that is based on carbon dating right?"

      Calibrated C14, yes. All remains mentioned in that map are from Gravettian or Solutrean ages. You can always contrast with the original papers via Jean Manco's Ancient Eurasian DNA page, which lists everything methodically, including the reference papers and links.

      As for the modeled 60 Ka divergence: they just placed those 60 Ka on their pre-existing chronology. IMO they are comparing apples and oranges, and it is a useless exercise. In other words, if the "20 Ka" are in fact 80 Ka, as I estimate, then their modeled "60 Ka" would be 240 Ka, which is a pointless comparison, as there was no OoA c. 240 Ka ago that we know of.
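      The 240 Ka figure follows from applying one constant correction factor (80 / 20 = 4) to every date produced by the same clock. A minimal sketch, assuming, as the argument above does, that the whole chronology is off by a single multiplicative factor; that premise is the commenter's, not an established result, and the helper name is hypothetical.

```python
def rescale_model_dates(model_dates: dict[str, float], model_anchor: float,
                        proposed_anchor: float) -> dict[str, float]:
    """Apply one multiplicative correction to every date from the same clock.

    Assumes a single constant error factor across the chronology
    (the premise of the argument, not an established result).
    """
    factor = proposed_anchor / model_anchor
    return {label: kyr * factor for label, kyr in model_dates.items()}


dates = {"final genetic exchange": 20.0, "simple separation": 60.0}
print(rescale_model_dates(dates, model_anchor=20.0, proposed_anchor=80.0))
# {'final genetic exchange': 80.0, 'simple separation': 240.0}
```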

      "this mutation rate agrees well with the rates estimated between primates averaged over millions of years"...

      But then paleoanthropologists totally ignore the dates proposed by the molecular clock fanatics from the field of genetics and in fact have older ages for nearly all nodes, ages that generally tend to become older as new discoveries are made or unsustainable inconsistencies are denounced.

      For example, while looking for something else, I just stumbled upon this in Wikipedia:

      "The discoverers conclude that Orrorin [tugenensis] is a hominin on the basis of its bipedal locomotion and dental anatomy; based on this, they date the split between hominins and African great apes to at least 7 million years ago, in the Messinian. This date is markedly different from those derived using the molecular clock approach, but has found general acceptance among paleoanthropologists".

      This is almost exactly what I discussed in relation to another similarly aged ancestor, Sahelanthropus tchadensis, and in other posts (like this, this, this, this, this or this; not comprehensive, but certainly an ample list of examples in which the MCH is questioned quite radically from two or more viewpoints: paleoanthropology and the measurement of mutation rates itself).

  26. "Etyopis said it might be proto-Eurasian/East African affinity..."

    It's no longer possible to deny or rather disregard the importance of indigenous East African genetic variation in reference to the relationship(s) between NE Africans and other important ancestral populations, especially Eurasians (specifically Western Eurasians).

    East Africans (Afrasan speakers, Southern Sudanese Nilotic speakers, and indigenous SE hunter-gatherer groups, i.e. the Hadze and Sandawe) are closer to Eurasians than other Africans (Niger-Kordofanian speakers, Central African pygmies, and the Khoisan) are to the latter.

    This "Eurasian" cline in Africa (Eurasian affinity in Africa, or vice versa) is likely related primarily to the time depth of divergence between the respective ancestral African populations (I'll use linguistic terms in rough association with these aforementioned AAPs) and the ancestors of non-Africans.

    In order from oldest to most recent divergence date with the ancestors of non-Africans:

    Khoisan > Mbuti > Biaka > Niger-Kordofanian > Nilo-Saharan > Hadze-Sandawe > Afrasan in relation to non-Africans

    Take a look at the results of this Dinka sample from 23andme...

    http://i204.photobucket.com/albums/bb178/beyoku/Y.png
    http://i204.photobucket.com/albums/bb178/beyoku/Mtdna.png
    http://i204.photobucket.com/albums/bb178/beyoku/Paint.png
    http://i204.photobucket.com/albums/bb178/beyoku/Globalsim.png

    Independent PCA...

    http://i204.photobucket.com/albums/bb178/beyoku/Full_20120201123442BGA2.png

    ^ As you can see, this Dinka sample clusters between Somalis and West Africans; his results are supported by Tishkoff et al. 2009, who also sampled groups from South Sudan. It is clear that whatever is causing the Eurasian affinity in this particular Dinka sample, and among the Southern Sudanese in general, isn't due to admixture (or commonality) from/with any other particular source, be it Eurasian admixture and/or other indigenous East African gene flow by way of the Horn of Africa or groups like the Sandawe. Whatever the case, a closer similarity to Eurasia relative to other parts of Africa seems to be the norm among all groups in East Africa, be they Nilo-Saharan, Afrasan, or Hadze-Sandawe.

    If we were to use the Dinka as a proxy for the African component of the Somalis, for example, the African ratio would increase from ~50% African (with a West African proxy) to ~67% with the Dinka proxy, and it would increase further to ~75% African if we were to use indigenous hunter-gatherer groups from SE Africa. It's therefore logical to assume that NE Africans, before Western Eurasian admixture, would have been more similar to Eurasians than their SE African and S. Sudanese counterparts.
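    Why the choice of proxy moves the estimate can be illustrated with a simple two-source mixing model on allele frequencies, target ≈ α·(African source) + (1 − α)·(Eurasian source), with α fitted by least squares across loci. This is a toy sketch, not the method behind the figures above: all frequencies are simulated, and modelling the unadmixed East African source as lying halfway between a West African proxy and Eurasians is a deliberate simplification. Its outputs are not the ~50%/~67%/~75% estimates in the comment.

```python
import numpy as np


def african_fraction(target, african_proxy, eurasian_proxy) -> float:
    """Least-squares alpha in: target ~ alpha * african_proxy + (1 - alpha) * eurasian_proxy."""
    t = np.asarray(target, dtype=float) - np.asarray(eurasian_proxy, dtype=float)
    d = np.asarray(african_proxy, dtype=float) - np.asarray(eurasian_proxy, dtype=float)
    return float(np.dot(t, d) / np.dot(d, d))


rng = np.random.default_rng(0)
n_loci = 1000
p_eurasian = rng.uniform(0.05, 0.95, n_loci)
p_west_african = rng.uniform(0.05, 0.95, n_loci)
# Toy assumption: the unadmixed East African source sits halfway between the two.
p_east_african = 0.5 * p_west_african + 0.5 * p_eurasian
# Toy target: 75% East African source, 25% Eurasian.
p_target = 0.75 * p_east_african + 0.25 * p_eurasian

print(round(african_fraction(p_target, p_east_african, p_eurasian), 3))  # 0.75
print(round(african_fraction(p_target, p_west_african, p_eurasian), 3))  # 0.375
```

    In this toy the same target looks only half as "African" when the proxy is more distant from Eurasians than the true source, which is the direction of the effect described above.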

    Western Eurasian admixture is clearly playing a notable role in the genetic affinities of at least some NE Africans; most importantly NE African groups in the northern Horn of Africa, i.e. Eastern Sudan, Eritrea, and Northern Ethiopia, who cluster away (in the direction of Arabia) from groups like the Somali and some Oromos who lack "excess" Eurasian and/or other African admixture.

    If I had to guess, Arabian admixture would peak in the northern Horn of Africa at about 20-25%, decreasing significantly to about 10% among Somalis and other lowland NE Africans.

    1. That's a very interesting meditation, Joshua. I think I can agree with all or most of it. The estimate using a Dinka proxy is a very good idea, and your inference that realistic admixture in the Horn is no larger than 25% is probably quite accurate. It must certainly be much less than the 2/3 that I got with a West African reference, which MUST be distorted because of the pre-admixture Eurasian affinity of Ethiopians (and maybe some admixture from Arabs). When I designed that exercise, however, I was hoping for a clearly defined Ethiopian (East African) component of some sort, but I did not find it until an admixed component (equidistant to WEA and West Africa) showed up, and I'm not satisfied with that result either.

    2. "Eastern Sudan, Eritrea, and Northern Ethiopia, whom cluster away (in the direction of Arabia) from groups like the Somali and some Oromos who lack "excess" Eurasian and/or other African admixture."

      It is a genetic cline, which runs not only in the direction of Arabia but also in the direction of Egypt; it is part of the cline that moves in general towards Eurasians in genetic space, as Principal Component 1 in a global dataset clearly demonstrates. The Dr. McDonald PCA map that you have shown with the Dinka stops at Northern Ethiopians and is actually an intra-African, or rather an intra-sub-Saharan African, PCA only. If extended to include the whole of Africa, Northern Sudanese, Egyptians, and Northwest Africans would continue along that same cline, i.e. Northern Ethiopians > N. Sudanese > Egypt, Morocco, ... > Yemen > ..., until enough non-African samples are added at which point it would become a global PCA and the differentiation of North/East Africans from other subsaharan Africans vanishes, and the main differentiation becomes Africans from non-Africans.

  27. "... until enough non-African samples are added at which point it would become a global PCA and the differentiation of North/East Africans from other subsaharan Africans vanishes, and the main differentiation becomes Africans from non-Africans"

    That's not correct, and you should be familiar with global datasets in ADMIXTURE/STRUCTURE analysis. For example, Behar 2010 (sup. figures, scroll down to fig. 4a) at K=3 has North Africans (Moroccans, Mozabites, Egyptians) looking 80% West Eurasian and some 20% (West-Central-Southern) African. Ethiopians also look that way, although the proportions are different: roughly 60% and 40%.

    And that is a sample full of non-Africans of all kinds. And it is just an example.

    (And curiously enough, maybe because Africans are not sampled in sufficient numbers, the main distinction at K=2 in that case is East Asians versus the rest. I mention it just for the record, because IMO this is a matter of sample size and less important admixture between West Eurasia and Tropical Africa.)

    There are other examples and you can experiment with the 1000 genomes yourself and see if what you just said is correct or rather does not hold. And I think it does not hold in fact (based on all my genetic experience).

    1. What's not correct? I am talking about PCA, which is what the Dr. McDonald map that joshua.gatera posted was. Who was talking about ADMIXTURE?
      It is a FACT that the first principal component separates Africans from non-Africans in a global dataset, and that the first principal component in an African dataset separates East/North Africans from everyone else. Are you disputing this??

    2. "... are you disputing this??"

      No. I am disputing this:

      ... "the differentiation of North/East Africans from other subsaharan Africans vanishes".

      That does not happen in the PCA either, which is not that different from Bayesian clustering analysis (normally PC1 tends to resemble K1-2, while PC2 resembles K3, etc.) In fact, in global PC analysis North Africans usually cluster with West Asians, although with some tendency towards mainstream Africans.

      What I say is that, in all my experience, North Africans cluster with West Eurasians primarily (although with some deviation towards Africa). That's probably also the case of Horners but I'm less familiar with the specifics.

    3. And again Behar 2010 offers an example (including Ethiopians) in this massive eigenvector analysis (legend here). North Africans and Ethiopians are closer to West Eurasians than to other Africans.

      Eigenvector 1 (equivalent to PC1) actually separates Africans (except Horners and North Africans) from Eurasians. But not radically all Africans from all non-Africans, "the differentiation of North/East Africans from other subsaharan Africans" does not vanish.

    4. "Eigenvector 1 (equivalent to PC1) actually separates Africans (except Horners and North Africans) from Eurasians."

      Look Maju, I am sorry to say but you do not really know how to read a PCA (which I find odd, unless you are pretending or playing devil's advocate as if you don't know).

      A two dimensional global PCA plot approximates an 'L' shape, one side of the 'L' is where ALL Africans are found and the other side is where non-Africans are found.

      PC1 (one side of the 'L') on a global plot separates Africans from non-Africans; East Africans and North Africans lie sequentially on PC1 along with all other Africans. Non-Africans DO NOT lie on PC1; they lie on PC2 (the other side of the 'L'), which separates East Eurasians from West Eurasians.

      PC1 on an African plot separates East and North Africans from other Africans.

      Hence, PC1 on an African plot describes a different set of variations than PC1 on a global plot; thus the separation of East/North Africans from other Africans seen on an intra-African plot DISAPPEARS on PC1 of a global plot.

      No need to talk about model-based clustering analysis here; they are different but complementary methods.
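      Which contrast PC1 captures depends on which samples go into the analysis, which is the point both sides of this exchange agree on. A toy sketch with simulated genotypes, assuming three hypothetical groups ("afr", "east", "eur") where "east" has allele frequencies exactly halfway between the other two; it computes PCA via SVD of the centered genotype matrix and compares group means on PC1 with and without the "eur" samples. It does not settle the disagreement about real data.

```python
import numpy as np


def pc_scores(genotypes, n_components: int = 2) -> np.ndarray:
    """PC scores for samples (rows) from SNP genotypes (columns, coded 0/1/2)."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # center each SNP
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # PC signs are arbitrary


def group_means(values, sizes):
    """Mean PC value for consecutive blocks of samples of the given sizes."""
    means, start = [], 0
    for size in sizes:
        means.append(float(np.mean(values[start:start + size])))
        start += size
    return means


rng = np.random.default_rng(1)
n_snps, n_per_group = 2000, 40
p_afr = rng.uniform(0.1, 0.9, n_snps)
p_eur = np.clip(p_afr + rng.normal(0.0, 0.3, n_snps), 0.02, 0.98)
p_east = 0.5 * p_afr + 0.5 * p_eur  # toy intermediate group

# Diploid genotypes drawn binomially from each group's allele frequencies.
afr, east, eur = (rng.binomial(2, p, size=(n_per_group, n_snps)) for p in (p_afr, p_east, p_eur))

global_pc1 = pc_scores(np.vstack([afr, east, eur]))[:, 0]
african_pc1 = pc_scores(np.vstack([afr, east]))[:, 0]

print("global PC1 group means (afr, east, eur):", group_means(global_pc1, [n_per_group] * 3))
print("African-only PC1 group means (afr, east):", group_means(african_pc1, [n_per_group] * 2))
```

      In the global run "east" falls between the other two groups on PC1; in the African-only run PC1 is spent entirely on the "afr" versus "east" contrast.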

    5. I hope you are just not wasting my time....

    6. Not at all. You have a drastic misunderstanding here.

      The L-shape is a possible shape among many. I am sure you have seen many PCAs without an L shape: scattered clusters, single central cluster with some outliers, even almost parallel lines... that depends on the location that each sample will adopt.

      The global plot tends to take an L-shape but that's a peculiarity of the global plot (and possibly others), which shows a cline between Africa and West Eurasia and another cline between West and East Eurasia.

      In any case the two "lines" of the L mean nothing on their own (they are in fact clines); what matters are the values each sample has on the X and Y axes.

      For example, in the Behar plot all Eurasian samples are high on EV1 while all African ones (except North Africans and Horners) are very low. EV1 therefore indicates the Africa-Eurasia duality. On the other hand, EV2 indicates the contrast of West Eurasians (and also largely Africans and South Asians) with East Asians.

      It is the same in your example.

      Examples of other shapes:

      East Asian PCA focused on Korea.

      PCA of Modern and Neolithic WEA mtDNA, with a line/V/cluster for moderns and a separate cluster for the very anomalous Neolithic samples (annotated by me but from Balanovski's paper).

      An almost linear PCA of West African and Afroamerican Y-DNA

      A central cluster with outliers of Europeans, Iberians, Basques and some others. Notice how non-Europeans form a parallel line at the bottom.

      An almost cross-shaped PCA of West Eurasians and controls focused on Armenians.

      (cont.)

    7. ...

      Y shaped Swedish PCA and the second PCA of the same paper after removing the samples likely to have Finnish ancestry, which I'd describe as a line with some outliers or "thin branches".

      A global PCA focused on Spain that is your "L", but so fragmented (for lack of sufficient samples) that it is no longer easily recognizable.

      In this interesting paper on the "Africanness" of Europeans and North Africans, there is a very interesting supplement 1 (which you'd have to download) which compares YRI (green), CEU (red) and a West Eurasian population (blue).

      In all cases YRI is at one extreme of PC1 and West Eurasians at the other, with a huge void in between, but the interesting thing is in PC2, which in most cases is defined only within YRI, with the other West Eurasian sample clustering tightly with CEU. So, for example, there is less difference between CEU and Dutch or Swiss French than internally among Yoruba.

      But in some cases the intra-WEA differences are larger than among the Yoruba, for example with Basques or with Italians or even Russians. I included all those examples in this post.

      An irregularly-shaped PCA of Europeans and North Africans

      a two clusters - two corners PCA of Central Asian Y-DNA.

      a horizontal line with outliers: PCA of SE Asian and Chinese mtDNA.

      L and V shaped: Eurasians with Indian focus.

      abstract example of how uneven sampling distorts PCA (from a paper discussing the worth and the issues of PCA).

      "I hope you are just not wasting my time"...

      Not at all: you have a misconception here with that idea of the L shape meaning anything other than two clines that converge at a single point (in our case West Eurasia).
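      The uneven-sampling effect mentioned above is easy to reproduce in a toy setting. A sketch, assuming three simulated populations A, B and C with independent allele frequencies (so they are roughly equidistant); only the sample sizes change between the two runs. The populations and sizes are hypothetical and this is not a reconstruction of the cited paper's figure.

```python
import numpy as np


def pc1_scores(genotypes) -> np.ndarray:
    """First principal-component score of each sample (rows = samples, cols = SNPs)."""
    g = np.asarray(genotypes, dtype=float)
    u, s, _ = np.linalg.svd(g - g.mean(axis=0), full_matrices=False)
    return u[:, 0] * s[0]


def pc1_gaps(sizes, freqs, rng):
    """Sample each population, run PCA, return |A-B| and |A-C| gaps between group means on PC1."""
    blocks = [rng.binomial(2, p, size=(n, len(p))) for n, p in zip(sizes, freqs)]
    pc1 = pc1_scores(np.vstack(blocks))
    bounds = np.cumsum([0, *sizes])
    means = [pc1[bounds[i]:bounds[i + 1]].mean() for i in range(len(sizes))]
    return abs(means[0] - means[1]), abs(means[0] - means[2])


rng = np.random.default_rng(2)
n_snps = 1000
# Toy populations A, B, C with independent allele frequencies (roughly equidistant).
freqs = [rng.uniform(0.1, 0.9, n_snps) for _ in range(3)]

gap_ab, gap_ac = pc1_gaps((50, 50, 5), freqs, rng)     # C undersampled
print(f"A, B well sampled: |A-B| = {gap_ab:.1f}, |A-C| = {gap_ac:.1f}")
gap_ab2, gap_ac2 = pc1_gaps((50, 5, 50), freqs, rng)   # B undersampled
print(f"A, C well sampled: |A-B| = {gap_ab2:.1f}, |A-C| = {gap_ac2:.1f}")
```

      PC1 follows the contrast between whichever populations are heavily sampled, and the undersampled one lands roughly in the middle, even though the three are about equally distant from each other.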

  28. I never said all PCAs come in an 'L' shape; I said the first two dimensions of a global PCA approximate an 'L' shape, and that remains a fact. It also remains a fact that on one side of the 'L', where the largest share of the PCA's variation (eigenvalue) is explained, Africans are found, and on the other side non-Africans are found. The corner of the 'L' houses those that are intermediate between Africans and non-Africans, including some North Africans, Southern European and Near Eastern populations. This is consistent with global genetic diversity decreasing as a function of distance from Africa.

    Since PCAs are essentially vector spaces with both magnitude and direction, their dimensions specify the number of independent directions in space; hence PC1, PC2, ... are roughly independent and also orthogonal. Eigenvectors are only a special case of vectors, where an orthogonal basis for such vectors is required to explain the variance of the data for a symmetric square matrix; for instance, an IBS matrix of size N x N will yield N orthogonal eigenvectors after decomposition.
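    The N x N IBS point can be checked directly. A minimal sketch, assuming genotypes coded 0/1/2 and per-SNP IBS defined as 1 - |g_i - g_j| / 2 (1 when both alleles match, 0.5 for one, 0 for none); the random genotypes are hypothetical. Real PCA/MDS tools usually center the matrix before decomposing it, which this sketch skips; the orthogonality comes from the matrix being symmetric, which is why `numpy.linalg.eigh` (meant for symmetric matrices) is used.

```python
import numpy as np


def ibs_matrix(genotypes) -> np.ndarray:
    """Average identity-by-state between all pairs of samples (genotypes coded 0/1/2).

    Per SNP, IBS = 1 - |g_i - g_j| / 2: 1 when both alleles match, 0.5 for one, 0 for none.
    """
    g = np.asarray(genotypes, dtype=float)
    diff = np.abs(g[:, None, :] - g[None, :, :])  # shape (N, N, n_snps)
    return 1.0 - diff.mean(axis=2) / 2.0


rng = np.random.default_rng(3)
genotypes = rng.integers(0, 3, size=(6, 500))  # 6 hypothetical samples, 500 SNPs
ibs = ibs_matrix(genotypes)

# eigh returns the eigenvectors of a symmetric matrix as orthonormal columns.
eigenvalues, eigenvectors = np.linalg.eigh(ibs)
print(eigenvectors.shape)  # (6, 6): N eigenvectors for an N x N matrix
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(6)))  # True
```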

    1. I must still disagree, sorry: the L shape is not a measure of anything but just a byproduct of a specific set of samples and their relative affinities as shown in the PC plot.

      The PC/eigenvector plot has two dimensions each one measuring a major set of genetic differences. In the global plot, almost invariably, one of the axes (eigenvector 1 in Behar's) separates Africans from non-Africans (with Ethiopians in between) and the other East Asians from West Eurasians (with Tropical Africans other than Horners tending to neutrality).

      Also, vectors are often just a mathematical way of visualizing and operating on points in Euclidean spaces or planes. E.g. (5,3) can be seen as a dot in a plane or as the vector originating at (0,0) and pointing to that dot. But the origin is arbitrary, so it's just a mathematical construct. It would be different if the vector went from (2,1) to (5,3), which are both specific, non-arbitrary references. This should not confuse us.
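      The (5,3) versus (2,1)-to-(5,3) distinction can be shown in a few lines: translating every point by the same amount (equivalently, moving the origin) changes position vectors but leaves displacements between points unchanged. In PCA the origin is simply the sample mean, since the data are centered first. The shift value below is arbitrary.

```python
import numpy as np

origin = np.array([0.0, 0.0])
start = np.array([2.0, 1.0])
end = np.array([5.0, 3.0])

position_vector = end - origin  # depends on where the origin is placed
displacement = end - start      # depends only on the two points

# Translate every point by the same arbitrary amount (equivalent to moving the origin).
shift = np.array([10.0, -4.0])
print((end + shift - origin).tolist())            # [15.0, -1.0]: position vector changed
print(((end + shift) - (start + shift)).tolist())  # [3.0, 2.0]: displacement unchanged
```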

      Additionally, as it seems somehow important in your confused discourse, both Africa's greater genetic diversity and genetic diversity as a function of distance from Africa must be carefully qualified:

      (1) Africa's greater genetic diversity is largely between populations and is not so outstanding once we remove hunter-gatherers (Bushmen, Pygmies, Hadza...), who host most of that diversity (this means that there have been processes of homogenization and diversity reduction within Africa, probably related to the spread of the Neolithic by specific populations). There may be more detailed studies, but right now what I found is Rosenberg 2002, which in table 1 shows that Africans have lower intra-population diversity than Eurasians (but still greater than Native Americans and Oceanian peoples). (Note: what I'm saying in this point probably also needs to be qualified or debated in greater detail, mostly because Rosenberg 2002 does not include a single East African sample, but let it stand as a first approximation anyhow.)

      (2) But, most importantly, genetic diversity as a function of distance from Africa only really exists if the line measuring that distance goes through South Asia (i.e. Indians are more diverse than Europeans and West Asians), as explained by Melé 2011 (older papers typically ignored Indians altogether). I can send you a copy of this paper ("Recombination gives a new insight in the effective population size and the history of the Old World human populations") if you wish.

      Neither of these facts is apparent in the PCA, L-shaped or not. In fact, South Asia is rather hidden in the middle-left of the West-East Eurasian cline and, unless you remove Africans from the sample, its distinctiveness and relevance only show up somewhat in PC3 and beyond. South Asia, even if it is the pivoting region of the OoA, is not apparent at all in the PCA: we need other methods to unveil its relevance and centrality.

    2. Indians don't generally have more genetic diversity than West Asians/Europeans; perhaps in some cases, but not generally speaking:

      Private Alleles By Region

      Allele Size Variance

      Heterozygosity

      Source: Tishkoff (2009) supplemental material, which used more microsatellite loci than the 377 used by Rosenberg (2002), the study you cited.
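For readers unfamiliar with the statistics in those figures, here is a minimal sketch of how expected heterozygosity (gene diversity, 1 - sum of squared allele frequencies) and allele size variance are computed for microsatellite loci and averaged over loci. It uses the simple textbook formulas without small-sample corrections; the exact estimators in Tishkoff (2009) may differ, and the locus names and allele sizes below are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def expected_heterozygosity(alleles: list[int]) -> float:
    """Gene diversity 1 - sum(p_i^2) from sampled allele sizes at one locus (no sample-size correction)."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def allele_size_variance(alleles: list[int]) -> float:
    """Population variance of allele sizes (repeat counts) at one microsatellite locus."""
    return float(pvariance(alleles))


def population_summary(loci: dict[str, list[int]]) -> tuple[float, float]:
    """Average heterozygosity and allele size variance over all loci of one population."""
    het = [expected_heterozygosity(a) for a in loci.values()]
    asv = [allele_size_variance(a) for a in loci.values()]
    return sum(het) / len(het), sum(asv) / len(asv)


# Hypothetical sample: allele sizes (in repeat units) at two made-up loci.
pop = {
    "locus_A": [10, 10, 12, 12],
    "locus_B": [7, 7, 7, 9],
}
het, asv = population_summary(pop)
print(f"mean H = {het:.3f}, mean allele size variance = {asv:.3f}")
```

The two statistics need not rank populations identically: heterozygosity only counts how many distinct alleles there are and how evenly they are spread, while allele size variance also depends on how far apart the allele sizes are.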

      The rest of your 'points' I can't really argue with, since you are taking a wishy-washy position on the proven fact that Africans have more genetic diversity than non-Africans! Sorry, but that is basic stuff from over 10 years ago. It is like arguing with a person about trigonometry when the person is not even sure that 2+2=4; there simply is no point, and we would both be better off moving on and not wasting our time.

    3. "But, most importantly, genetic diversity as a function of distance from Africa only really exists if the line measuring that distance goes through South Asia"

      In addition, you need to read your own sources carefully:

      "We also observe that the patterns of recombinational diversity of these populations correlate with distance out of Africa if that distance is measured along a path crossing South Arabia."

      Recombination gives a new insight in the effective population size and the history of the Old World human populations
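The quoted finding correlates a diversity measure with distance out of Africa measured along a particular path. Here is a minimal sketch of that kind of test, assuming great-circle (haversine) legs between waypoints and a plain Pearson correlation. The function names, the 6371 km Earth radius, and any coordinates fed to it are assumptions for illustration, not the procedure Melé et al. actually used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)

LatLon = tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))  # clamp guards rounding


def path_distance_km(origin: LatLon, waypoints: list[LatLon], target: LatLon) -> float:
    """Distance from origin to target when the route is forced through waypoints in order."""
    stops = [origin, *waypoints, target]
    return sum(haversine_km(p, q) for p, q in zip(stops, stops[1:]))


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def diversity_distance_r(
    origin: LatLon,
    waypoints: list[LatLon],
    pops: dict[str, tuple[LatLon, float]],
) -> float:
    """Correlate diversity with path distance; pops maps name -> ((lat, lon), diversity)."""
    dists = [path_distance_km(origin, waypoints, loc) for loc, _ in pops.values()]
    divs = [div for _, div in pops.values()]
    return pearson_r(dists, divs)
```

Computing `diversity_distance_r` twice for the same populations, once with a South Arabian waypoint and once with a northern (Sinai/Levant) one, and comparing the two coefficients is the sort of comparison the quoted sentence implies. No real coordinates or diversity values are supplied here.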

    4. Don't get me wrong: I'm not wishy-washy about African diversity, but it does require some qualification (I'm being more nit-picky than anything else). Whatever the case African greater diversity is not any argument for the PC analysis: they are not really related because what the PC graph measures (or rather illustrates) is relative similitude and difference, just like the ADMIXTURE-like algorithms but with a different display.

      This has nothing to do with African genetic diversity, at least not in any straightforward manner. If Africans were more numerically dominant in the sample, maybe some of the PC dimensions would show that African diversity (actually, you get some of that in the EV1-3 graph of the Behar link above: EV3 is essentially a cline between the Mbuti and West Africans).

      The Tishkoff graphs are a great choice; I was actually looking for them but could not remember which was the source. There South Asians are also genetically most diverse and some of the blue-colored confusing pops are also South Asian, like Balochi, Sindhi and other Pakistanis, surely most important in the Eurasian genesis (others are Palestinians or Palestinian-Bedouins, who have known African admixture - or maybe you could argue to be ancestral to Eurasians???).

      Europeans also have some post-OoA African admixture, which surely confuses the matter a bit.

      "... a path crossing South Arabia."

      Even better (I knew that but had forgotten). After all, what that paper does is corroborate (apparently) the coastal route for the OoA.

    5. "Whatever the case African greater diversity is not any argument for the PC analysis: they are not really related because what the PC graph measures (or rather illustrates) is relative similitude and difference"

      "This has nothing to do with African genetic diversity, at least not in any straightforward manner."

      It is true that what a PCA graph shows is basically the decomposition of a similarity/dissimilarity matrix, such as IBS or ASD, into linearly independent components. However, the decrease in genetic diversity with distance from Africa, which is an independent metric, also correlates (maybe not exactly 1:1, but closely enough) with the path traced by the 'L' shape on a PC1 vs. PC2 plot of a global PCA. It is quite obvious if you think about it.
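To illustrate the "decomposition of a similarity matrix" part of that point, here is a minimal pure-Python sketch of PCA on a genotype matrix (individuals by SNPs, coded 0/1/2). It centres each SNP, builds the individual-by-individual matrix X X^T / m, and takes its leading eigenvectors by power iteration with deflation; each individual's PC1/PC2 coordinates are these eigenvectors up to scale and sign. This is a covariance-based variant chosen for brevity: PCA/MDS on an IBS or ASD distance matrix would instead double-centre the squared distances, and dedicated tools such as EIGENSOFT's smartpca typically also rescale each SNP. The toy genotypes are hypothetical.

```python
import math
import random

Matrix = list[list[float]]


def center_columns(g: Matrix) -> Matrix:
    """Subtract each SNP's mean genotype (column mean) from that column."""
    n, m = len(g), len(g[0])
    means = [sum(row[j] for row in g) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in g]


def similarity_matrix(x: Matrix) -> Matrix:
    """Individual-by-individual matrix X X^T / m (covariance-type genetic similarity)."""
    m = len(x[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / m for rj in x] for ri in x]


def mat_vec(k: Matrix, v: list[float]) -> list[float]:
    """Matrix-vector product k @ v."""
    return [sum(kij * vj for kij, vj in zip(row, v)) for row in k]


def top_eigenvectors(k: Matrix, n_components: int = 2, iters: int = 500, seed: int = 0):
    """Leading eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(k)
    k = [row[:] for row in k]  # work on a copy; deflation modifies it
    vals, vecs = [], []
    for _ in range(n_components):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = mat_vec(k, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to explain in the deflated matrix
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, mat_vec(k, v)))  # Rayleigh quotient
        vals.append(lam)
        vecs.append(v)
        # Deflation: subtract the found component so the next pass finds the next one.
        k = [[k[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vals, vecs


# Hypothetical toy data: 4 individuals x 3 SNPs, two clearly separated groups.
genotypes = [
    [0, 0, 2],  # group A
    [0, 0, 2],  # group A
    [2, 2, 0],  # group B
    [2, 2, 0],  # group B
]
vals, vecs = top_eigenvectors(similarity_matrix(center_columns(genotypes)))
print("PC1 eigenvalue:", round(vals[0], 6))
print("PC1 coordinates:", [round(x, 3) for x in vecs[0]])
```

In this toy case PC1 places the two groups at opposite ends and explains all the variance. Note that nothing in the procedure measures diversity directly; any match between the PC layout and a diversity gradient comes from how the samples relate to one another.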

      "There South Asians are also genetically most diverse and some of the blue-colored confusing pops are also South Asian, like Balochi, Sindhi and other Pakistanis, surely most important in the Eurasian genesis (others are Palestinians or Palestinian-Bedouins, who have known African admixture - or maybe you could argue to be ancestral to Eurasians???)."

      For Allele Size Variance, this is a partial list of the Eurasian populations in decreasing order:
      Bedouin > Balochi > Palestinian > Hindi > Pathan > Kannada > Punjabi > Temani > Makrani > Uygur > Telugu > Sindhi > Tamil > Hazara > Kashmiri > Brahui > Adygei > Bengali > Druze > Assamese > Parsi > Oriya > Konkani > Marathi > Italian > ...

      For Heterozygosity, the order is:
      Bedouin > Kashmiri > Palestinian > Pathan > Sindhi > Balochi > Italian > Marathi > Russian > Uygur > Punjabi > Brahui > Tamil > Kannada > Adygei > Burusho > Assamese > Oriya > French > Bengali > Parsi > Hazara > Hindi > Malayalam > ...
