Below is a contour map of the African ADMIXTURE component at K=2 for the Global data set (V2), which can be downloaded here; the population-specific percentages can be seen here.
The contour map was generated with Mapviewer7, using the Kriging method for gridding. ADMIXTURE outputs for all New World, Jewish, Singapore-Chinese and Singapore-Indian populations were removed before the map was generated.
African cline from ADMIXTURE, K=2. Black dots represent locations of sampled populations.
Some things to note:
- Since this is a K2 run, the OOA or the 'other' component has a complete mirror distribution relative to the distribution of the African component seen in the above.
- The regions where the brown color dominates (20-35% African) are the same regions that are later absorbed by the new component that arises at K=3, which peaks in West Eurasians and has an FST intermediate between those of the African and East Asian/Amerindian components.
- It is notable to observe the congruence of the above with the distribution of global genetic as well as phenotypic diversity (below)1
Global phenotypic and genetic diversity
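The mirror relationship in the first note follows from how ADMIXTURE reports ancestry: each individual's fractions sum to 1, so at K=2 the second component is simply one minus the first. A minimal sketch, assuming a Q-matrix text with one row per individual and two whitespace-separated columns; the values and the column order below are made up for illustration and are not output from this run:

```python
def parse_q(text):
    """Parse ADMIXTURE-style Q rows: one individual per line, K fractions each."""
    return [[float(x) for x in line.split()] for line in text.strip().splitlines()]


# Hypothetical K=2 rows (African fraction, other fraction); not real output.
Q_TEXT = """
0.98 0.02
0.30 0.70
0.01 0.99
"""

rows = parse_q(Q_TEXT)
for african, other in rows:
    # At K=2 the two columns carry the same information.
    assert abs((1.0 - african) - other) < 1e-9

# The 'other' map is therefore just the African map turned upside down.
mirror = [round(1.0 - african, 6) for african, _ in rows]
print(mirror)
```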
I've told you often that K=2 is pretty much worthless at this scale.
It is obvious that you are in denial when it comes to this basic fact.
It is not any "basic fact" but a statistical construct. And in fact it is a correct statistical construct (depending on the sample used always): because West Eurasians are neither Africans nor East Asians. But they are not a mixture of both either (they may have some minor influences from both but otherwise distinct and their own cluster, group or population).
If you draw a map with the global K=2 of Behar 2010, you don't get the same contours at all. You get a West Eurasian vs East Asian cline and Africans (less represented in numbers) subsumed 100% into the West Eurasian half (for whatever statistical artifact).
You can still get a different K=2 cline if you reduce the number of East Asians, then the cline would be West Eurasia vs. Tropical Africa and East Asians surely subsumed in the WEA cluster.
You can in fact get all kind of K=2, get 200 Khoisan and 200 others and you'll get a Khoisan vs others K=2, etc. It depends largely (not only) on sample size.
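The sample-size argument can be illustrated with a toy model. This is not ADMIXTURE: it swaps in a one-dimensional weighted two-group split (k-means style), with three populations at invented positions and invented sample counts, purely to show how the best two-way partition can move when the sample composition changes.

```python
def split_cost(groups):
    """Within-group sum of squared deviations for a partition.

    groups: list of groups; each group is a list of (position, count) pairs.
    """
    total = 0.0
    for group in groups:
        n = sum(count for _, count in group)
        mean = sum(pos * count for pos, count in group) / n
        total += sum(count * (pos - mean) ** 2 for pos, count in group)
    return total


def best_two_way_split(samples):
    """Return the contiguous two-way split with the lowest cost.

    samples: dict name -> (position, count) on a hypothetical 1-D axis.
    """
    ordered = sorted(samples.items(), key=lambda item: item[1][0])
    best = None
    for cut in range(1, len(ordered)):
        left, right = ordered[:cut], ordered[cut:]
        cost = split_cost([[v for _, v in left], [v for _, v in right]])
        if best is None or cost < best[0]:
            best = (cost, [n for n, _ in left], [n for n, _ in right])
    return best[1], best[2]


# Positions are invented: African at 0, West Eurasian at 2, East Asian at 3.
balanced = {"African": (0.0, 10), "WEA": (2.0, 10), "EA": (3.0, 10)}
few_africans = {"African": (0.0, 2), "WEA": (2.0, 10), "EA": (3.0, 50)}

print(best_two_way_split(balanced))
print(best_two_way_split(few_africans))
```

In this toy, the balanced sample splits African vs the rest, while the sample with few Africans and many East Asians groups Africans with West Eurasians. Whether real ADMIXTURE runs behave the same way is exactly what is in dispute here and would have to be checked on real data.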
“It is not any "basic fact" but a statistical construct. And in fact it is a correct statistical construct (depending on the sample used always):”
I have updated the contour map to show the locations of all the samples used in the K=2 analysis. Bring me a dataset with better global sampling than the above that doesn't show West Eurasians with ~1/3 African and ~2/3 East Eurasian affinity, and then you have an argument; until then it is all just talk.
“because West Eurasians are neither Africans nor East Asians. But they are not a mixture of both either (they may have some minor influences from both but otherwise distinct and their own cluster, group or population).”
Apparently not, at least not relative to, say, the Amerindian, Oceanian or even South Asian clusters that form in ADMIXTURE runs when K>2. The West Eurasian cluster is not as distinct as you'd like to think; it has significant African affinity, unlike all the others mentioned.
“If you draw a map with the global K=2 of Behar 2010, you don't get the same contours at all. You get a West Eurasian vs East Asian cline and Africans (less represented in numbers) subsumed 100% into the West Eurasian half (for whatever statistical artifact).”
I have done experiments on five different global datasets to see how the K=2 ADMIXTURE outputs vary, while you have done none, and none of them show West Eurasians forming their own pole, none of them at all. I have also presented several other datasets that were run by others, again it is ALWAYS a split between Africans and non-Africans. With all due respect Maju, I consider you to have a lot of knowledge in genetics, pre-history and many other things, but in this you are absolutely blinded by your desire to see West Eurasians as a distinct population, you are wrong.
“You can still get a different K=2 cline if you reduce the number of East Asians, then the cline would be West Eurasia vs. Tropical Africa and East Asians surely subsumed in the WEA cluster.
You can in fact get all kind of K=2, get 200 Khoisan and 200 others and you'll get a Khoisan vs others K=2, etc. It depends largely (not only) on sample size.”
Please show and THEN tell, don't tell me first without showing me.
Then you will have to wait because I don't seem to have enough time or energy these days.
But I have already designed the tests and told you above how to do them. A tricontinentally balanced sample like this one will likely produce these K=2 results because West Eurasians have minor African admixture, which partly compensates for their overall slightly greater closeness to East Eurasians; that closeness is anyhow extremely ancient and not much more recent than their closeness to Africans. So WEA does not tend to form a cluster as easily as the other continental regions.
But just lower the weight of Africans (Behar 2010 example) or East Asians and you'll get different K=2 results. Or try to make a K=2 between 50 Khoisan, 20 Africans, 10 WEA, 10 EA and 10 SA and you'll get a Khoisan vs others K=2, etc. Sample size is critical and changes the results. And I see no particular reason not to take the Khoisan vs others as the "real global K=2" because we know for a fact that Khoisan diverged first of all.
For example in Pickrell 2012, we get your much-cheered global L-shaped PC plot, in which the angle of the L no longer corresponds to West Eurasians but to mainstream Africans. Similarly at K=2, the two poles are French and the Northern Ju'Hoan. Should we conclude from that datum that most Africans are a mix of French and Ju'Hoan in different proportions? (Of course not, but following your misleading "logic"...) How would your global map look with that data?
Petersen 2013 does not even show her K=2 (yeah, what for?) but her K=3 is equally (non-)informative, showing high "Chinese admixture" among the Maasai and Sandawe. Never mind that in that analysis French are almost 100% Chinese, unlike in yours (wow, at K=4 they are like 5% Hadza!!!).
Sorry to be getting a bit sarcastic... you know I respect you and in general I do respect your work but, really, global K=2 is absolutely non-informative.
I ask you to bring datasets that have more global coverage than what I have posted on this blog, and you go ahead and bring datasets that have even more gaping holes in them than your original dataset (Behar 2010), which you somehow thought supported your point. It is almost as if you know the truth but something is holding you back from admitting it, hmmm.....
To say that K=2 is uninformative is like saying the foundation of a house or a structure is unimportant to the remaining layers that are added onto the structure, which is of course nonsense.
Here , for instance is a figure from a peer-reviewed paper that examines K=2 in Eurasia, naturally the primary split being between West and East Eurasia.
K=2 is not any foundation: it is just a preliminary draft, at least at the levels of diversity and complexity we are discussing here. K=2 is not more important than K=3 or K=8 or K=16 or K=33; the informativeness of these clustering levels is given, at least according to the method, by the cross-validation values. And I am certain that for this global sample, K=2 has a horribly poor cross-validation score.
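ADMIXTURE can report a cross-validation error for each K when run with its `--cv` option, and a lower error means that K is better supported. A minimal sketch of comparing K values, assuming the log lines look like `CV error (K=2): 0.580`; the log text and numbers below are invented for illustration and are not results from this dataset:

```python
import re

# Assumed log line format; adjust the pattern if your logs differ.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def cv_errors(log_text):
    """Map each K to the cross-validation error found in the log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(errors):
    """Return the K with the lowest cross-validation error."""
    return min(errors, key=errors.get)


# Hypothetical log excerpts, one line per run.
LOGS = """
CV error (K=2): 0.580
CV error (K=3): 0.560
CV error (K=4): 0.555
CV error (K=5): 0.557
"""

errors = cv_errors(LOGS)
print(errors, best_k(errors))
```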
"Here , for instance is a figure from a peer-reviewed paper that examines K=2 in Eurasia, naturally the primary split being between West and East Eurasia".
Actually K=2 is only shown on a map. Regardless, that makes some sense because it is comparing two populations that are known to be relatively homogeneous: WEA and EA (there's one Keralite sample but it's not enough to form a South Asian specific cluster even at K=6 with that overall sample).
In fact in the last map, the authors correctly compare the Fst values of the K=6 components, showing that two components (Europe, West Asia) cluster on one extreme and three others (East Asia, Central-East Asia and SE Asia) cluster on the other, with the Central Asian/West Siberian component being intermediate. That K=6 with that Fst distance graph IS INFORMATIVE, confirming that we are looking at two quite homogeneous populations with an intermediate element of some interest, which is the apparently old mixture in West Siberia (Khanty) and (to a lesser extent) Turkestan.
That this somewhat resembles the K=2 is just a coincidence, maybe by design of the authors, but a particular case anyhow.
If you were doing something like that I would not be complaining at all. These guys compare apples and oranges, but there is (with the possible, quite irrelevant, exception of the Keralite sample) nothing that is not apple, orange or a mixture of both. Instead you compare apples with oranges and want the bananas to be described as a function of those two, and that does not work.
So is the banana more like the apples or more like the oranges? That's your K=2. But you actually need a K=3 to allow for a banana-specific category, which is a must if you put bananas into the equation. Of course, it's not just West Eurasians (the bananas) but also Melanesians, Khoisans, Pygmies, South Asians, etc.: you need a whole fruit shop to be able to describe all that diversity. So the cross-validation optimum is likely to be at quite a high K.
“Regardless, that makes some sense because it is comparing two populations that are known to be relatively homogeneous: WEA and EA”
This tells me that you are coming from the a priori position that WEA and EA are homogeneous populations at the same level of homogeneity, which biases your observations toward your own peculiar understanding of global human genetic structuring. When, however, you start from a position based on empirical evidence, such as that seen in the post above, it is clear that EA are on a totally different level of homogeneity than WEA. In fact, it looks as though the only thing that separates the genetic variation of East and West Eurasians is that West Eurasians have African affinity. This makes sense from an overall OOA perspective in which both East and West Eurasians are descended from a small group of Africans, and in which the former's diversity, under a model of isolation by distance, is dwarfed by the latter's, as the latter would still be proximate to African source populations.
The global K=2 analysis posted above IS a foundation, as it pretty much models the primary genetic division of the world, it is analogous to the first axis of a well sampled global principal component plot, which ALWAYS separates Africans from Non-Africans.
It's not "a priori" but described by the Fst distances.
You never seem to deal with the issue of Fst distances, Ethio Helix! You should.
"The global K=2 analysis posted above IS a foundation, as it pretty much models the primary genetic division of the world, it is analogous to the first axis of a well sampled global principal component plot, which ALWAYS separates Africans from Non-Africans".
What is "a well sampled"? Why is 200 Khoisan and 200 others not a good sample? For instance...
We know from genetic distance values, Y-DNA and mtDNA that Khoisans are the first to branch out of all Humankind. Your K=2 (and many others) does not detect that. Why? Because Khoisans are terribly undersampled, so the internal differences of "branch B", so to say, overshadow their otherwise very clear distinctiveness vs all others.
But when we get a "well sampled" global analysis, in which Khoisans are not undersampled, then Africans become polarized between them and Eurasians at K=2, logically. This does not mean that Africans are admixed of Khoisans and Eurasians, just that K=2 at those levels is totally pointless.
“You never seem to deal with the issue of Fst distances, Ethio Helix! You should.”
I have dealt and do deal with the fixation index: the few ADMIXTURE analyses I have done thus far have all been accompanied by the Fst distances of the components. With respect to this post, I even mentioned that the West Eurasian cluster that emerges at K=3 has an intermediate Fst distance between the African and East Asian components, consistent with the fact that prior to the cluster's formation, i.e. at K=2, Western Eurasians have assignments in both the African and East Asian components at the proportions mentioned herein.
But what does that tell me? Not much, Fst may tell me the degree of differentiation of two populations, but it for sure does not say anything about the direction of gene-flow.
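The point about direction can be made concrete: Fst is symmetric in the two populations being compared. A minimal sketch of Wright's Fst for a single biallelic locus in two equally weighted populations, computed as (H_T - H_S) / H_T. The allele frequencies are made up, and ADMIXTURE's own component Fst estimates are computed across many loci and may use a different estimator:

```python
def fst_two_pops(p1, p2):
    """Wright's Fst at one biallelic locus for two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if both populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # same allele fixed in both populations: no differentiation
    return (h_total - h_within) / h_total


a, b = 0.9, 0.3  # hypothetical allele frequencies
print(fst_two_pops(a, b), fst_two_pops(b, a))
```

Swapping the two populations gives the same value, so the statistic by itself cannot say which population contributed to which.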
“What is "a well sampled"”
My definition: A relatively well sampled global dataset has a minimum amount of gaps between all the populations that lie within, or in close proximity to, the path traced by the 'L' shape in a global PCA plot.
Example of a relatively poorly sampled global dataset: 1
Example of a relatively Well sampled global dataset: 2
“We know from genetic distance values, Y-DNA and mtDNA that Khoisans are the first to branch out of all Humankind. Your K=2 (and many others) does not detect that. Why?”
Khoisan are first and foremost Africans, most genetically divergent Africans, arguably yes, nevertheless they are Africans first and foremost and cluster with other Africans in a dual decomposition of the world.
"Khoisan are first and foremost Africans, most genetically divergent Africans, arguably yes, nevertheless they are Africans first and foremost and cluster with other Africans in a dual decomposition of the world".
Not when you sample them sufficiently, as in 25% of the sample (which roughly corresponds to their share of Human diversity, at least measuring by mtDNA: 1/4 to L0d, 1/4 to L0a'b'f'k, 1/4 to L1 and 1/4 to L2-6). If you undersample them, as you do (and as most others also do because they are not interested in this fundamental fact, but at least they do not dwell on K=2 fallacies), they are forced into some other cluster, which is of course African.
Critically, I point to the prejudice (pre-judgment) that underlies your repeated exercise of a "global K=2": that Africans are a unity vs the rest of the World, which really does not fit the facts. Africans, even after the relative homogenization of the Bantu expansion (and other less important ones), are the most diverse subset of Humankind, including nearly all the diversity, and certainly all the basal or ancestral diversity, of the species (excepting only novel mutations, whatever alleles may have gone extinct and whatever was introgressed from Neanderthals, etc.).
So it's only logical that a well-weighted sample would place K=2 between Khoisans and the less remixed L1-6 peoples, which may well be Western Pygmies... or Eurasians/Australasian Natives.
I have been chewing on what a well-weighted sample of Humankind should be and I think it should have the following fractions:
1. Khoisans, representing the southern L0 (L0d) branch
2. Tribal Ethiopians or South Sudanese (?) representing the northern L0 (L0a'b'f'k) branch
3. Western Pygmies and Fulani representing the L1 branch
4. Other Africans representing the L2-6 branch
5. Depending on the size of the sample we can add for convenience a smaller representation of Eurasia and Oceania (Native Americans could be in but are not strictly necessary) or keep it as part of the 4th fraction. This fraction should never be larger than any of the others and should include in balanced proportions:
5a. Indians, East Asians and Oceanians representing M
5b. East Asians, Oceanians and West Eurasians representing N/R
So for example:
1. 20 Khoisans
2. 20 Tribal Ethiopians/South Sudanese
3. 10 Baka and 10 Fulani (not from Sudan)
4. 10 Yoruba and 10 Maasai
5a. 3 Indians, 2 Chinese, 1 Cambodian, 2 Papuan, 1 French, 1 Iranian.
This would be a proper sample. But it's not too different from what I mentioned above about Pickrell's and Petersen's papers.
"the West Eurasian cluster that emerges at K=3 has an intermediate Fst distance between the African and East Asian components"
Can you give the exact values?
I'm quite impressed at the neat match of the K=2 map and the diversity maps.
I like how this evolves since last time I looked at your work.
"Since this is a K2 run, the OOA or the 'other' component has a complete mirror distribution relative to the distribution of the African component seen in the above."
That's exactly right: so you have two potential origins: one in Africa (green), the other one in America (and parts of Northern Asia if it's the very same deep red as we find in most of the New World). I don't see anything in the data that suggests specifically that it's Africa. America is just as likely and considering that America is most diverse linguistically (as measured by the number of language stocks) and Africa is not, Miss America wins. A population scenario will be different from the one usually envisioned for out-of-Africa (not a serial founder effect, but rather isolation in the homeland and population growth plus admixture in the colonized areas) but so be it.