Ethio Helix ኢትዮ:ሒሊክስ: The World At K=2

Saturday, July 7, 2012

The World At K=2

The most basic autosomal genetic division of the world is between Africans and Out of Africans (OOA). This is seen not only on global PCA or MDS maps, where the first PC separates Africans from non-Africans, but also in model-based statistical (Bayesian) analysis, where the first model iteration, i.e. K=2, distinguishes Africans from non-Africans.

Here, I present (for reference) the full ADMIXTURE K=2 results for a global dataset of 2,967 individuals from around the world, genotyped at 16,595 SNPs with a total genotyping rate of 99.6%.

The results are arranged from the highest median African % to the lowest.
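
For anyone who wants to reproduce this kind of ordering from their own K=2 output, here is a minimal Python sketch. It assumes one (population, African fraction) pair per individual; the function name and the sample values are made up for illustration and are not the data or script behind this post.

```python
from collections import defaultdict
from statistics import median


def rank_populations_by_median(records):
    """Order populations from highest to lowest median African fraction.

    records: iterable of (population, african_fraction) pairs,
    one pair per individual (hypothetical input layout).
    """
    by_population = defaultdict(list)
    for population, fraction in records:
        by_population[population].append(fraction)

    # One median per population, then sort in descending order.
    medians = {pop: median(vals) for pop, vals in by_population.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


# Made-up values for illustration only, not results from the post.
example = [
    ("PopA", 0.98), ("PopA", 1.00), ("PopA", 0.95),
    ("PopB", 0.45), ("PopB", 0.55),
    ("PopC", 0.01), ("PopC", 0.00),
]
ranking = rank_populations_by_median(example)
for population, value in ranking:
    print(f"{population}\t{value:.3f}")
```

The median is used rather than the mean so that a single unusually admixed individual does not move a population up or down the list.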

12 comments:

Maju, July 7, 2012 at 12:28 PM
Most interesting, thanks.

Not sure how to read it but the cline is very clear.

However, I'd hesitate to consider the orange cluster the "OoA" one: if you do the same at K=3, when the West Eurasian component is defined (K=2 seems to depend a lot on sample sizes), the matter should become clearer.

Why is this? I suspect that the Eurasian subdivision happened so quickly after the OoA, with West Eurasians having some minor African admixture complicating things, that West Eurasians sometimes, depending on sample sizes, end up clustering with Africans (or rather Africans with West Eurasians, because it is in fact more common to undersample Africans) and not with East Asians.

Another issue is that the huge African variability becomes blurred, resulting in only a few hunter-gatherer populations showing up as 100% blue, which contrasts with the many East Asian ones showing up as 100% orange (because East Asians are much less diverse).

So I advise much caution.
German Dziebel, July 11, 2012 at 12:13 PM
I think it's pretty clear from K=2 that American Indians have the non-African component at highest frequencies. Just like Pygmies have the African component at highest frequencies. I'm a little surprised to see Khoisan as a bit non-African because they tend to be outliers against all others, hence they should be most African here. But I guess this is because of recent Bantu admixture.

So the way to read it is that the two populations that are geographically most separated - Amerindians and Sub-Saharan Africans - also have the highest concentration of the two major components. This could be due to a) two sources for modern human genetic diversity (an African and an Amerindian) or b) extinction of even more basal components shared between proto-Amerindians and proto-Sub-Saharan Africans.

The notion of a basic split between Africans and the rest is misguided. The true division is between a subset of non-Africans and a subset of Sub-Saharan Africans, with lots of gene flow in-between.
German Dziebel, July 11, 2012 at 3:57 PM
"Right along with East Asians like the Yi and Tujia from China."

Most runs I've seen show Amerindians as "purer" in their non-African component than the adjacent East Asians. Also see below for a similar swap between Sotho-Tswana and !Kung in your runs. But in any case, the pattern is the same: the non-African component grows in the easterly direction, just like the African component grows in the westerly direction.

Sotho-Tswana are odd in having a higher % of the African component than unadmixed Namibian San and the !Kung. They are definitely not the earliest population in Africa.

"American Indians have the least amount of Genetic diversity of all populations on a global scale"

So? The most genetically diverse continent since 1492 is the New World. Genetic diversity will grow as populations come together and mix. The lower genetic diversity of Amerindians comes from them being locked in a massive refugium for most of their history and having a low population size, a higher inbreeding rate and a higher drift rate preventing new variants from getting fixed, whereas the progressive increase in diversity in the direction of Africa captures the direction of expansions and gene flow of which Amerindians were not part.

"The non-African and the African components are linked with each other, they are not independent entities, gene-flow is only a secondary link, the primary link being OOA."

I obviously don't believe in multiregionalism. But if the components are conjoined, then out-of-America can be the explanation, just as easily as out of Africa, because the geographic distribution of the two components is nearly perfectly symmetrical. If the non-African (more exactly, "Amerindian") component is original, then, as the African component evolved from it by mutation, the latter began expanding, while the former began contracting. Amerindians stayed behind and preserved it better than anyone else. Africans, on the contrary, mutated away the most.
Andrew Oh-Willeke, July 13, 2012 at 10:37 PM
I wonder what the margin of error for any given data point is (presumably wider error bars for smaller samples all other things being equal, but smaller for populations that are very internally homogeneous all other things being equal).

One rough justice way to do error bars would be to show each sample population as a range with the most African and least African individual in the sample (or perhaps the 95th percentile or second most African if the 95th percentile would be the extreme individual, and 5th percentile or second least African individual if the 5th percentile would be the extreme individual, in order to limit outsized impact from outliers). Transitional populations with some historic era admixture, like Ethiopians and Egyptians, ought to have wider admixture percentage ranges than relatively isolated ones.

I suspect that this kind of population range bar would make what appear to be out-of-order entries look instead like entries whose order is indeterminate within the level of precision that the data can support.
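
As a rough sketch of that idea in Python: the function names below are hypothetical, the cut-offs follow the 5th/95th percentile suggestion above, and for samples of three or more the single most and least African individuals are never used as the bounds. Treating two populations as unordered when their ranges overlap is my own simplifying assumption.

```python
def trimmed_range(values, lower_pct=5, upper_pct=95):
    """Return (low, high) African fractions for one population sample.

    Uses nearest-rank percentiles, but for samples of three or more
    never returns the single most or least African individual, to
    limit the impact of outliers.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one individual")
    if n < 3:
        # Too few individuals to trim anything.
        return ordered[0], ordered[-1]

    # Nearest-rank percentile positions (0-based), using integer ceilings.
    low_index = (lower_pct * n + 99) // 100 - 1
    high_index = (upper_pct * n + 99) // 100 - 1

    # Fall back to the second least / second most African individual
    # when the percentile would land on the extreme one.
    low_index = max(low_index, 1)
    high_index = min(high_index, n - 2)
    return ordered[low_index], ordered[high_index]


def ranges_overlap(range_a, range_b):
    """True if two (low, high) ranges overlap, i.e. order is indeterminate."""
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]


# Made-up fractions for illustration only.
pop_x = trimmed_range([0.40, 0.45, 0.50, 0.55, 0.90])
pop_y = trimmed_range([0.50, 0.52, 0.58, 0.60, 0.61])
print(pop_x, pop_y, ranges_overlap(pop_x, pop_y))
```

With very small samples the trimmed range collapses toward the median, so the bars are only informative for populations with a reasonable number of individuals.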

It would also be interesting to look at cline gradient as a function of distance from e.g. South Sudan over hypothetical land and sea migration routes, for North and East African and Out of African populations (South Sudan being the closest to the source of the other regions that is close to 100% African). In the Eurasian direction, for example, the cline from South Sudan to Yemen and Syria is very steep relative to the modest geographic distance, while it would level off afterwards; the geographic distance from South Sudan to EtT-P is smaller still, yet there is still a very large percentage difference. One could then do a chart with the Y-axis being % African and the X-axis being hypothetical migration distance, to illustrate the cline in a way that clearly shows the gradient.

This cline looks gradual in your chart in part because the populations in the transitional areas are so finely grained. Another cruder and less labor intensive way to show gradient more clearly would be to lump multiple geographically close samples together so that each population would have a comparable geographic range.
Lemba, July 27, 2012 at 11:28 PM
Etyopsis, the Khoisan's small OOA might be some of the African component that didn't "change" in OOA. I remember that Mesolithic Spanish sample recently that turned out to have something like 7-8% Khoisan. At first, without Khoisan, it was 10% East African, and then this turned into Khoisan when Khoisan were included. So if pre-Neolithic Europeans have some of the Khoisan component, Out of African individuals probably carried some recent African admixture as well, perhaps the spread of E Y-DNA and the L mtDNAs into Europe.
Libby, September 29, 2014 at 1:53 AM
I am not a scientist so could you explain this in simple terms please?
Brandon Pilcher, December 5, 2020 at 3:41 PM
What happened to the graph images you had on this post?