Saturday, July 7, 2012

The World At K=2


The most basic autosomal genetic division of the world is between Africans and Out-of-Africans (OOA). This is seen not only on global PCA or MDS maps, where the first PC separates Africans from non-Africans, but also in model-based statistical (Bayesian) analysis, where the first model iteration, i.e. K=2, distinguishes Africans from non-Africans.
Here, I present (for reference) the full ADMIXTURE K=2 results for a global dataset of 2,967 individuals from around the world, genotyped at 16,595 SNPs with a total genotyping rate of 99.6%.

The results are arranged from the highest median African % to the lowest.
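
For readers who want to reproduce this kind of ordering from their own run, here is a minimal sketch in Python. It assumes the usual ADMIXTURE `.Q` layout (one row per individual, one whitespace-separated ancestry fraction per cluster) and a separate list of population labels in the same order as the individuals. The function names, the toy values and the population names are hypothetical, and which column corresponds to the "African" cluster has to be checked by hand, since the column order is arbitrary. This is not the script used for the results in this post.

```python
from statistics import median


def parse_q_lines(lines):
    """Parse .Q-style rows: whitespace-separated ancestry fractions, one row per individual."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def rank_by_median(q_rows, pop_labels, component=0):
    """Return (population, median %) pairs, sorted from highest to lowest median.

    `component` is the column index of the cluster of interest (an assumption:
    it must be identified by inspecting the output, it is not fixed by the software).
    """
    by_pop = {}
    for row, pop in zip(q_rows, pop_labels):
        by_pop.setdefault(pop, []).append(100.0 * row[component])
    ranked = [(pop, median(values)) for pop, values in by_pop.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy data only: six made-up individuals from three made-up populations.
toy_q = ["0.99 0.01", "0.97 0.03", "0.60 0.40", "0.64 0.36", "0.02 0.98", "0.00 1.00"]
toy_pops = ["PopA", "PopA", "PopB", "PopB", "PopC", "PopC"]

ranking = rank_by_median(parse_q_lines(toy_q), toy_pops, component=0)
for pop, pct in ranking:
    print(f"{pop}\t{pct:.1f}")
```

The median is used rather than the mean so that one or two recently admixed individuals do not shift a population's position in the list.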


12 comments:

  1. Most interesting, thanks.

    Not sure how to read it but the cline is very clear.

    However I'd doubt of considering the orange cluster the "OoA" one because if you do the same at K=3, when the West Eurasian component will be defined (K=2 depends a lot of sample sizes it seems), the matter should be more clear.

    Why is this? I suspect that the Eurasian subdivision was so quick after the OoA and West Eurasians had some minor African admixture complicating things, that West Eurasians sometimes, depending on sample sizes, end up clustering with Africans (or rather Africans with WEA, because it's more common to undersample Africans in fact) and not East Asians.

    Another issue is that the huge African variability becomes blurred, resulting in only a few hunter-gatherer populations showing up as 100% blue, which contrasts with the many East Asian ones showing up as 100% orange (because East Asians are much less diverse).

    So I advise much caution.

    Replies
    1. “However I'd doubt of considering the orange cluster the "OoA" one because if you do the same at K=3,”
      Even at K=3 the OoA component would only just resolve itself into West and East Eurasian components; any subsequent clusters formed outside of Africa would technically be subsets of the OoA component. In addition, the West Eurasian component formed at K=3 would be intermediate between the African and East Asian components in terms of Fst. A majority of both the African and OoA components seen at K=2 in the West Asians would thus be combined and disguised as a “West Eurasian” component that would find its peaks in the Basques and Sardinians.

      “(K=2 depends a lot of sample sizes it seems)”
      Well, I haven't experimented, but as far as I can tell this dataset includes the most comprehensive set of global populations. In terms of macro-regional distribution it has: 972 Continental Africans, 705 West Asians (including Europe, Middle East, Caucasus and Central Asia), 606 East Asians (including South East Asia), 360 South Asians, 144 Siberians, 134 from the Americas (both North and South) and 46 Oceanians.

    2. "but the cline is very clear. "

      Yes indeed, the cline is clear, but there are also a couple of 'breaks'. One noticeable break in Africa is between the EtT-P samples (Ethiopian Tigrayans from Pagani), which measure 62% African (or 38% OoA), and the next entry, the Moroccans, who measure 52% African (or 48% OoA). This 10-point break seems a little bigger than the other clinal differences between populations; however, I suspect that the inclusion of Northern Sudanese and Southern Egyptian populations in the future would make the cline a lot smoother.

  2. I think it's pretty clear from K=2 that American Indians have the non-African component at highest frequencies. Just like Pygmies have the African component at highest frequencies. I'm a little surprised to see Khoisan as a bit non-African because they tend to be outliers against all others, hence they should be most African here. But I guess this is because of recent Bantu admixture.

    So the way to read it is that the two populations that are geographically most separated - Amerindians and Sub-Saharan Africans - also have the highest concentration of the two major components. This could be due to a) two sources for modern human genetic diversity (an African and an Amerindian) or b) extinction of even more basal components shared between proto-Amerindians and proto-Sub-Saharan Africans.

    The notion of a basic split between Africans and the rest is misguided. The true division is between a subset of non-Africans and a subset of Sub-Saharan Africans, with lots of gene flow in-between.

    Replies
    1. “I think it's pretty clear from K=2 that American Indians have the non-African component at highest frequencies”

      Right along with East Asians like the Yi and Tujia from China.

      “I'm a little surprised to see Khoisan as a bit non-African because they tend to be outliers against all others, hence they should be most African here. But I guess this is because of recent Bantu admixture.”

      There is no sign of the San from Namibia having any Bantu admixture in any of my intra-African runs; they form a completely independent cluster. The Sotho/Tswana, on the other hand, are a Bantu people with some San admixture but plenty of Bantu/Niger-Kordofanian components, yet they have a marginally higher African % than the San from Namibia in this K2 global run. The likely reason is that some of the alleles present in those who descend from ancestral African populations and remained in Africa are being confused with some of the OOA alleles by the ADMIXTURE software.

      “a) two sources for modern human genetic diversity (an African and an Amerindian)”

      “b) extinction of even more basal components shared between proto-Amerindians and proto-Sub-Saharan Africans.”

      Both scenarios are quite unlikely considering that, with respect to heterozygosity and allele size variance, American Indians have the least amount of Genetic diversity of all populations on a global scale, while the reverse holds true for Sub-Saharans. Which brings me to another interesting point: compare the decreasing order of the K2 African component from this post with the global decrease in genetic diversity with distance from Africa and observe the striking congruency. A coincidence? I think not.

      “The true division is between a subset of non-Africans and a subset of Sub-Saharan Africans, with lots of gene flow in-between.”

      The non-African and the African components are linked with each other, they are not independent entities, gene-flow is only a secondary link, the primary link being OOA.

  3. "Right along with East Asians like the Yi and Tujia from China."

    Most runs I've seen show Amerindians as "purer" in their non-African component than the adjacent East Asians. Also see below for a similar swap between Sotho-Tswana and !Kung in your runs. But in any case, the pattern is the same: the non-African component grows in the easterly direction, just like the African component grows in the westerly direction.

    Sotho-Tswana are odd in having higher % of the African component than unadmixed Namibian San and the !Kung. They are definitely not the earliest population in Africa.

    "American Indians have the least amount of Genetic diversity of all populations on a global scale"

    So? The most genetically diverse continent since 1492 is the New World. Genetic diversity will grow as populations come together and mix up. Lower genetic diversity of Amerindians comes from them being locked in a massive refugium for most of their history and having low population size, higher inbreeding rate and higher drift rate preventing new variants from getting fixed, whereas the progressive increase in diversity in the direction of Africa captures the direction of expansions and gene flow of which Amerindians were not part.

    "The non-African and the African components are linked with each other, they are not independent entities, gene-flow is only a secondary link, the primary link being OOA."

    I obviously don't believe in multiregionalism. But if the components are conjoined, then out-of-America can be the explanation, just as easily as out of Africa, because the geographic distribution of the two components is nearly perfectly symmetrical. If the non-African (more exactly, "Amerindian") component is original, then, as the African component evolved from it by mutation, the latter began expanding, while the former began contracting. Amerindians stayed behind and preserved it better than anyone else. Africans, on the contrary, mutated away the most.

  4. I wonder what the margin of error for any given data point is (presumably wider error bars for smaller samples all other things being equal, but smaller for populations that are very internally homogeneous all other things being equal).

    One rough justice way to do error bars would be to show each sample population as a range with the most African and least African individual in the sample (or perhaps the 95th percentile or second most African if the 95th percentile would be the extreme individual, and 5th percentile or second least African individual if the 5th percentile would be the extreme individual, in order to limit outsized impact from outliers). Transitional populations with some historic era admixture, like Ethiopians and Egyptians, ought to have wider admixture percentage ranges than relatively isolated ones.

    I suspect that these kinds of population range bars would make what appear to be out-of-order entries look instead like entries whose order is indeterminate within the level of precision that the data can support.

    It would also be interesting to look at cline gradient as a function of distance from e.g. South Sudan over hypothetical land and sea migration routes, for North and East African and Out of African populations (South Sudan being the closest to the source of the other regions that is close to 100% African). In the Eurasian direction, for example, the cline from South Sudan to Yemen and Syria is very steep relative to modest geographic distance, while it would level off afterwards; the geographic distance from South Sudan to EtT-P is smaller still and still have a very large percentage difference. One could then do a chart with a Y-axis being % African and the X-axis being hypothetical migration distance to illustrate the chart in a way that clearly shows gradient.

    This cline looks gradual in your chart in part because the populations in the transitional areas are so finely grained. Another cruder and less labor intensive way to show gradient more clearly would be to lump multiple geographically close samples together so that each population would have a comparable geographic range.

    Replies
    1. “I wonder what the margin of error for any given data point is”

      The standard deviations for each population can be found here. The dataset overall had a mean SD of 1.41% across all populations, with a maximum of 11% for the Southern Moroccans, followed by the Yukaghirs at 10%. The Paniya, Ethiopians (combined samples of EtA, EtO and EtT from Behar '10), Yemenis, Moroccans and African Americans all had SD values of 5-7%, while the remaining populations all had SDs of less than 5%.

      “the geographic distance from South Sudan to EtT-P is smaller still and still have a very large percentage difference.”

      There are plenty of East Africans, Nilo-Saharan, Afroasiatic and click speakers alike, sampled between the Southern Sudanese and EtT-P. The Hadza, for instance, stand at 9% OoA, and so forth.

  5. Etyopsis, the Khoisan's small OOA might be some of the African component that didn't "change" in OOA. I remember that Mesolithic Spanish sample recently that turned out to have something like 7-8% Khoisan. At first, without Khoisan, it was 10% East African, and then this turned into Khoisan when Khoisan were included. So IF pre-Neolithic Europeans have some of the Khoisan component, Out-of-African individuals probably carried some recent African admixture as well, perhaps the spread of E Y-DNA and the L mtDNAs into Europe.

    Replies
    1. The way I see it, the OOA component is likely part of a basic genetic signature of a small group of Africans that lived somewhere in Eastern Africa. They leave Africa and the component starts getting fixed in the populations outside of Africa; the farther away they are, the more it is fixed and the higher the proportion of the genome it represents. The ones closest to Africa, however, maintained both the African and OOA signatures by interbreeding with fresh waves of Africans leaving, as well as with those that had left and back-migrated into their original homeland.

  6. I am not a scientist so could you explain this in simple terms please?

  7. What happened to the graph images you had on this post?
