
Wednesday, March 21, 2012

A Supervised Global ADMIXTURE Run


A supervised ADMIXTURE run assumes that certain populations within a given dataset are 100% of a certain ancestry. For instance, to run ADMIXTURE at K=10 in supervised mode, one must manually select 10 different populations, each assumed to represent one of the 10 putative ancestral clusters that the software will infer, or rather will be forced to infer.
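As a rough illustration of what "manually selecting" reference populations involves in practice: ADMIXTURE's supervised mode reads a `.pop` file with one line per individual, in the same order as the `.fam` file, giving a population label for reference individuals and `-` for everyone whose ancestry should be estimated. The sketch below builds those lines. It assumes, purely for illustration, that the first `.fam` column (the family ID) holds the population label; the function name and the population names are hypothetical and not taken from this run.

```python
def build_pop_lines(fam_lines, reference_pops):
    """Build the lines of an ADMIXTURE .pop file from PLINK .fam lines.

    Assumption (not from the post): the first .fam column (FID) holds
    the population label of each individual.
    """
    pop_lines = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # ignore a stray blank line, e.g. at the end of the file
        fid = fields[0]
        # Reference individuals keep their label; all others get "-",
        # which tells ADMIXTURE to estimate their ancestry.
        pop_lines.append(fid if fid in reference_pops else "-")
    return pop_lines


# Hypothetical example: two reference populations, one unlabelled individual.
example_fam = [
    "PopA ind1 0 0 1 -9",
    "PopB ind2 0 0 2 -9",
    "PopC ind3 0 0 1 -9",
]
example_pop = build_pop_lines(example_fam, {"PopA", "PopB"})
print("\n".join(example_pop))
```

Saved under the same prefix as the genotype files (say `global.pop` next to a hypothetical `global.bed`), the run would then be started with something like `admixture --supervised global.bed 10`, where the number of distinct labels in the `.pop` file matches K.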

I wanted to explore this type of run on a global basis and purposely select populations that not only may form their own clusters in an unsupervised run, but are also thought to lie on the 'trunk', the bifurcation 'nodes' and the end 'branches' of the ancestral 'tree' of all people.
  
The basis of this run is the global dataset that can be downloaded in PLINK format from here. The dataset, a superset of the African dataset that I have been using thus far, contains 3,970 individuals from around the world typed at 27,022 genome-wide SNPs.
A three-dimensional MDS plot, as well as a dim1 vs dim2 plot, with labels placed at the median coordinates of each population group in this dataset, can be seen below:
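Placing labels at median coordinates can be sketched as follows. This is only a minimal example, assuming the MDS coordinates come from a PLINK `.mds` file (header `FID IID SOL C1 C2 C3 ...`) and that the FID column carries the population label; neither assumption is confirmed by the post, and the function name and the numbers in the example are made up.

```python
from collections import defaultdict
from statistics import median


def median_mds_coords(mds_text):
    """Return {population: (median C1, median C2, ...)} from PLINK .mds text.

    Assumption: the FID column is used as the population label.
    """
    rows = [ln.split() for ln in mds_text.splitlines() if ln.strip()]
    header, data = rows[0], rows[1:]
    # Dimension columns are named C1, C2, C3, ... in a PLINK .mds file.
    dim_cols = [i for i, name in enumerate(header)
                if name.startswith("C") and name[1:].isdigit()]
    by_pop = defaultdict(list)
    for row in data:
        by_pop[row[0]].append([float(row[i]) for i in dim_cols])
    # Take the median separately along each dimension.
    return {pop: tuple(median(axis) for axis in zip(*coords))
            for pop, coords in by_pop.items()}


# Made-up coordinates, for illustration only.
example_mds = """FID IID SOL C1 C2 C3
PopA a1 0 0.1 0.5 -0.2
PopA a2 0 0.3 0.7 -0.4
PopA a3 0 0.2 0.6 -0.3
PopB b1 0 1.0 -1.0 0.0
PopB b2 0 2.0 -2.0 1.0
"""
label_positions = median_mds_coords(example_mds)
print(label_positions)
```

Using the median rather than the mean keeps a label near the bulk of its population even when a few individuals are outliers, for example recently admixed samples.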



The general structure of a global PCA/MDS plot is well known and understood. The first principal component, which captures more of the variation than any other component, separates Africans from non-Africans, while the second separates West Asians/Europeans from East Asians, Oceanians and Native Americans. The 3rd principal component, however, can be shaky. In the plot above it separates Native Americans from the rest, but other sources have shown the 3rd principal component in a global PCA separating divergent hunter-gatherers (like the Hadza, Sandawe, San and Pygmies) from everybody else. Perhaps a 3-D PCA generated from full genome scans will put this to rest once and for all.
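To make the ordering of components concrete, here is a small sketch of PCA on a samples-by-SNPs genotype matrix using NumPy's SVD. It is not the pipeline used for the plot above: the data are synthetic (two groups with very different allele frequencies), and the function name is hypothetical. It only shows that components come out ranked by the share of variance they explain, and that the first one picks up the deepest split in the data.

```python
import numpy as np


def pca_genotypes(genotypes):
    """PCA of a samples x SNPs matrix of allele counts (0, 1 or 2).

    Returns (coords, var_fraction): sample coordinates on each component
    and the fraction of total variance each component explains.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)               # centre each SNP
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    coords = U * s                       # projections onto the components
    var = s ** 2                         # proportional to variance per component
    return coords, var / var.sum()


# Synthetic data: 20 samples per group, 50 SNPs, allele frequency 0.1 vs 0.9.
rng = np.random.default_rng(0)
group_a = rng.binomial(2, 0.1, size=(20, 50))
group_b = rng.binomial(2, 0.9, size=(20, 50))
coords, var_fraction = pca_genotypes(np.vstack([group_a, group_b]))

print(var_fraction[:3])                  # shares of variance, largest first
print(np.sign(coords[:, 0]))             # one sign per group on component 1
```

With real data the later components explain progressively less variance, which is one reason the 3rd component can change with the choice of samples and SNPs.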