A supervised ADMIXTURE run assumes
that certain populations within a given dataset are 100% of a
certain ancestry. For instance, to run ADMIXTURE at
K=10 in supervised mode, 10 different populations must be manually selected,
each assumed to come from one of the 10 putative ancestral clusters that the software will infer, or rather will be forced to infer.
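As a rough illustration of the manual selection step: ADMIXTURE's supervised mode reads a .pop file with one line per individual, in the same order as the .fam file, holding a population label for the reference individuals and "-" for everyone whose ancestry is to be estimated. The sketch below builds those lines. The sample names, the helper build_pop_lines, and the assumption that the first .fam column holds the population name are all hypothetical and not taken from the dataset used in this post.

```python
def build_pop_lines(fam_lines, reference_labels):
    """Return one .pop line per non-empty .fam line, in the same order.

    reference_labels maps a family ID (assumed here to be the population
    name) to the ancestral cluster label it is fixed to. Individuals from
    any other population get "-", meaning their ancestry is estimated.
    """
    pop_lines = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # ignore stray blank lines
        family_id = fields[0]
        pop_lines.append(reference_labels.get(family_id, "-"))
    return pop_lines


# Hypothetical .fam content: FID IID father mother sex phenotype
fam_lines = [
    "Yoruba YRI001 0 0 1 -9",
    "Han CHB001 0 0 2 -9",
    "Somali SOM001 0 0 1 -9",
]

# Hypothetical choice of reference populations (one per ancestral cluster)
reference_labels = {"Yoruba": "Yoruba", "Han": "Han"}

pop_lines = build_pop_lines(fam_lines, reference_labels)
print("\n".join(pop_lines))
```

The resulting lines would be saved to a .pop file with the same prefix as the .bed file, and ADMIXTURE would then be started with its --supervised flag and K equal to the number of distinct reference labels.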
I wanted to explore this type of run
on a global basis and purposefully select populations that not only may form their own
clusters in an unsupervised run, but are also thought to lie on
the 'trunk', the bifurcation 'nodes' and the end 'branches' of the ancestral 'tree' of all people.
The basis of this run is the global
dataset that can be downloaded in PLINK format from here. The
dataset, a superset of the African dataset that I have been utilizing thus far,
contains 3,970 individuals from around the world typed at 27,022 genome-wide SNPs.
A 3-dimensional MDS plot, as well as a dim1 vs
dim2 plot, each labelled at the median coordinates of the population
groups in this dataset, can be seen below:
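For reference, placing each label at the median coordinates of its population group can be sketched as follows. This is only a minimal illustration, not the script used to draw the plots; the function name median_label_positions and the toy coordinates are made up.

```python
from statistics import median


def median_label_positions(points):
    """Map each population to the (dim1, dim2) median of its individuals.

    points is an iterable of (population, dim1, dim2) tuples, for example
    parsed from an MDS output table.
    """
    grouped = {}
    for population, dim1, dim2 in points:
        xs, ys = grouped.setdefault(population, ([], []))
        xs.append(dim1)
        ys.append(dim2)
    return {pop: (median(xs), median(ys)) for pop, (xs, ys) in grouped.items()}


# Toy coordinates for two hypothetical population groups
points = [
    ("A", 0.1, 0.2),
    ("A", 0.3, 0.4),
    ("A", 0.2, 0.9),
    ("B", -0.5, 0.0),
    ("B", -0.7, 0.2),
]

labels = median_label_positions(points)
for population, (x, y) in sorted(labels.items()):
    print(population, x, y)
```

Using the median rather than the mean keeps a label near the bulk of its group even when a few individuals plot far away from the rest.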
The general structure of a globally
spread PCA/MDS plot is well known and understood. The first principal
component, which captures the most variation of all the components,
separates Africans from non-Africans, while the second principal
component separates West Asians/Europeans from East Asians, Oceanians
and Native Americans. The 3rd principal component, however, can be
less stable. In the plot above it separates Native Americans from
the rest, but other sources have shown the 3rd
principal component in a global PCA separating divergent hunter
gatherers (like the Hadza, Sandawe, San and Pygmies) from everybody
else. Perhaps a 3-D PCA generated from full genome scans will put this
to rest once and for all.

