Monday, April 21, 2014

Genomic and cranial phenotype data support multiple modern human dispersals from Africa and a southern route into Asia


Current consensus indicates that modern humans originated from an ancestral African population between ∼100–200 ka. The ensuing dispersal pattern is controversial, yet has important implications for the demographic history and genetic/phenotypic structure of extant human populations. We test for the first time to our knowledge the spatiotemporal dimensions of competing out-of-Africa dispersal models, analyzing in parallel genomic and craniometric data. Our results support an initial dispersal into Asia by a southern route beginning as early as ∼130 ka and a later dispersal into northern Eurasia by ∼50 ka. Our findings indicate that African Pleistocene population structure may account for observed plesiomorphic genetic/phenotypic patterns in extant Australians and Melanesians. They point to an earlier out-of-Africa dispersal than previously hypothesized. 


Despite broad consensus on Africa as the main place of origin for anatomically modern humans, their dispersal pattern out of the continent continues to be intensely debated. In extant human populations, the observation of decreasing genetic and phenotypic diversity at increasing distances from sub-Saharan Africa has been interpreted as evidence for a single dispersal, accompanied by a series of founder effects. In such a scenario, modern human genetic and phenotypic variation was primarily generated through successive population bottlenecks and drift during a rapid worldwide expansion out of Africa in the Late Pleistocene. However, recent genetic studies, as well as accumulating archaeological and paleoanthropological evidence, challenge this parsimonious model. They suggest instead a “southern route” dispersal into Asia as early as the late Middle Pleistocene, followed by a separate dispersal into northern Eurasia. Here we test these competing out-of-Africa scenarios by modeling hypothetical geographical migration routes and assessing their correlation with neutral population differentiation, as measured by genetic polymorphisms and cranial shape variables of modern human populations from Africa and Asia. We show that both lines of evidence support a multiple-dispersals model in which Australo-Melanesian populations are relatively isolated descendants of an early dispersal, whereas other Asian populations are descended from, or highly admixed with, members of a subsequent migration event. 

Link (Closed Access) 

Friday, April 4, 2014

Median Joining Networks

This post is dedicated to the YSTR median joining networks I will be creating using the Fluxus Network Software©.

The Y TMRCA calculator can now also create the input file needed to build median joining networks with the Fluxus Network Software; see here for more details.

This blog post will be updated routinely with network diagrams as I make more of them. Since I do not have access to the Network Publisher add-on, it takes a considerable amount of time to format the diagrams properly.

I will start with Ethiopian E-M34 data from the Plaster thesis, see also the following post for more detail on Ethiopian E-M34: YDNA E-M123; A closer look 

Monday, March 24, 2014

Modeling 3D Facial Shape from DNA

While still in its infancy, this technology is quite fascinating.


Human facial diversity is substantial, complex, and largely scientifically unexplained. We used spatially dense quasi-landmarks to measure face shape in population samples with mixed West African and European ancestry from three locations (United States, Brazil, and Cape Verde). Using bootstrapped response-based imputation modeling (BRIM), we uncover the relationships between facial variation and the effects of sex, genomic ancestry, and a subset of craniofacial candidate genes. The facial effects of these variables are summarized as response-based imputed predictor (RIP) variables, which are validated using self-reported sex, genomic ancestry, and observer-based facial ratings (femininity and proportional ancestry) and judgments (sex and population group). By jointly modeling sex, genomic ancestry, and genotype, the independent effects of particular alleles on facial features can be uncovered. Results on a set of 20 genes showing significant effects on facial features provide support for this approach as a novel means to identify genes affecting normal-range facial features and for approximating the appearance of a face from genetic markers.

Link (Open Access) 

......Since both categorical and continuous variables can be modeled using BRIM, this approach might be used to test for relationships between facial features and other factors, e.g., age, adiposity, and temperament. The methods illustrated here also provide for the development of diagnostic tools by modeling validated cases of overt craniofacial dysmorphology. Most directly, our methods provide the means of identifying the genes that affect facial shape and for modeling the effects of these genes to generate a predicted face. Although much more work is needed before we can know how many genes will be required to estimate the shape of a face in some useful way and many more populations need to be studied before we can know how generalizable the results are, these results provide both the impetus and analytical framework for these studies.....

Some interesting figures:

Figure 1: Workflow for 3D face scan processing.
A) original surface, B) trimmed to exclude non-face parts, C) reflected to make mirror image, D) anthropometric mask of quasi-landmarks, E) remapped, F) reflected remapped, G) symmetrized, H) reconstructed.
Figure 3: Transformations and heat maps showing how face shape is affected by (A) RIP-A and (B) RIP-S.
The top row of each panel shows the shape transformations three standard deviations below and above the mean of the RIPs in this sample. The second row shows the R2 (proportion of the total variation in each quasi-landmark) and the three primary facial shape change parameters: area ratio, curvature difference, and normal displacement. The bottom row shows in yellow the regions of the face that are statistically significantly different (p<0.001) between the two transformations. The max R2 values for RIP-A and RIP-S are 40.83% and 38.21%, respectively. 
Figure 4: Relationships between the ancestry and sex RIP variables and their initial predictor variables.
(A) RIP-A with genomic ancestry; genomic ancestry is calculated using the core panel of 68 AIMs and RIP-A is calculated using this ancestry estimate on the set of three populations combined (N = 592). Populations are indicated as shown in the legend with United States participants shown with black circles, Brazilians with red circles, and Cape Verdeans with blue circles. (B) Histograms of RIP-S by self-reported sex.
Figure 6: Transformations and heat maps showing how face shape is affected by three particular RIP-G variables.
The initial predictor variables are SNPs in the genes (A) SLC35D1 (B) FGFR1, and (C) LRP6. The top row of each panel shows the shape transformations near the extreme values of the particular RIP-G shown. The second row shows the R2 (proportion of the facial total variation), the three primary facial shape change parameters: area ratio, curvature difference, and normal displacement. The max R2 values for A, B, and C are 11.68%, 15.16% and 10.10%, respectively.

Sunday, March 23, 2014

The $1,000 genome

.................The price of sequencing an average human genome has plummeted from about US$10 million to a few thousand dollars in just six years (see ‘Falling fast’). That does not just outpace Moore's law — it makes the once-powerful predictor of unbridled progress look downright sedate. And just as the easy availability of personal computers changed the world, the breakneck pace of genome-technology development has revolutionized bioscience research. It is also set to cause seismic shifts in medicine...........

Read More Here:

Wednesday, February 26, 2014

Was skin cancer a selective force for black pigmentation in early hominin evolution?

Below are some excerpts from a very interesting read; I suggest reading the whole article here:

Dark or black skin lowers the risk of ultraviolet radiation (UVR)-induced skin cancer by several orders of magnitude and, while this might be considered an incidental benefit, here I make a case for lethal skin cancer—in reproductive, young, early humans, as a potent selective force underlying the emergence of black skin as the ancestral pigmentation state.

Were it not for the efficacy of DNA repair of UV-induced DNA damage, those with white skin would all have cancer, and at a very young age, as evidenced by the impact of the inherited disorder of nucleotide excision DNA repair, xeroderma pigmentosum (XP) [25,26]. Black- or dark-skinned ethnic groups are substantially less at risk but when they do have a diagnosis of skin cancer, it is often on soles and palms—less pigmented regions of the body [27,28].

In black-skinned individuals, melanocytes synthesize brown/black eumelanin which is then packaged into peri-nuclear distributed, ellipsoid melanosomes of keratinocytes (figure 1). This appears to be a near optimal arrangement for UV filtration and DNA protection. In white skin, melanocytes synthesize a higher proportion of yellow and/or red pheomelanin and this is then assembled into clustered small, circular melanosomes in keratinocytes. The compound effect is minimal UV filtration.

Whatever the evolutionary logic, the acquisition of pale skin has become a liability. But only so because pale-skinned Europeans have been subject to either voluntary or enforced migration to much sunnier climes (e.g. Queensland, Australia, and other subtropical zones) and, more recently, have availed themselves of youthful opportunities for intermittent high level sun exposure via inexpensive air travel and recreational holidays in the sun. In this context, skin cancer arises as the consequence of a mismatch between the ancestral environmental conditions that shaped our genetics and skin properties and our current behavioural and social activities [3]. This narrative is reasonably well established. What I address here is another and, in a sense, reciprocal evolutionary aspect of skin coloration and cancer risk.

Early hominin evolution in East Africa at some 2–3 Ma was associated with a dramatic loss of the body hair development that is retained by our primate cousins [68,69]. Hair growth was retained on the head—the most UVR-exposed part of the body of a bipedal hominin. Some exotic explanations have been entertained for this dramatic phenotypic shift, including avoidance of fur parasites or of catching fire, a response to wearing clothes or an adaptation to an aquatic way of life [68–72]. But the most likely major adaptive advantage would have been for thermoregulation or facilitation of sweating and heat loss for physically active, hunter–gatherers in the savannah [69,73,74]. But what colour was the exposed skin of the first hairless hominins? Not black it would seem. The skin of our nearest primate relative, the chimpanzee, is, under the fur, essentially pale or white with melanocytes restricted to hair follicles [67]. The exposed and relatively hairless face and hands are also white in infant chimpanzees of three Pan subspecies (but black in Pan paniscus) and they become facultatively pigmented with age [75]. It has therefore been considered very likely, albeit not unambiguously so [76], that the first African hominins to discard hirsutism were also white- or pale-skinned [7,43,50].

There are no population-based databases that provide for accurate age incidence rates of skin cancer in African albinos. However, multiple clinical reports testify to the fact that the prevalence of skin cancer in African albinos, though variable according to geographical region, is exceptionally high in low-latitude (5–10°) regions with high year-round UVB exposures, including Tanzania [97,109–111], Cameroon [112] and Nigeria [93,113,114]. In South Africa, skin cancer rates in albinos vary with latitude and altitude, being relatively high in Soweto and the Transvaal and lower in the Transkei [107,115]. The risk of developing skin cancer in Soweto albinos was estimated to be some 1000 times that of pigmented blacks [116]. Erythema and burns occur in infant albinos and focal skin lesions develop as early as 5 years of age [97]. By the age of 20 years, most albino individuals in low-latitude regions have multiple actinic keratoses (figure 3) [97], the precursor lesions for SCC [117,118]. Many of these regress spontaneously but most, if not all albinos, have overt skin cancer in their twenties or thirties [97,115,119,120], with occasional presentation even in childhood [97]. 

8. Concluding remarks
Extrapolation from the current risk of skin cancer in OCA2 albinos to that of early hominins in equatorial Africa is clearly speculative but if early humans were indeed pale-skinned, they would most probably have similarly suffered substantial affliction during reproductively active years from non-melanoma skin cancers. That skin cancer in African albinos might be germane to considerations of the adaptive significance of dark skin has been noted before [7,11,138,139], but never explored.

The age-related incidence and mortality from skin cancer, both historically and in contemporary albinos, have been modulated by many factors, including lifestyle, occupation and varying degrees of awareness, preventive measures and medical intervention [91]. In these cultural respects, the lethal impact of skin cancer would have been more severe in naked, pale-skinned and outdoor living hominins, dwelling in a habitat with the highest levels of year-round UVB radiation—in open and arid equatorial savannah. It is difficult to imagine a more potent prescription for cancer: maximum, sustained, whole-body carcinogenic exposure (UVB) coupled with minimal attenuation capacity (via melanin). Young hunter–gatherer males might have suffered the greatest UV exposure and risk of cancer. Death would have ensued at a young age from either metastases or localized invasion, ulceration, bleeding and infection. The detrimental impact on reproductive fitness would then have been severe, providing potent pressure for both the selective sweep of the highly stable African MC1R variant, promoting eumelanin synthesis and black skin and its subsequent stable maintenance for more than a million years. This critical gene clearly did diversify in sequence and function in the descendents of most of those migrants that left Africa to populate the rest of the world. In those, the selective pressures via UVR were both relaxed and different.

Tuesday, February 25, 2014

mtDNA from Southern Africa

Reference mtDNA from Southern Africa from the pre-print "Migration and interaction in a contact zone: mtDNA variation among Bantu-speakers in southern Africa" (Thanks to Maju for the referral)

Friday, February 21, 2014

YDNA E-M123; A closer look

E-M123 (as well as E-M34) was first discovered by Underhill (2000). It is found at low to medium frequencies in East Africa and the Middle East, and at low frequencies in North Africa and Europe.

Figure 1 - Current and previous E-M215 phylogenetic structure 

Figure 1 shows a comparison of the basic phylogeny of E-M215/M35 as it was known before 2011 (a) and after (b), with a 'who and when' key for the discovery of the UEPs. Notice the impact the rearrangement has on the phylogenetic placement of E-M123: E-M123 is now shown to have a more recent common ancestor with the East and Southern African variants of E-M35, i.e. E-V42 and E-M293, than it does with any of the other variants of E-M35.

Previous publications:

It is unfortunate that all of the research previously published on E-M123 was done under the older (and rather out of date) configuration of the basic structure of E-M35. It is still worthwhile, however, to look at articles that have tried to untangle the origins and history of this lineage. Of these, 3 come to mind:

Friday, February 14, 2014

Comprehensive Ethiopian YDNA TMRCA Estimates

Below is a comprehensive list of all central TMRCA estimates calculated from the Plaster thesis for 6 UEPs (look at this post under Interactive Chart of Figure 3.2 for the frequencies of the UEPs). P*(x R1a) & Y*(x BT,A3b2) are not included due to their minimal frequency and very sporadic distribution.

A total of 5,756 haplotypes were reported with the paper for the markers DYS19, DYS388, DYS390, DYS391, DYS392 and DYS393. Of those, 30 belonged to P*(x R1a) & Y*(x BT,A3b2), leaving a total of 5,726 haplotypes. The remaining haplotypes were then categorized by the criteria Cultural ID + Generic Language Group* + UEP. Any group of haplotypes that met these criteria with N > 1 and with a coalescent not equal to 0 (meaning non-identical haplotypes) was processed for its TMRCA and reported. These groups account for 5,668, or 98%, of the total haplotypes reported for the paper.

The tables are ordered broadly according to the frequencies of the tested UEPs in Ethiopia, in this sequence: E*(x E1b1a), 3,985 haplotypes; J, 689 haplotypes; A3b2, 601 haplotypes; K*(xL,N1c,O2b,P), 154 haplotypes; BT*(xDE,JT), 193 haplotypes; and E1b1a7, 46 haplotypes.

Note that the mean TMRCAs for both the Zhivotovsky rate (Z-TMRCA) and the pedigree rates (P-TMRCA), sometimes also known as germline rates, are in units of generations. The suitable length of a generation for the Z-TMRCA is 25 years, while for the P-TMRCA it may range from 28 to 33 years.
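To make the unit conversion concrete, here is a minimal Python sketch. The function names are hypothetical and are not part of the calculator; the only values taken from the text are the 25-year generation for the Z-TMRCA and the 28 to 33 year range for the P-TMRCA.

```python
# Illustrative helpers only; these names are hypothetical and are not
# part of the Y TMRCA calculator.
Z_YEARS_PER_GENERATION = 25        # generation length used with the Z-TMRCA
P_YEARS_PER_GENERATION = (28, 33)  # range of generation lengths for the P-TMRCA


def z_tmrca_years(generations):
    """Convert a Z-TMRCA given in generations to years."""
    return generations * Z_YEARS_PER_GENERATION


def p_tmrca_years(generations):
    """Convert a P-TMRCA given in generations to a (low, high) range in years."""
    low, high = P_YEARS_PER_GENERATION
    return (generations * low, generations * high)
```

For example, a Z-TMRCA of 100 generations corresponds to 2,500 years, while a P-TMRCA of 100 generations corresponds to somewhere between 2,800 and 3,300 years.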

If more detail on the TMRCA analysis for any of the populations listed below is required, go to the table here, upload the necessary file into the Y TMRCA calculator and filter for the specific population in question.

Tuesday, February 11, 2014

Ethiopian YDNA J STR Analysis - An addendum

In the past, I carried out a TMRCA (STR) analysis of YDNA haplogroup J haplotypes from Ethiopia using the primary dataset from the Plaster thesis that was discussed here. While that particular dataset had a large number of haplotypes, it also had a low number of markers (6). However, supplementary data supplied with the paper included Y-STR haplotypes from haplogroup J. Although it only had data for a select few of the populations found in the main paper, it had better resolution, with typing at 14 markers. Below are the TMRCA results for those haplotypes. The dataset can be found in this table in .csv format under "Ethiopian_JM267.csv".
In total, 54 haplotypes were found in the supplementary dataset. The haplotypes in the population groups above sum to only 53 because one haplotype that belonged to the Anuak dataset was not included.

The results are quite consistent with the results I got from the dataset with less resolution, even if the sample sizes are quite small. For instance, although the Afar had haplogroup J in excess of 25%, their haplotypes show the least amount of diversity; conversely, the high diversity of haplogroup J in the other populations is still maintained.

While the Zhivotovsky TMRCA (Z-TMRCA) for all the 691 YDNA J haplotypes found in Ethiopia in the lower resolution dataset was previously calculated at 595 generations, the Z-TMRCA for all 54 haplotypes in the higher resolution dataset, as seen above, was calculated at 705 generations. If only the markers from the lower resolution dataset were used to compute the Z-TMRCA for these 54 haplotypes, we would get a Z-TMRCA of 631 generations. Furthermore, if we intersected the 14 markers from this dataset with the recommended Zhivotovsky markers, the resulting markers '19', '393', '392', '391', '390', '439', '388', '389-1' and '389-2' would yield a Z-TMRCA of 920 generations, implying an introduction of YDNA J-M267 into Ethiopia well into the Upper Paleolithic.
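For readers curious how a Z-TMRCA of this kind is typically obtained, the sketch below shows one common formulation: the average squared difference (ASD) in repeat counts from a founder haplotype, averaged over markers and divided by a mutation rate. It is a sketch under stated assumptions, not the calculator's actual code. The founder is assumed to be the per-marker median, the rate of 0.00069 per marker per 25-year generation is the commonly cited Zhivotovsky effective rate, and the function name and toy haplotypes are hypothetical.

```python
# Hypothetical sketch of an ASD-based TMRCA estimate; not the calculator's code.
ZHIVOTOVSKY_RATE = 0.00069  # assumed effective rate per marker per generation


def z_tmrca_generations(haplotypes, rate=ZHIVOTOVSKY_RATE):
    """Estimate a TMRCA in generations from a list of haplotypes.

    Each haplotype is a list of integer repeat counts, one per marker,
    with the markers in the same order in every haplotype.
    """
    n_markers = len(haplotypes[0])
    total = 0.0
    for marker in range(n_markers):
        values = sorted(h[marker] for h in haplotypes)
        founder = values[len(values) // 2]  # assumption: median as founder allele
        squared = sum((v - founder) ** 2 for v in values)
        total += squared / float(len(values))
    asd = total / n_markers  # average over markers
    return asd / rate


# Toy data: four haplotypes typed at two markers (made-up values).
toy = [[13, 24], [14, 24], [13, 24], [13, 25]]
estimate = z_tmrca_generations(toy)
```

Because the estimate is an average over markers, changing the marker set (as in the 6-, 9- and 14-marker comparisons above) can shift the result even for the same haplotypes.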

Update: With respect to the low resolution haplotypes from the Plaster thesis, I have also added 5,726 YDNA STR haplotypes in *.csv format, compatible with the calculator and tabulated according to the UEPs tested, to the table at the link below:

Monday, January 27, 2014

Y TMRCA Calculator as a Web App

The Y DNA (STR) TMRCA calculator can now be accessed as a web application with full functionality here:

It is also embedded in this blog in a new page (above).

UPDATE (02/11/2014)

Another series of updates for the calculator:

  • Users can now use the previously idle first column in the csv file to group haplotypes together and thus compute the TMRCA for a specified group (see example below).
  • The application now also accepts locus names in NIST format.
  • It now automatically deletes any haplotype with a non-integer value given for any locus in the *.csv file, instead of producing an error for that scenario.
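The grouping and filtering behaviour described in the bullets above might look roughly like the following. This is a hypothetical sketch, not the calculator's source: the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def group_haplotypes(rows):
    """Group haplotypes by the label in the first column.

    `rows` is a list of lists as read from the *.csv file: the first row
    holds the locus names, and each later row is a group label followed
    by repeat counts. Haplotypes with any non-integer value are dropped.
    """
    groups = defaultdict(list)
    for row in rows[1:]:  # skip the header row
        try:
            repeats = [int(value) for value in row[1:]]
        except ValueError:
            continue  # non-integer value at some locus: skip this haplotype
        groups[row[0]].append(repeats)
    return dict(groups)


# Made-up example rows: label in column one, repeats in the rest.
example_rows = [
    ['Demo', 'DYS19', 'DYS390'],
    ['A', '13', '24'],
    ['A', '14', '24'],
    ['B', '13.2', '23'],  # dropped: 13.2 is not an integer
    ['B', '15', '23'],
]
grouped = group_haplotypes(example_rows)
```

Each group's list of haplotypes could then be passed to the TMRCA computation separately.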

Monday, December 9, 2013

More East African mtDNA Charts

Below are more East African mtDNA bar graphs from the Hirbo Thesis; the complementary YDNA charts can be seen in this post. Along with the Boattini paper featured here, this gives us a more complete picture of East African mtDNA with a reasonable amount of detail.

The Google Visualization API has been having problems for the past couple of months, so the tooltips and other functionality of Google Charts may not work. This post will be updated if some of these issues are fixed.

With respect to some of the data points, the populations labeled with a * had their total number of samples adjusted in order for the percentages shown in Table 3.4.1 to make sense, that is, Orma has been adjusted from 20 to 21, Marakwet from 22 to 23, Pokot from 39 to 38, San from 11 to 12 and Bamoun from 18 to 20.

Friday, November 15, 2013

Bitter Taste Sensitivity in Africa

Origin and Differential Selection of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa 

Bitter taste perception influences human nutrition and health, and the genetic variation underlying this trait may play a role in disease susceptibility. To better understand the genetic architecture and patterns of phenotypic variability of bitter taste perception, we sequenced a 996 bp region, encompassing the coding exon of TAS2R16, a bitter taste receptor gene, in 595 individuals from 74 African populations, and in 94 non-Africans from 11 populations. We also performed genotype-phenotype association analyses of threshold levels of sensitivity to salicin, a bitter anti-inflammatory compound, in 296 individuals from Central and East Africa. In addition, we characterized TAS2R16 mutants in vitro to investigate the effects of polymorphic loci identified at this locus on receptor function. Here, we report striking signatures of positive selection, including significant Fay and Wu's H statistics predominantly in East Africa, indicating strong local adaptation, and greater genetic structure among African populations than expected under neutrality. Furthermore, we observed a “star-like” phylogeny for haplotypes with the derived allele at polymorphic site 516 associated with increased bitter taste perception that is consistent with a model of selection for “high-sensitivity” variation. In contrast, haplotypes carrying the “low-sensitivity” ancestral allele at site 516 showed evidence of strong purifying selection. We also demonstrated, for the first time, the functional effect of nonsynonymous variation at site 516 on salicin phenotypic variance in vivo in diverse Africans, and showed that most other nonsynonymous substitutions have weak or no effect on cell surface expression in vitro, suggesting that one main polymorphism at TAS2R16 influences salicin recognition. Additionally, we detected geographic differences in levels of bitter taste perception in Africa not previously reported, and infer an East African origin for high salicin sensitivity in human populations.

Closed Access  

From the Press:

“Because Africa is the site of origin of all modern humans,” said study author Sarah Tishkoff, a professor in the University of Pennsylvania’s Department of Genetics, “Africans are going to have a large amount of diversity and non-Africans are going to have a subset of that diversity. In Africa, you get an opportunity to observe how these genetic variants are influencing phenotypes that you wouldn’t have if you were only studying non-Africans.”

“The taste testing shows that the mutations in TAS2R16 had functional significance for the bitter taste perception system,” said study author Paul Breslin, an experimental psychologist from Rutgers University. “In this case, the mutation caused a gain of taste function.”

“The types of populations we’re studying are diverse and they have diverse diets,” Tishkoff said, “suggesting that there is likely something else going on here. By getting a handle on how much variation is in these populations, where it is located and what are the particular signatures of selection, it might start giving us clues as to what we should be looking at in terms of the biomedical or physiological significance of these genes.”  

Tuesday, October 22, 2013

New paper sheds light on the F-series YDNA SNPs

The F-series YDNA SNPs appeared at the end of last year with results from Geno 2.0. Now an electronic pre-print sheds some light on the discovery of these SNPs.

The paper, entitled "Y Chromosomes of 40% Chinese Are Descendants of Three Neolithic Super-grandfathers", is freely available for download.

Some interesting (relevant to this blog) quotations from the paper follow (in blue):

To identify major population expansions related to male lineages, we sequenced 78 East Asian Y chromosomes at 3.9 Mbp of the non-recombining region (NRY), discovered >4,000 new SNPs, and identified many new clades.

Nearly all the Y chromosomes outside Africa are derivative at the SNP M168 and belong to any of its three descendent super-haplogroups – DE, C, and F 9,10,15, strongly supporting the out-of-Africa theory. The time of the anatomically modern human’s exodus from Africa has yielded inconsistent results ranging from 39 kya 16, 44 kya 10, 59 kya 17, 68.5 kya 18 to 57.0 – 74.6 kya 19.

The passage below explains why the F-series SNPs are for the most part found below CT-M168.

we selected 110 males, encompassing the haplogroups O, C, D, N, and Q which are common in East Eurasians, as well as haplogroups J, G, and R which are common in West Eurasians (see Table S1), and sequenced their non-repetitive segments of NRY using a pooling-and-capturing strategy.

Overall ~4,500 base substitutions were identified in all the samples from the whole Y chromosome, in which >4,300 SNPs that has not been publicly named before 2012 (ISOGG etc.). We designated each of these SNP a name beginning with ‘F’ (for Fudan University) (see Table S2). We obtained ~3.90 Mbp of sequences with appropriate quality (at least 1x coverage on >100 out of 110 samples), and identified ~3,600 SNPs in this region.

Table S2 is not available in the PDF file. The link says that all the tables are in a 'separate ancillary file', but no such file is available either, at least not at the time of publishing this post; it may become available when the paper is officially published. Without seeing the actual locations on the Y chromosome where these SNPs are found, it is hard to say how many of them are redundant relative to the PF and CTS SNPs, and how many of them are truly 'novel'.

Considering that 3.9 Mbp range constitutes only less than half of 10 Mbp non-repetitive region in Y chromosome 7, the time resolution of east Asian Y chromosome phylogeny is expected to be doubled in the near future.

To overcome the factors for uncertainty of mutation rate, a calibration with series of samples of comparable time scales might be used. For the case of mitochondrion, a recent study, in which several C-14 calibrated ancient complete sequences (4 – 40 kya) were incorporated into the tree, made the absolute dates much more convincing 41, and we expect a parallel calibration for the Y chromosome in the near future.

The authors conclude the paper with this paragraph:

Despite of the mutation rate uncertainty, we evaluate our calculation of absolute divergence time as acceptable. Firstly, our out-of-Africa date (54.1 kya) is still within the range of previous estimations (39 – 74.6 kya). Secondly, the out-of-Africa date is similar to the recent estimation of two great mitochondrial expansions outside Africa – M (49.6 kya) and N (58.9 kya) 42. Thirdly, it is not contradictory to the emergence of earliest modern human fossil out of Africa (e.g. ~ 50 kya in Australia) 43.

In the Supplementary Materials/Additional Discussions section they also mention this:

It remained mysterious that how many times the anatomically modern human migrated out of Africa, since that among the three superhaplogrous C, DE and F, Haplogroup F distributes in whole Eurasia, C in Asia and Austronesia, D exclusively in Asia, while D’s brother clade E distribute mainly in Africa 62, so there are two hypotheses, 1) haplogroups D and CF migrated out of Africa separately; 2) the single common ancestor of CF and DE migrated out of Africa followed by a back-migration of E to Africa. From this study, the short interval between CF/DE and C/F divergences weakens the possibility of multiple independent migrations (CF, D, and DE*) out of Africa, and thus supports the latter hypothesis 63 (Fig. S2 a).

Perhaps the only new material from this study that may strengthen the hypothesis of an extra-African origin of haplogroup E is, as they mention, the 'short interval' between the common ancestor of CF and DE and the C/F divergence. However, this 'short interval' is relative to which branch length? They did not compute the interval between the BT common ancestor and the CFDE divergence. In addition, what length of time would be considered short enough to disqualify the possibility of multiple independent migrations, and how would this length of time be evaluated? Next, what about the cases of DE* found in Nigeria and Guinea-Bissau that they failed to mention here, that is to say, cases that are neither D nor E but are downstream from the YAP+ insertion? How exactly are they to be explained?

Either way, putting all these questions aside, let us assume that their proposal is correct. How then would this be reconciled with the last paragraph in the actual paper, where they associate mtDNA haplogroups M and N with the out-of-Africa expansion? This would mean that if E back-migrated, it would have done so with lineages downstream from mtDNA haplogroups M and N. However, many areas in Africa where E dominates (except for East and North Africa) have zero, or close to zero, mtDNA haplogroups M and N. Wouldn't we expect to see at least some traces of the mtDNA counterpart of this supposed ancient back-migration in YDNA haplogroup E dominant areas of Africa other than the East and the North? In an otherwise good and all-around informative paper, I think the authors may have jumped the gun with this particular speculation. Perhaps that is why they put it in the supplementary section and not in the paper itself, as a testament to the highly speculative nature of their supposition.

Tuesday, October 8, 2013

TMRCA calculator for Python

I have converted the TMRCA calculator so that it now runs on Python as well as Octave; see here for the Octave version.
It is specifically made for Python 2.7, and I have not had a chance to test it on other versions. No libraries are required to run the script other than the standard libraries that come with 2.7. Some of the advantages of converting to Python are: fewer steps to run the program, easier (future) web app deployment, and more users having access to Python than to Octave.

The Zip file can be downloaded here:
TMRCA Calculator Instructions - for Python 2.7

To check that the TMRCA program works correctly on your system, first run it with the dataset
provided here before trying other datasets. To do so:

(1) Make sure you have Python 2.7 installed on your system (either Windows or Linux will work) and start the interpreter.
(2) In the interpreter, change your working directory to the directory where you saved the unzipped folder by using:
(i) import os
(ii) os.chdir('~PATH/TMRCA/')
- Where ~PATH is the full path where the TMRCA folder is placed on your system.
If you are unsure of your current working directory, type the command: os.getcwd()
(3) Import the tmrca module by typing: import tmrca
(4) Execute the script by typing: tmrca.Analysis('EM35_Example.csv','all')
(5) If this produces results with no errors in the interpreter, then the program is correctly installed and you can proceed to reading and analysing different datasets.
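
If the import or the example run fails, the most likely cause is that the interpreter is not in the right folder. As a rough sketch (this helper is not part of the tmrca module, and the function name is my own), the following checks that the two files named in these instructions are present in the folder you pass in. It only assumes the file names mentioned in this post.

```python
import os

def missing_tmrca_files(tmrca_dir):
    """Return the expected files that are NOT present under tmrca_dir.

    The file names are the ones mentioned in the instructions;
    the helper itself is only an illustration.
    """
    expected = [
        "EM35_Example.csv",
        os.path.join("Markerlist", "49markerlist.txt"),
    ]
    return [name for name in expected
            if not os.path.isfile(os.path.join(tmrca_dir, name))]
```

Call it with the full path of your TMRCA folder; an empty list means both files were found and you can go on to os.chdir and import tmrca as described above.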

Reading and analysing new Data

After correctly executing the above steps, read and analyse new data by using the following steps:
(1) Examine the example STR data file in the "TMRCA/" folder entitled "EM35_Example.csv".
(2) Any STR data file to be analysed should first be put in the same format as the "EM35_Example.csv" file, specifically:
(a) DYS names in the first row should have the exact same nomenclature (the order can be different, however).
(b) Each row (except the first) should represent one sample.
(c) Each column (except the first) should represent repeats for one marker/DYS#.
(d) The first column should represent sample identifiers, e.g. Kit#, sample ID,...
(e) The cell found in the first row and first column should have the dataset's name; this will be the same name used throughout the analysis.
(f) No cell should be empty (null), and cells should not contain values with spaces in them.
(g) The file MUST be a *.csv file with commas used as field delimiters.
(3) Place the *.csv file directly in the "TMRCA/" folder (i.e. in your working directory).
(4) Start the interpreter, change the working directory to '~PATH/TMRCA/', as per the instructions above and import tmrca.
(5) If you want to analyse a specific set of markers from your dataset, go to step 6; otherwise go to step 7.
(6) Go to the file "/TMRCA/Markerlist/49markerlist.txt" and pick the markers you want to use for the analysis from there. Save your chosen markers into a new *.txt file in the same "/TMRCA/Markerlist/" folder. Take a look at any of the other marker list text files in the folder for an example of how a marker list should look. Note that all marker list files need to be *.txt.
(7) If you are specifying a set of markers to use for the analysis, for example "8_Chiaronimarkerlist.txt", then run the program by typing: tmrca.Analysis('EM35_Example.csv','8_Chiaronimarkerlist.txt'); otherwise, just type: tmrca.Analysis('EM35_Example.csv','all').
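
The format rules in step (2) can be checked before running an analysis. The following is a minimal sketch, not part of the tmrca module: the function names are my own, and it only tests the rules that can be verified mechanically (a rectangular comma-separated table, no empty cells, no spaces inside cells). It does not check the DYS nomenclature against the marker lists.

```python
import csv

def check_str_table(rows):
    """Return a list of format problems for a parsed STR table (list of rows)."""
    if len(rows) < 2:
        return ["need a header row and at least one sample row"]
    problems = []
    width = len(rows[0])  # header row: dataset name followed by marker names
    for r, row in enumerate(rows, start=1):
        if len(row) != width:
            problems.append("row %d has %d cells, expected %d" % (r, len(row), width))
        for c, cell in enumerate(row, start=1):
            value = cell.strip()
            if value == "":
                problems.append("row %d, column %d is empty" % (r, c))
            elif " " in value:
                problems.append("row %d, column %d contains a space" % (r, c))
    return problems

def check_str_csv_text(text):
    """Parse comma-delimited text and run the checks above."""
    return check_str_table(list(csv.reader(text.splitlines())))
```

For a file on disk, read its contents and pass them to check_str_csv_text; an empty list means none of these particular problems were found. The marker names and sample IDs you use to try it out are up to you; nothing here depends on the contents of "EM35_Example.csv".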

Thursday, September 5, 2013

An absolute chronology for early Egypt

An absolute chronology for early Egypt using radiocarbon dating and Bayesian statistical modelling


The Egyptian state was formed prior to the existence of verifiable historical records. Conventional dates for its formation are based on the relative ordering of artefacts. This approach is no longer considered sufficient for cogent historical analysis. Here, we produce an absolute chronology for Early Egypt by combining radiocarbon and archaeological evidence within a Bayesian paradigm. Our data cover the full trajectory of Egyptian state formation and indicate that the process occurred more rapidly than previously thought. We provide a timeline for the First Dynasty of Egypt of generational-scale resolution that concurs with prevailing archaeological analysis and produce a chronometric date for the foundation of Egypt that distinguishes between historical estimates.

Open Access: Link 

The sequence of events from the Badarian, through the Predynastic (Naqada period) to the establishment of the First Dynasty.
The divisions are placed at the medians of each HPD range.

Friday, August 30, 2013

Diversity of Lactase Persistence Alleles in Ethiopia: Signature of a Soft Selective Sweep

The persistent expression of lactase into adulthood in humans is a recent genetic adaptation that allows the consumption of milk from other mammals after weaning. In Europe, a single allele (−13910T, rs4988235) in an upstream region that acts as an enhancer to the expression of the lactase gene LCT is responsible for lactase persistence and appears to have been under strong directional selection in the last 5,000 years, evidenced by the widespread occurrence of this allele on an extended haplotype. In Africa and the Middle East, the situation is more complicated and at least three other alleles (−13907G, rs41525747; −13915G, rs41380347; −14010C, rs145946881) in the same LCT enhancer region can cause continued lactase expression. Here we examine the LCT enhancer sequence in a large lactose-tolerance-tested Ethiopian cohort of more than 350 individuals. We show that a further SNP, −14009T>G (ss 820486563), is significantly associated with lactose-digester status, and in vitro functional tests confirm that the −14009G allele also increases expression of an LCT promoter construct. The derived alleles in the LCT enhancer region are spread through several ethnic groups, and we report a greater genetic diversity in lactose digesters than in nondigesters. By examining flanking markers to control for the effects of mutation and demography, we further describe, from empirical evidence, the signature of a soft selective sweep.

Open Access : Link 

Wednesday, July 31, 2013

A summary of interesting recent genetics papers.

I'm taking a break from my summer break to post a few interesting papers that have come out within the past couple of months.

This paper supports the notion of continuous gene flow between Africans and non-Africans since the major Out of Africa event that preceded the populating of all continents outside of Africa.
To be sure, this notion is not new; it has been highlighted before by the methods of authors such as Li and Durbin (2011), for instance. It is also sufficient to explain the intermediate genetic nature of West Eurasians, i.e. between Africans and East Asians/Native Americans, that I have blogged about and demonstrated using ADMIXTURE in the past.

A few quotes from the paper:

"In this paper, we study the length distribution of tracts of identity by state (IBS), which are the gaps between pairwise differences in an alignment of two DNA sequences. These tract lengths contain information about the amount of genetic diversity that existed at various times in the history of a species and can therefore be used to estimate past population sizes. IBS tracts shared between DNA sequences from different populations also contain information about population divergence and past gene flow. By looking at IBS tracts shared within Africans and Europeans, as well as between the two groups, we infer that the two groups diverged in a complex way over more than 40,000 years, exchanging DNA as recently as 12,000 years ago." 

"To illustrate the power of our method, we use it to infer a joint history of Europeans and Africans from the high coverage 1000 Genomes trio parents. Previous analyses agree that Europeans experienced an out-of-Africa bottleneck and recent population growth, but other aspects of the divergence are contested [47]. In one analysis, Li and Durbin separately estimate population histories of Europeans, Asians, and Africans and observe that the African and non-African histories begin to look different from each other about 100,000–120,000 years ago; at the same time, they argue that substantial migration between Africa and Eurasia occurred as recently as 20,000 years ago and that the out-of-Africa bottleneck occurred near the end of the migration period, about 20,000–40,000 years ago. In contrast, Gronau, et al. use a likelihood analysis of many short loci to infer a Eurasian-African split that is recent enough (50 kya) to coincide with the start of the out of Africa bottleneck, detecting no evidence of recent gene flow between Africans and non-Africans [14]. The older Schaffner, et al. demographic model contains no recent European-African gene flow either [48], but Gutenkunst, et al. and Gravel, et al. use SFS data to infer divergence times and gene flow levels that are intermediate between these two extremes [22][49]. We aim to contribute to this discourse by using IBS tract lengths to study the same class of complex demographic models employed by Gutenkunst, et al. and Gronau, et al., models that have only been previously used to study allele frequencies and short haplotypes that are assumed not to recombine. Our method is the first to use these models in conjunction with haplotype-sharing information similar to what is used by the PSMC and other coalescent HMMs, fitting complex, high-resolution demographic models to an equally high-resolution summary of genetic data."

"We estimate that the European-African divergence occurred 55 kya and that gene flow continued until 13 kya. About 5.8% of European genetic material is derived from a ghost population that diverged 420 kya from the ancestors of modern humans. The out-of-Africa bottleneck period, where the European effective population size is only 1,530, lasts until 5.9 kya."

"Our inferred human history mirrors several controversial features of the history inferred by Li and Durbin from whole genome sequence data: a post-divergence African population size reduction, a sustained period of gene flow between Europeans and Yorubans, and a “bump” period when the ancestral human population size increased and then decreased again. Unlike Li and Durbin, we do not infer that either population increased in size between 30 and 100 kya. Li and Durbin postulate that this size increase might reflect admixture between the two populations rather than a true increase in effective population size; since our method is able to model this gene flow directly, it makes sense that no size increase is necessary to fit the data. In contrast, it is possible that the size increase we infer between 240 kya and 480 kya is a signature of gene flow among ancestral hominids."

"Our estimated divergence time of 55 kya is very close to estimates published by Gravel, et al. and Gronau, et al., who use very different methods but similar estimated mutation rates to the […] per site per generation that we use in this paper. However, recent studies of de novo mutation in trios have shown that the mutation rate may be closer to […] per site per generation [51][55][56]. We would estimate older divergence and gene flow times (perhaps […] times older) if we used the lower, more recently estimated mutation rate. This is because the lengths of the longest IBS tracts shared between populations should be approximately exponentially distributed with decay rate […]."
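
The first quote above defines IBS tracts as the gaps between pairwise differences in an alignment of two DNA sequences. To make that definition concrete, here is a toy sketch that extracts tract lengths from two short aligned strings. It is only an illustration of the definition, not the authors' method: the function name is my own, tracts cut off by the ends of the alignment are counted like any other, and nothing here deals with phasing, missing data or genome-scale input.

```python
def ibs_tract_lengths(seq_a, seq_b):
    """Lengths of the runs of identical sites between pairwise differences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    lengths = []
    run = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            run += 1             # still inside an identical stretch
        else:
            if run:
                lengths.append(run)
            run = 0              # a difference ends the current tract
    if run:
        lengths.append(run)      # tract running up to the end of the alignment
    return lengths
```

For example, "ACGTACGT" and "ACGAACGT" differ at a single site, which splits the alignment into tracts of length 3 and 4. It is the distribution of such lengths, within and between populations, that the paper fits demographic models to.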

This paper discusses several points, or rather the lack of evidence, that make a pre-Toba migration of modern humans out of Africa almost impossible to reconcile with the currently available evidence.

A few quotes from the paper:

"There are currently two sharply conflicting models for the earliest modern human colonization of South Asia, with radically different implications for the interpretation of the associated genetic and archaeological evidence (Fig. 1). The first is that modern humans arrived ∼50–60 ka, as part of a generalized Eurasian dispersal of anatomically modern humans, which spread (initially as a very small group) from a region of eastern Africa across the mouth of the Red Sea and expanded rapidly around the coastlines of southern and Southeast Asia, to reach Australia by ∼45–50 ka (7–10, 14–18) (Fig. 2). The second, more recently proposed view, is that there was a much earlier dispersal of modern humans from Africa sometime before 74 ka (and conceivably as early as 120–130 ka), reaching southern Asia before the time of the volcanic “supereruption” of Mount Toba in Sumatra (the largest volcanic eruption of the past 2 million y) at ∼74 ka (1–6)."

"We find no evidence, either genetic or archaeological, for a very early modern human colonization of South Asia, before the Toba eruption. All of the available evidence supports a much later colonization beginning ∼50–55 ka, carrying mitochondrial L3 and Y chromosome C, D, and F lineages from eastern Africa, along with the Howiesons Poort-like microlithic technologies (see above and Genetics and Archaeology). We see no reason to believe that the initial modern human colonization of South and Southeast Asia was distinct from the process that is now well documented for effectively all of the other regions of Eurasia from ∼60 ka onward, even if the technological associations of these expanding populations differed (most probably for environmental reasons) between the eastern and northwestern ranges of the geographical dispersal routes."

"The archaeological evidence initially advanced to support an earlier (pre-Toba) dispersal of African-derived populations to southern Asia has since been withdrawn by the author responsible for the original lithic analyses, who now suggests that they are most likely “the work of an unidentified population of archaic people” (ref. 11, p. 26). Meanwhile, the genetic evidence outlined earlier indicates that any populations dispersing from Africa before 74 ka would predate the emergence of the mtDNA L3 haplogroup, the source for all known, extant maternal lineages in Eurasia (8, 28) (Fig. 5). The size of the mtDNA database is very substantial: currently there are almost 13,000 complete non-African mtDNA genomes available, not one of which is pre-L3."

This paper, written by a genealogical community member, makes an impressive effort at creating and automating a comprehensive method to phylogenetically classify Geno 2.0 YDNA SNPs. Details of the algorithm are not available:

"To illustrate this, the author has used this Y-tree clade predictor (using the latest ISOGG tree as a basis for comparison) to classify over 1650 sets of publicly accessible Geno 2.0 Y-SNP calls. This information was then used as an input into another algorithm designed by the author – an algorithm developed to automate the construction of a phylogenetic Y-tree, while overcoming the challenges identified above. The technical details of this process will remain proprietary for the time being."

Tuesday, May 21, 2013

Development of Middle Stone Age innovation linked to rapid climate change

The development of modernity in early human populations has been linked to pulsed phases of technological and behavioural innovation within the Middle Stone Age of South Africa. However, the trigger for these intermittent pulses of technological innovation is an enigma. Here we show that, contrary to some previous studies, the occurrence of innovation was tightly linked to abrupt climate change. Major innovational pulses occurred at times when South African climate changed rapidly towards more humid conditions, while northern sub-Saharan Africa experienced widespread droughts, as the Northern Hemisphere entered phases of extreme cooling. These millennial-scale teleconnections resulted from the bipolar seesaw behaviour of the Atlantic Ocean related to changes in the ocean circulation. These conditions led to humid pulses in South Africa and potentially to the creation of favourable environmental conditions. This strongly implies that innovational pulses of early modern human behaviour were climatically influenced and linked to the adoption of refugia.

From the Press:

Rapid climate change during the Middle Stone Age, between 80,000 and 40,000 years ago, sparked surges in cultural innovation in early modern human populations, according to new research.

Professor Ian Hall, Cardiff University School of Earth and Ocean Sciences, said: "When the timing of these rapidly occurring wet pulses was compared with the archaeological datasets, we found remarkable coincidences.
"The occurrence of several major Middle Stone Age industries fell tightly together with the onset of periods with increased rainfall."
"Similarly, the disappearance of the industries appears to coincide with the transition to drier climatic conditions."
Professor Chris Stringer of London's Natural History Museum commented, "The correspondence between climatic ameliorations and cultural innovations supports the view that population growth fuelled cultural changes, through increased human interactions."
The South African archaeological record is so important because it shows some of the oldest evidence for modern behavior in early humans. This includes the use of symbols, which has been linked to the development of complex language, and personal adornments made of seashells.

Read more at:

The climate of South Africa was once much wetter than it is today, and those lush times may have spurred human populations through especially innovative periods, new research shows.

Evidence from these ancient periods suggests humans produced new tools, and used symbolism in wall engravings. The findings suggest a tight link between abrupt climate changes and the emergence of modern human traits, researchers say.

"We provide for the first time really good evidence that the occurrence and disappearance of these first finds of human innovation are linked to climate change," said study author Martin Ziegler, an earth science researcher at Cardiff University in Wales.

Before these periods of innovation, humans were quite primitive, with the most impressive technology being hand axes, Ziegler said. But during these wet periods, more advanced stone and bone tools appear in the fossil record, as well as painted symbols on cave walls that suggest the development of language.

Archaeologists have also found some of the first evidence of constructed plant beds during these periods, and shells thought to be worn as adornments or jewelry, Ziegler said. The most important periods analyzed in the study date to 71,000 years ago and to between 64,000 and 59,000 years ago.

Read more at:

Wednesday, May 8, 2013

Another Extensive thesis on East African DNA

It was brought to my attention last week, thanks to a comment on this blog made by the user 'Umi', that another thesis on East African DNA variation was publicly available online:

Complex Genetic History of East African Human Populations

This is also an extensive thesis with a wealth of information, akin to Plaster's thesis. The primary differences are that this one focuses more on the parts of East Africa found further to the south of Ethiopia and that, in addition to uniparental analysis, it also includes some autosomal model-based inference, albeit of quite low resolution by today's standards: 848 microsatellites and 479 indels (refer to Tishkoff et al. 2009 for marker details).

Due to the extensive nature of the report I haven't had a chance to cover its entire scope. Instead, for starters, I have focused on the YDNA data by creating a relative frequency chart from the results reported in Fig. 3.3.2.

Several things to point out initially:

  • The report outlines the discovery of 4 new SNPs, TL1-4. The first two were found in haplogroup B, downstream from B-M150 and B-M112 respectively. The last two, TL3 and TL4, were found in haplogroup E, downstream from E-U174 and E-V32 respectively. Incidentally, the fourth SNP, TL4, which is under E-V32, could potentially be the same as Z808/Z809 as identified recently by the genealogical community. However, as the report does not give the Y-chromosome location of the SNP in an NCBI Build 36/37 format, this cannot be verified, at least by me, at the moment.
  • A couple of the frequency results in Fig. 3.3.2 do not add up, in particular those for the Boni and the Baggara, and to a lesser extent those for the Kanuri and Teita. I have labeled the missing frequency results with a “?” in the relative charts for those specific populations.
  • The Burji and Konso are labeled as being only from Kenya throughout the report; however, most Burji are from Ethiopia, and the Konso are exclusively found in Ethiopia. I have reflected this in the charts.
  • STR data is not readily available to perform TMRCA estimates on. Some TMRCA results are reported in Table 3.3.1 using Zhivotovsky's rates; however, these are estimates for the different lineages across all the samples in the dataset and do not necessarily compare TMRCAs between the different populations under study.
  • J-M62, while a subclade of J-M267, is not the main subclade of J-M267 found in East Africa; that would be J-P58. Therefore, the results reported for J-12f2.1 (x M62, M172) may in fact be, or largely include, J-P58 lineages. Of course, those results could also include variants of J-M267 other than J-P58 and J-M62, since the SNP was not directly tested.
  • E-P2* lineages are abundant (> 30%) in the Konso, Burji and Mbugwe; however, on closer examination and comparison with current data, these could be E-M329, E-V38* or even E-M215*, as none of these SNPs were directly tested. Genuine E-P2* lineages would be positive for E-P2 and negative for V38 and M215 (see Trombetta et al. 2011).
  • Similarly, the E-M35* lineages reported could be members of the relatively newly discovered E-Z830* lineages (see this post for details), or of some of the untested variants of E-M35, i.e. E-V42, V92 and maybe even E-V68 (x M78).
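
For anyone re-plotting the figure, the "do not add up" check mentioned in the list above amounts to summing the reported haplogroup frequencies for each population and seeing how far short of 100% they fall. A minimal sketch follows; the function name, the tolerance and any numbers you feed it are my own assumptions for illustration and are not taken from the thesis.

```python
def unaccounted_share(freqs, tolerance=0.005):
    """Share of a population not covered by the reported frequencies.

    freqs maps haplogroup name -> reported frequency (as a fraction of 1).
    Returns 0.0 when the frequencies sum to 1 within the rounding tolerance.
    """
    gap = 1.0 - sum(freqs.values())
    return gap if abs(gap) > tolerance else 0.0
```

A non-zero result is the portion that would be drawn as the “?” slice in the relative chart; a negative result would mean the reported frequencies sum to more than 100%.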

Tuesday, May 7, 2013

Analyzing YDNA A-M13 lineages in Ethiopian linguistic groups

As with the previous analysis of the J lineages found in Ethiopia from the Plaster paper, the other prevalent lineage in Ethiopia, A-M13 (formerly also known as A3b2), is analyzed below. A total of 616 A-M13 lineages were reported in the study, of which ~32% were found in Semitic speakers, ~40% in Cushitic speakers, ~17% in Omotic speakers and the remainder (~11%) in speakers of the Nilo-Saharan macro-phylum.