It is reported that Zhivotovsky's
effective mutation rate
[1]
has the effect of increasing the TMRCA of a lineage, as computed by
the use of Microsattelite Genetic
Distances[2],
by a factor of 3-4 fold relative to TMRCAs computed via mutation
rates observed in pedigree and family studies
[3].
By utilizing my TMRCA calculating program,
I want to explore,
- What effect does different marker combinations have on this multiplier ?
- What effect does marker size have on this multiplier ?
- Is there a variation in this multiplier for different data-sets?
First, to ensure that my program correctly calculates the TMRCA when the
Zhivotovsky mutation rate of 0.00069 is applied to all the markers
in my database consistently (versus only the marker specific Pedigree
mutation rates I have thus far been utilizing), I attempted
to replicate the TMRCA computations of the following publication;
The emergence of Y-chromosome haplogroup J1e among Arabic-speaking populations Chiaroni et al. (2009)
One reason I chose this article, in
addition to it using Zhivotovsky's effective mutation rate, is
because it had the most complete STR profiles supplied
along with the paper. The article states the following:
The following eight loci, DYS19,
DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393 and DYS439, were
used to estimate expansion times using the methodology described by
Zhivotovsky et al.21
as modified according to Sengupta et al. A microsatellite
evolutionary effective mutation rate of 6.9 × 10–4 was
used.
All the eight loci that are mentioned
above are available in my program's database, in addition, the
modification with respect to Sengupta et al. it is referring to is
(1) using the Median repeats for the ancestral haplotype instead of
Modal repeats, which is fine since my program computes for both
scenarios separately (2) Modification in the computation of the
lower/upper bound estimates, which is not necessary for my particular
case here, as I am only interested in the central estimates of TMRCA
(at least for now, although it would be interesting to see what
impact would be seen when the upper/lower bound are estimated
according to effective vs. pedigree rates).
The full Chiraoni et al.(2009) J-P58
STR data can be downloaded from the “Supplementary info section”
After importing the data and tweaking
the following 17 haplotypes that had contained null values for one or
more markers: [xEJ_C5, J1_E1, J1_H5, st_2164, st_2149, J1_C2,
Tanta100, 2, 44, 59, 82, 115, 117, 147, 148, 158, 170] by replacing
the missing ones with the modal repeats for the entire data-set, I
ended up with 453 J-P58 haplotypes, same as reported in Table 1. When
I ran these haplotypes in my program I get;
Dataset:J1e_Chiaroni2
Marker list:8_Chiaronimarkerlist
Sample size:453
Pedigree/Familial Rates Summary
Years/Generation:28 - 33
TMRCA Range:2648 - 4057
Mean TMRCA:3382
Median TMRCA:3368
SD:444
Coalescent_Detail =
{
[1,1] = Chandler;8 Markers
Generations(Median)--122.94 Generations(Modal)--122.94
[1,2] = Stafford;8 Markers
Generations(Median)--108.45 Generations(Modal)--108.45
[1,3] = Burgarella_Navascues;8
Markers Generations(Median)--117.65 Generations(Modal)--117.65
[1,4] = Ballantyne;8 Markers
Generations(Median)--94.601 Generations(Modal)--94.601
[1,5] = Zhivotovsky;8 Markers
Generations(Median)--401.11 Generations(Modal)--401.11
}
Since Zhivotovsky is using 25
Years/Generation , this would mean 401.11 X 25 = 10,027.75 years ago
for the TMRCA of the J-P58 haplotypes according to my program, the
study reports a central estimate of 10, 100 years ago for the same
haplotypes, thus my program is correct within 0.7% or less than 3
generations, part of the error could be with how I tweaked those 17
haplotypes that I mentioned above.
To double check, I retested my
program with the J1e-YCAII 22/22 haplotypes (a subset of the J-P58
haplotypes from above), however, I could only retrieve 202 of these haplotypes,
versus the 203 they are reporting in Table 1. In any event, for
those particular haplotypes, I get the following results:
Dataset:J1e_Chiaroni2_YCAII22_22
Marker list:8_Chiaronimarkerlist
Sample size:202
Pedigree/Familial Rates Summary
Years/Generation:28 - 33
TMRCA Range:2260 - 3574
Mean TMRCA:2921
Median TMRCA:2946
SD:415
Coalescent_Detail =
{
[1,1] = Chandler;8 Markers
Generations(Median)--108.31 Generations(Modal)--108.31
[1,2] = Stafford;8 Markers
Generations(Median)--91.632 Generations(Modal)--91.632
[1,3] = Burgarella_Navascues;8
Markers Generations(Median)--102.45 Generations(Modal)--102.45
[1,4] = Ballantyne;8 Markers
Generations(Median)--80.719 Generations(Modal)--80.719
[1,5] = Zhivotovsky;8 Markers
Generations(Median)--365.91 Generations(Modal)--365.91
}
Again, for 25 years/generation, this
comes out to 365.91 X 25 = 9,147.75, which is off by 0.6% or slightly
more than 2 generations from the central estimate that they report for those same haplotypes in Table 1.
I additionally also used the 8
markers from the publication above to compute TMRCAs for E1b1b and J-P58 STR data from FTDNA to get the
following results:
Dataset:J1c3
Marker list:8_Chiaronimarkerlist
Sample size:256
Pedigree/Familial Rates Summary
Years/Generation:28 - 33
TMRCA Range:3636 - 5604
Mean TMRCA:4810
Median TMRCA:4752
SD:659
Coalescent_Detail =
{
[1,1] = Chandler;8 Markers
Generations(Median)--169.6 Generations(Modal)--169.6
[1,2] = Stafford;8 Markers
Generations(Median)--161.59 Generations(Modal)--161.59
[1,3] = Burgarella_Navascues;8
Markers Generations(Median)--169.85 Generations(Modal)--169.85
[1,4] = Ballantyne;8 Markers
Generations(Median)--129.86 Generations(Modal)--129.86
[1,5] = Zhivotovsky;8 Markers
Generations(Median)--467.05 Generations(Modal)--467.05
}
Dataset:EM35-Balanced
Marker list:8_Chiaronimarkerlist
Sample size:180
Pedigree/Familial Rates Summary
Years/Generation:28 - 33
TMRCA Range:5678 - 8101
Mean TMRCA:6713
Median TMRCA:6782
SD:756
Coalescent_Detail =
{
[1,1] = Chandler;8 Markers
Generations(Median)--245.49 Generations(Modal)--245.49
[1,2] = Stafford;8 Markers
Generations(Median)--219.22 Generations(Modal)--219.22
[1,3] = Burgarella_Navascues;8
Markers Generations(Median)--212.94 Generations(Modal)--212.94
[1,4] = Ballantyne;8 Markers
Generations(Median)--202.79 Generations(Modal)--202.79
[1,5] = Zhivotovsky;8 Markers
Generations(Median)--801.13 Generations(Modal)--801.13
}
Above, we can see that while the FTDNA J-P58
data-set seems older by ~66 generations (for the Zhivotovsky rates)
than the Chiaroni (2009) dataset, it is interesting to note that the
E1b1b1 dataset's results of 801 generations or 20.028 KYA is quite
close to the lower bound of Cruciani (2007)'s TMRCA estimate of
20.9-23.9 KYA.
Now that I have verified the program
works reasonably well with the effective mutation rate that I added to the database, I can now test what the effect of marker combination/size has, on not only the
absolute TMRCA estimates, but on the Zhivotovsky multiplier as well.
To do this, I utilized 2 of the FTDNA datasets I used above for E-M35 and J-P58. Since I have a maximum of 49 markers to work with, I simply took a random subset of those 49 markers for computation of both the Zhivotovsky and the Pedigree coalescent estimates, since the main purpose is to find the effect of marker combination, I then repeat the process of extracting random combinations from within the super-set of 49 markers and performing the computations, thus, I repeated the process of extracting said combinations a total of 50 times per chosen marker size.
To do this, I utilized 2 of the FTDNA datasets I used above for E-M35 and J-P58. Since I have a maximum of 49 markers to work with, I simply took a random subset of those 49 markers for computation of both the Zhivotovsky and the Pedigree coalescent estimates, since the main purpose is to find the effect of marker combination, I then repeat the process of extracting random combinations from within the super-set of 49 markers and performing the computations, thus, I repeated the process of extracting said combinations a total of 50 times per chosen marker size.
For instance, if I choose my first
marker size to be 8, I would then compute TMRCAs for 50 separate random 8
marker combinations within the dataset. Then, I double the marker
size and compute TMRCAs for another 50 random combinations and so
forth, until I reach my marker size limit.
Starting with a marker size of 8, and then going to 16, 32 and 40, here below are the results for the E-M35 and
J-P58 datasets with 50 random combinations for each marker size.
Note that in the tables below, Z-TMRCA,
denotes the TMRCA in generations obtained using the effective
mutation rate while P-TMRCA denotes an average of the TMRCAs
obtained from the 4 separate sources that utilize the pedigree rates
in my program. The X Max, X Min, X Average and X SD, denote
respectively the maximum, minimum, average and standard deviation of
the Zhivotovsky multiple found for each combination run. All columns of the
tables are sortable.
8 Marker – E-M35_FTDNA
16 Marker – E-M35_FTDNA
32 Marker – E-M35_FTDNA
40 Marker – E-M35_FTDNA
8 Marker – J-P58_FTDNA
16 Marker – J-P58_FTDNA
32 Marker – J-P58_FTDNA
40 Marker – J-P58_FTDNA
From the tables above, it is clear that the Zhivotovsky multiple has wide ranges depending on the marker combination (as well as which pedigree rates are used), but it can also be seen that the ranges narrow down when marker size is increased.
For instance, for the E-M35 dataset with a marker size of 8, the minimum multiple is 0.71 (below parity !) and is found at Combo # 6, while the maximum multiple is 9.67 found at Combo # 34, this gives a multiple range of 8.96, for the same dataset, but however for a marker size of 40, the minimum multiple is 2.15 (Combo # 29) while the maximum multiple is 5.41 (Combo # 28), giving a multiple range of 3.26, which is almost a 64% reduction in the range of the multiple going from the smallest to the largest marker size. A similar pattern of a range reduction in the multiple can also be seen with the other dataset (J-P58).
The above could be explained by either 1 of 2 reasons that need further testing, (1) As the marker size increases the pool of newly available markers to combine randomly also gets limited, as there are ultimately only 49 markers to work with, thus the results become more of a repetition or (2) Marker size increase really does have a significant effect on the range of the multiple.
Whatever the case, it is clear that
marker combination has a tremendous effect on the Zhivotovsky
multiple, more especially if the markers used are low in size, therefore, the 'mantra' of multiply by 3 (or divide by 3) while true on
average, is a gross oversimplification, or even completely wrong, if the particular combination of marker properties along with the particular set of pedigree mutation rates used are not accounted for in detail.
Although more data-points than only the
8, 16, 32 and 40 marker sizes that I used would certainly be ideal,
there was no particular correlation between the marker sizes and the
absolute TMRCA's generated for both the Pedigree and Zhivotovsky
rates.
The E-M35 data-set had an overall mean (for all marker sizes and random combinations) for the Z-TMRCA of 1456 generations with an SD of 70.28, and R^2 of 0.09, while the P-TMRCA had 401, 4.73 and 0.92 respectively for the same variables. The J-P58 dataset had 726, 33.75 and 0.62 respectively for the Z-TMRCA and 235, 16.91 and 0.36 for the P-TMRCA. (see also below)
The E-M35 data-set had an overall mean (for all marker sizes and random combinations) for the Z-TMRCA of 1456 generations with an SD of 70.28, and R^2 of 0.09, while the P-TMRCA had 401, 4.73 and 0.92 respectively for the same variables. The J-P58 dataset had 726, 33.75 and 0.62 respectively for the Z-TMRCA and 235, 16.91 and 0.36 for the P-TMRCA. (see also below)
This may be some data you would like to add if you havent already seen it.
ReplyDeletehttp://www.ncbi.nlm.nih.gov/pmc/articles/PMC2869035/
As a matter of fact I had never seen this data before, I ran the J-M267 haplotypes of these Kuwaiti Bedouins through my calculator using the Chiaroni 8 marker list from above:
DeleteDataset:Bedouin_JM267
Marker list:8_Chiaronimarkerlist
Sample size:79
Pedigree/Familial Rates Summary
Years/Generation:28 - 33
TMRCA Range:3719 - 6540
Mean TMRCA:5085
Median TMRCA:5106
SD:859
Coalescent_Detail =
{
[1,1] = Chandler;8 Markers; Generations(Median)--198.2 Generations(Modal)--198.2
[1,2] = Stafford;8 Markers; Generations(Median)--161.7 Generations(Modal)--161.7
[1,3] = Burgarella_Navascues;8 Markers; Generations(Median)--174.16 Generations(Modal)--174.16
[1,4] = Ballantyne;8 Markers; Generations(Median)--132.84 Generations(Modal)--132.84
[1,5] = Zhivotovsky;8 Markers; Generations(Median)--609.98 Generations(Modal)--609.98
}
These haplotypes definitely seem older than both the Chiaroni J-P58 haplotypes and the FTDNA J-P58 haplotypes by a factor of up-to 50%, that is strange, one would think that they'd show a rather limited age due to inbreeding....
Actually I was comparing branches (J-M267) to twigs (J-P58), when I should be comparing the bedouin J-M267 to FTDNA J-M267,
DeleteDataset:J-M267 (ftdna)
Marker list:8_Chiaronimarkerlist
Sample size:573
Pedigree/Familial Rates Summary
Years/Generation:28 - 33
TMRCA Range:3898 - 6239
Mean TMRCA:5271
Median TMRCA:5249
SD:771
Coalescent_Detail =
{
[1,1] = Chandler;8 Markers; Generations(Median)--189.09 Generations(Modal)--189.09
[1,2] = Stafford;8 Markers; Generations(Median)--177.23 Generations(Modal)--177.23
[1,3] = Burgarella_Navascues;8 Markers; Generations(Median)--185.87 Generations(Modal)--185.87
[1,4] = Ballantyne;8 Markers; Generations(Median)--139.21 Generations(Modal)--139.21
[1,5] = Zhivotovsky;8 Markers; Generations(Median)--508.7 Generations(Modal)--508.7
}
Which means that the bedouin's data is older than FTDNA's but not as old as I had thought earlier, still what this tells me is that the Bedouin's data, likely holds more lineages within J-M267 than just J-P58 only.