Ethio Helix ኢትዮ:ሒሊክስ: The Zhivotovsky Multiplier

It is reported that Zhivotovsky's effective mutation rate [1] has the effect of increasing the TMRCA of a lineage, as computed by the use of Microsattelite Genetic Distances[2], by a factor of 3-4 fold relative to TMRCAs computed via mutation rates observed in pedigree and family studies [3].

By utilizing my TMRCA calculating program, I want to explore,

What effect does different marker combinations have on this multiplier ?
What effect does marker size have on this multiplier ?
Is there a variation in this multiplier for different data-sets?

First, to ensure that my program correctly calculates the TMRCA when the Zhivotovsky mutation rate of 0.00069 is applied to all the markers in my database consistently (versus only the marker specific Pedigree mutation rates I have thus far been utilizing), I attempted to replicate the TMRCA computations of the following publication;

The emergence of Y-chromosome haplogroup J1e among Arabic-speaking populations Chiaroni et al. (2009)

One reason I chose this article, in addition to it using Zhivotovsky's effective mutation rate, is because it had the most complete STR profiles supplied along with the paper. The article states the following:

The following eight loci, DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393 and DYS439, were used to estimate expansion times using the methodology described by Zhivotovsky et al.²¹ as modified according to Sengupta et al. A microsatellite evolutionary effective mutation rate of 6.9 × 10^–4 was used.

All the eight loci that are mentioned above are available in my program's database, in addition, the modification with respect to Sengupta et al. it is referring to is (1) using the Median repeats for the ancestral haplotype instead of Modal repeats, which is fine since my program computes for both scenarios separately (2) Modification in the computation of the lower/upper bound estimates, which is not necessary for my particular case here, as I am only interested in the central estimates of TMRCA (at least for now, although it would be interesting to see what impact would be seen when the upper/lower bound are estimated according to effective vs. pedigree rates).

The full Chiraoni et al.(2009) J-P58 STR data can be downloaded from the “Supplementary info section”

After importing the data and tweaking the following 17 haplotypes that had contained null values for one or more markers: [xEJ_C5, J1_E1, J1_H5, st_2164, st_2149, J1_C2, Tanta100, 2, 44, 59, 82, 115, 117, 147, 148, 158, 170] by replacing the missing ones with the modal repeats for the entire data-set, I ended up with 453 J-P58 haplotypes, same as reported in Table 1. When I ran these haplotypes in my program I get;

Dataset:J1e_Chiaroni2

Marker list:8_Chiaronimarkerlist

Sample size:453

Pedigree/Familial Rates Summary

Years/Generation:28 - 33

TMRCA Range:2648 - 4057

Mean TMRCA:3382

Median TMRCA:3368

SD:444

Coalescent_Detail =

{

[1,1] = Chandler;8 Markers Generations(Median)--122.94 Generations(Modal)--122.94

[1,2] = Stafford;8 Markers Generations(Median)--108.45 Generations(Modal)--108.45

[1,3] = Burgarella_Navascues;8 Markers Generations(Median)--117.65 Generations(Modal)--117.65

[1,4] = Ballantyne;8 Markers Generations(Median)--94.601 Generations(Modal)--94.601

[1,5] = Zhivotovsky;8 Markers Generations(Median)--401.11 Generations(Modal)--401.11

}

Since Zhivotovsky is using 25 Years/Generation , this would mean 401.11 X 25 = 10,027.75 years ago for the TMRCA of the J-P58 haplotypes according to my program, the study reports a central estimate of 10, 100 years ago for the same haplotypes, thus my program is correct within 0.7% or less than 3 generations, part of the error could be with how I tweaked those 17 haplotypes that I mentioned above.

To double check, I retested my program with the J1e-YCAII 22/22 haplotypes (a subset of the J-P58 haplotypes from above), however, I could only retrieve 202 of these haplotypes, versus the 203 they are reporting in Table 1. In any event, for those particular haplotypes, I get the following results:

Dataset:J1e_Chiaroni2_YCAII22_22

Marker list:8_Chiaronimarkerlist

Sample size:202

Pedigree/Familial Rates Summary

Years/Generation:28 - 33

TMRCA Range:2260 - 3574

Mean TMRCA:2921

Median TMRCA:2946

SD:415

Coalescent_Detail =

{

[1,1] = Chandler;8 Markers Generations(Median)--108.31 Generations(Modal)--108.31

[1,2] = Stafford;8 Markers Generations(Median)--91.632 Generations(Modal)--91.632

[1,3] = Burgarella_Navascues;8 Markers Generations(Median)--102.45 Generations(Modal)--102.45

[1,4] = Ballantyne;8 Markers Generations(Median)--80.719 Generations(Modal)--80.719

[1,5] = Zhivotovsky;8 Markers Generations(Median)--365.91 Generations(Modal)--365.91

}

Again, for 25 years/generation, this comes out to 365.91 X 25 = 9,147.75, which is off by 0.6% or slightly more than 2 generations from the central estimate that they report for those same haplotypes in Table 1.

I additionally also used the 8 markers from the publication above to compute TMRCAs for E1b1b and J-P58 STR data from FTDNA to get the following results:

Dataset:J1c3

Marker list:8_Chiaronimarkerlist

Sample size:256

Pedigree/Familial Rates Summary

Years/Generation:28 - 33

TMRCA Range:3636 - 5604

Mean TMRCA:4810

Median TMRCA:4752

SD:659

Coalescent_Detail =

{

[1,1] = Chandler;8 Markers Generations(Median)--169.6 Generations(Modal)--169.6

[1,2] = Stafford;8 Markers Generations(Median)--161.59 Generations(Modal)--161.59

[1,3] = Burgarella_Navascues;8 Markers Generations(Median)--169.85 Generations(Modal)--169.85

[1,4] = Ballantyne;8 Markers Generations(Median)--129.86 Generations(Modal)--129.86

[1,5] = Zhivotovsky;8 Markers Generations(Median)--467.05 Generations(Modal)--467.05

}

Dataset:EM35-Balanced

Marker list:8_Chiaronimarkerlist

Sample size:180

Pedigree/Familial Rates Summary

Years/Generation:28 - 33

TMRCA Range:5678 - 8101

Mean TMRCA:6713

Median TMRCA:6782

SD:756

Coalescent_Detail =

{

[1,1] = Chandler;8 Markers Generations(Median)--245.49 Generations(Modal)--245.49

[1,2] = Stafford;8 Markers Generations(Median)--219.22 Generations(Modal)--219.22

[1,3] = Burgarella_Navascues;8 Markers Generations(Median)--212.94 Generations(Modal)--212.94

[1,4] = Ballantyne;8 Markers Generations(Median)--202.79 Generations(Modal)--202.79

[1,5] = Zhivotovsky;8 Markers Generations(Median)--801.13 Generations(Modal)--801.13

}

Above, we can see that while the FTDNA J-P58 data-set seems older by ~66 generations (for the Zhivotovsky rates) than the Chiaroni (2009) dataset, it is interesting to note that the E1b1b1 dataset's results of 801 generations or 20.028 KYA is quite close to the lower bound of Cruciani (2007)'s TMRCA estimate of 20.9-23.9 KYA.

Now that I have verified the program works reasonably well with the effective mutation rate that I added to the database, I can now test what the effect of marker combination/size has, on not only the absolute TMRCA estimates, but on the Zhivotovsky multiplier as well.

To do this, I utilized 2 of the FTDNA datasets I used above for E-M35 and J-P58. Since I have a maximum of 49 markers to work with, I simply took a random subset of those 49 markers for computation of both the Zhivotovsky and the Pedigree coalescent estimates, since the main purpose is to find the effect of marker combination, I then repeat the process of extracting random combinations from within the super-set of 49 markers and performing the computations, thus, I repeated the process of extracting said combinations a total of 50 times per chosen marker size.

For instance, if I choose my first marker size to be 8, I would then compute TMRCAs for 50 separate random 8 marker combinations within the dataset. Then, I double the marker size and compute TMRCAs for another 50 random combinations and so forth, until I reach my marker size limit.

Starting with a marker size of 8, and then going to 16, 32 and 40, here below are the results for the E-M35 and J-P58 datasets with 50 random combinations for each marker size.

Note that in the tables below, Z-TMRCA, denotes the TMRCA in generations obtained using the effective mutation rate while P-TMRCA denotes an average of the TMRCAs obtained from the 4 separate sources that utilize the pedigree rates in my program. The X Max, X Min, X Average and X SD, denote respectively the maximum, minimum, average and standard deviation of the Zhivotovsky multiple found for each combination run. All columns of the tables are sortable.

8 Marker – E-M35_FTDNA

16 Marker – E-M35_FTDNA

32 Marker – E-M35_FTDNA

40 Marker – E-M35_FTDNA

8 Marker – J-P58_FTDNA

16 Marker – J-P58_FTDNA

32 Marker – J-P58_FTDNA

40 Marker – J-P58_FTDNA

From the tables above, it is clear that the Zhivotovsky multiple has wide ranges depending on the marker combination (as well as which pedigree rates are used), but it can also be seen that the ranges narrow down when marker size is increased.

For instance, for the E-M35 dataset with a marker size of 8, the minimum multiple is 0.71 (below parity !) and is found at Combo # 6, while the maximum multiple is 9.67 found at Combo # 34, this gives a multiple range of 8.96, for the same dataset, but however for a marker size of 40, the minimum multiple is 2.15 (Combo # 29) while the maximum multiple is 5.41 (Combo # 28), giving a multiple range of 3.26, which is almost a 64% reduction in the range of the multiple going from the smallest to the largest marker size. A similar pattern of a range reduction in the multiple can also be seen with the other dataset (J-P58).

The above could be explained by either 1 of 2 reasons that need further testing, (1) As the marker size increases the pool of newly available markers to combine randomly also gets limited, as there are ultimately only 49 markers to work with, thus the results become more of a repetition or (2) Marker size increase really does have a significant effect on the range of the multiple.

Whatever the case, it is clear that marker combination has a tremendous effect on the Zhivotovsky multiple, more especially if the markers used are low in size, therefore, the 'mantra' of multiply by 3 (or divide by 3) while true on average, is a gross oversimplification, or even completely wrong, if the particular combination of marker properties along with the particular set of pedigree mutation rates used are not accounted for in detail.

Although more data-points than only the 8, 16, 32 and 40 marker sizes that I used would certainly be ideal, there was no particular correlation between the marker sizes and the absolute TMRCA's generated for both the Pedigree and Zhivotovsky rates.

The E-M35 data-set had an overall mean (for all marker sizes and random combinations) for the Z-TMRCA of 1456 generations with an SD of 70.28, and R^2 of 0.09, while the P-TMRCA had 401, 4.73 and 0.92 respectively for the same variables. The J-P58 dataset had 726, 33.75 and 0.62 respectively for the Z-TMRCA and 235, 16.91 and 0.36 for the P-TMRCA. (see also below)

Ethio Helix ኢትዮ:ሒሊክስ

Pages

Thursday, February 21, 2013

The Zhivotovsky Multiplier

3 comments:

Blog Archive

Search This Blog

Contact Form

Ethio Helix ኢትዮ:ሒሊክስ

Pages

Thursday, February 21, 2013

The Zhivotovsky Multiplier

3 comments:

Blog Archive

Search This Blog

Subscribe To

Contact Form