http://ehelix.pythonanywhere.com/
It is also embedded in this blog in a new page (above)
UPDATE (02/11/2014)
Another series of updates for the calculator:
- Users can now use the previously idle first column of the csv file to group haplotypes together, and thus compute the TMRCA for a specified group (see example below).
- The application now also accepts locus names in NIST format.
- It now automatically drops any haplotype that has a non-integer value for any locus in the *.csv file, instead of producing an error.
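The grouping and clean-up behaviour listed above can be pictured with a minimal Python sketch. This is not the calculator's actual source: the function name `load_haplotypes`, the assumption that the first row is a header of locus names, and the sample values are all hypothetical and only illustrate the idea.

```python
import csv
import io
from collections import defaultdict


def load_haplotypes(csv_text):
    """Group haplotypes by the label found in the first column.

    Assumed (hypothetical) layout: the first row is a header holding the
    group column name followed by locus names; every later row is a group
    label followed by one repeat count per locus. Rows containing any
    non-integer repeat value are dropped instead of raising an error.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    loci = [name.strip() for name in header[1:]]
    groups = defaultdict(list)
    dropped = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        label, values = row[0].strip(), row[1:]
        try:
            repeats = [int(v) for v in values]
        except ValueError:
            dropped += 1  # e.g. "13.2" or an empty cell
            continue
        if len(repeats) != len(loci):
            dropped += 1  # row does not match the header width
            continue
        groups[label].append(repeats)
    return loci, dict(groups), dropped


# Made-up values, for illustration only.
sample = """Region,19,393,391
Bulgaria/Central,13,13,10
Bulgaria/Central,13,13,11
Bulgaria/East,13,13.2,10
Bulgaria/East,13,14,10
"""
loci, groups, dropped = load_haplotypes(sample)
print(loci, {name: len(rows) for name, rows in groups.items()}, dropped)
```

Dropping a bad row and counting it, rather than aborting, mirrors the change described in the last bullet; whether the app also reports how many rows were removed is not stated in this post.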
To demonstrate the filtering utility, consider the YDNA E-V13 portion of the Bulgarian dataset from “Y-Chromosome Diversity in Modern Bulgarians: New Clues about Their Ancestry”. The *.csv file for this dataset can be downloaded from the link below:
http://ehelix.pythonanywhere.com/init/default/Example_Files.
When the file is uploaded, the application offers the option to analyze any available subset of haplotypes with N>1 (in this case Bulgaria/Central, Bulgaria/East and Bulgaria/West), in addition to all of the haplotypes together.
Picking each of the above subsets from the “Choose a filter” listbox and running them separately produces the following results:
****************************************************************************************************
Active Y-str file: Bulgaria_EV13.csv
Active Markers file: Full_marker_list.txt
Dataset: Bulgaria_EV13
Sample size: 49
****************************************************************************************************
*Marker Details*
8 requested markers included in analysis (41 excluded):-
Markers not found in the Dataset: ['458', '590', '578', '594', '450', '572', '557', '570', '454', '455', '456', '388', '490', '492', '641', '406s1', '472', '520', '426', '568', '449', '448', '438', '460', '442', '447', '565', '617', '436', '446', '487', '444', '481', '537', '640', '534', '576', '531', '511', '437', 'gatah4']
Markers used in Analysis: ['19', '393', '392', '391', '390', '439', '389-1', '389-2']
****************************************************************************************************
*Coalescent Details*
Ballantyne--Generations(Median)--74.17 Generations(Modal)--74.17
Burgarella_Navascues--Generations(Median)--89.57 Generations(Modal)--89.57
Chandler--Generations(Median)--101.78 Generations(Modal)--101.78
Stafford--Generations(Median)--87.07 Generations(Modal)--87.07
Zhivotovsky--Generations(Median)--317.95 Generations(Modal)--317.95
****************************************************************************************************
*Pedigree/Familial Rates Summary*
Years/Generation: 28 - 33
TMRCA Range: 2076 - 3358
Mean TMRCA: 2688
Median TMRCA: 2678
****************************************************************************************************
Active Y-str file: Bulgaria_EV13.csv
Active Markers file: Full_marker_list.txt
Dataset: Bulgaria_EV13, Filter = Bulgaria/Central
Sample size: 17
****************************************************************************************************
*Marker Details*
8 requested markers included in analysis (41 excluded):-
Markers not found in the Dataset: ['458', '590', '578', '594', '450', '572', '557', '570', '454', '455', '456', '388', '490', '492', '641', '406s1', '472', '520', '426', '568', '449', '448', '438', '460', '442', '447', '565', '617', '436', '446', '487', '444', '481', '537', '640', '534', '576', '531', '511', '437', 'gatah4']
Markers used in Analysis: ['19', '393', '392', '391', '390', '439', '389-1', '389-2']
****************************************************************************************************
*Coalescent Details*
Ballantyne--Generations(Median)--50.30 Generations(Modal)--50.30
Burgarella_Navascues--Generations(Median)--61.98 Generations(Modal)--61.98
Chandler--Generations(Median)--64.37 Generations(Modal)--64.37
Stafford--Generations(Median)--53.36 Generations(Modal)--53.36
Zhivotovsky--Generations(Median)--245.10 Generations(Modal)--245.10
****************************************************************************************************
*Pedigree/Familial Rates Summary*
Years/Generation: 28 - 33
TMRCA Range: 1408 - 2124
Mean TMRCA: 1753
Median TMRCA: 1748
****************************************************************************************************
Active Y-str file: Bulgaria_EV13.csv
Active Markers file: Full_marker_list.txt
Dataset: Bulgaria_EV13, Filter = Bulgaria/East
Sample size: 16
****************************************************************************************************
*Marker Details*
8 requested markers included in analysis (41 excluded):-
Markers not found in the Dataset: ['458', '590', '578', '594', '450', '572', '557', '570', '454', '455', '456', '388', '490', '492', '641', '406s1', '472', '520', '426', '568', '449', '448', '438', '460', '442', '447', '565', '617', '436', '446', '487', '444', '481', '537', '640', '534', '576', '531', '511', '437', 'gatah4']
Markers used in Analysis: ['19', '393', '392', '391', '390', '439', '389-1', '389-2']
****************************************************************************************************
*Coalescent Details*
Ballantyne--Generations(Median)--85.28 Generations(Modal)--85.28
Burgarella_Navascues--Generations(Median)--100.69 Generations(Modal)--100.69
Chandler--Generations(Median)--116.98 Generations(Modal)--116.98
Stafford--Generations(Median)--102.94 Generations(Modal)--102.94
Zhivotovsky--Generations(Median)--339.67 Generations(Modal)--339.67
****************************************************************************************************
*Pedigree/Familial Rates Summary*
Years/Generation: 28 - 33
TMRCA Range: 2387 - 3860
Mean TMRCA: 3094
Median TMRCA: 3078
****************************************************************************************************
Active Y-str file: Bulgaria_EV13.csv
Active Markers file: Full_marker_list.txt
Dataset: Bulgaria_EV13, Filter = Bulgaria/West
Sample size: 16
****************************************************************************************************
*Marker Details*
8 requested markers included in analysis (41 excluded):-
Markers not found in the Dataset: ['458', '590', '578', '594', '450', '572', '557', '570', '454', '455', '456', '388', '490', '492', '641', '406s1', '472', '520', '426', '568', '449', '448', '438', '460', '442', '447', '565', '617', '436', '446', '487', '444', '481', '537', '640', '534', '576', '531', '511', '437', 'gatah4']
Markers used in Analysis: ['19', '393', '392', '391', '390', '439', '389-1', '389-2']
****************************************************************************************************
*Coalescent Details*
Ballantyne--Generations(Median)--88.42 Generations(Modal)--88.42
Burgarella_Navascues--Generations(Median)--107.76 Generations(Modal)--107.76
Chandler--Generations(Median)--126.34 Generations(Modal)--126.34
Stafford--Generations(Median)--107.00 Generations(Modal)--107.00
Zhivotovsky--Generations(Median)--373.64 Generations(Modal)--373.64
****************************************************************************************************
*Pedigree/Familial Rates Summary*
Years/Generation: 28 - 33
TMRCA Range: 2475 - 4169
Mean TMRCA: 3275
Median TMRCA: 3274
Notice that if a filter is chosen, its name is appended to the Dataset field of the results after “Filter =”. If all haplotypes are requested, on the other hand, the Dataset field contains just the dataset's name as specified by the user.
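The numbers in each *Pedigree/Familial Rates Summary* block above can be reproduced from the four pedigree-rate generation estimates and the 28 - 33 years/generation range. The sketch below is inferred from the printed output only, not taken from the calculator's source: it assumes each generation estimate is converted to years at both 28 and 33 years/generation, and that the range, mean and median of those eight values are truncated to whole years. The function name `pedigree_summary` is hypothetical.

```python
from statistics import mean, median

# Generations(Median) for the four pedigree-rate sets, copied from the
# unfiltered Bulgaria_EV13 output (sample size 49).
generations = {
    "Ballantyne": 74.17,
    "Burgarella_Navascues": 89.57,
    "Chandler": 101.78,
    "Stafford": 87.07,
}
YEARS_PER_GENERATION = (28, 33)  # from the "Years/Generation" line


def pedigree_summary(gens, years=YEARS_PER_GENERATION):
    # Assumption: every rate set is converted at both ends of the range,
    # and the summary statistics are truncated (not rounded) to integers.
    estimates = [g * y for g in gens.values() for y in years]
    return {
        "range": (int(min(estimates)), int(max(estimates))),
        "mean": int(mean(estimates)),
        "median": int(median(estimates)),
    }


print(pedigree_summary(generations))
# {'range': (2076, 3358), 'mean': 2688, 'median': 2678}
```

The same procedure also matches the three filtered summaries (Bulgaria/Central, Bulgaria/East and Bulgaria/West) when their generation values are substituted. Note that the Zhivotovsky estimate is not part of this summary, since it uses an evolutionary rather than a pedigree rate.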
To double-check that the ASD computation has been carried out correctly for the central TMRCA estimates, we can compare the publication's own results for the 3 regions (Table S7 in the supporting information) against the Zhivotovsky results computed by the app (the Zhivotovsky lines in the outputs above), multiplied by 25 (for 25 years/generation according to Zhivotovsky).
Region              Publication    App (Zhivotovsky x 25)
Bulgaria/Central    6,100 YA       6,127.50 YA
Bulgaria/East       8,400 YA       8,491.75 YA
Bulgaria/West       9,300 YA       9,341.00 YA
Close Enough!
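The comparison above is simple arithmetic and can be replayed with a few lines of Python. The generation values come from the Zhivotovsky lines of the three filtered outputs and the published ages from the figures quoted in this post; the function name `cross_check` and the percentage-difference column are additions made here for illustration.

```python
# Zhivotovsky generations reported by the app for each filter.
app_generations = {
    "Bulgaria/Central": 245.10,
    "Bulgaria/East": 339.67,
    "Bulgaria/West": 373.64,
}
# Ages in years ago (YA) as quoted in this post from Table S7.
published_years = {
    "Bulgaria/Central": 6100,
    "Bulgaria/East": 8400,
    "Bulgaria/West": 9300,
}
YEARS_PER_GENERATION = 25  # generation length used with the Zhivotovsky rate


def cross_check(generations, published, years=YEARS_PER_GENERATION):
    """Return {region: (app_years, percent_difference_from_publication)}."""
    result = {}
    for region, gens in generations.items():
        app_years = gens * years
        pct = 100.0 * (app_years - published[region]) / published[region]
        result[region] = (app_years, pct)
    return result


for region, (app_years, pct) in cross_check(app_generations, published_years).items():
    print(f"{region:18s} {app_years:10.2f} YA  ({pct:+.2f}% vs publication)")
```

All three app values land within roughly 1.1% of the published ages. The published figures appear to be rounded to the nearest hundred years, which may account for part of that gap.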
UPDATE (03/16/2014)
It is now possible to copy and paste FTDNA-type haplotype repeats into the app, instead of only being able to upload csv files.
UPDATE (03/21/2014)
Added another 'mode' of analysis, the 'compare mode':
Compare Mode
- If there is more than one unique subset in the dataset, TMRCA computations are carried out on each unique subset simultaneously, and summarized results are then printed for each subset in a tabular format. A "Sample Size Threshold" can be assigned in this mode if the user wants to require a minimum sample size for a subset to be analyzed; if nothing is assigned in this field, the application uses N = 2 as the minimum sample size. Note: slightly longer computation times are required relative to analysis carried out in single mode, depending on sample size and the total number of markers.
- The first part establishes baseline information for the entire dataset, including the active STR and marker files, the sample size, the DYS#'s used and the mean TMRCAs.
- The second part tabulates the results for each unique subset. The first column of the table shows the name of the subset as assigned in the filter column of the Y-STR file. The second column gives the sample size of the subset. The third column shows the ratio of the number of haplotypes in the subset to the total number of haplotypes in the entire dataset. The fourth column, Z-TMRCA, shows the mean TMRCA in generations using the Zhivotovsky rates. The last column, P-TMRCA, shows the mean TMRCA in generations using all the available pedigree rates. All of the columns are sortable in ascending or descending order.
- A link is given in case the user needs to open the results of the compare mode analysis in a separate tab. This may be useful if one wants to drill into a detailed analysis of any of the subsets found in the table using the single mode analysis, without closing the results of the compare mode analysis for the active dataset.
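The tabulation described in the bullets above can be sketched roughly as follows. This is an illustrative outline rather than the app's code: `compare_table` is a hypothetical name, the `estimate` argument stands in for whatever routine produces the Zhivotovsky and pedigree mean TMRCAs, and the placeholder estimator below returns made-up numbers so that the example runs.

```python
def compare_table(groups, estimate, threshold=2):
    """Build one summary row per subset that meets the sample size threshold.

    groups:    {subset name: list of haplotypes}
    estimate:  callable returning (z_tmrca, p_tmrca) in generations for a
               list of haplotypes (hypothetical stand-in for the real maths)
    threshold: minimum sample size; defaults to N = 2 as in the post
    """
    total = sum(len(haps) for haps in groups.values())  # entire dataset
    rows = []
    for name, haps in groups.items():
        n = len(haps)
        if n < threshold:
            continue  # subset too small to analyze
        z_tmrca, p_tmrca = estimate(haps)
        rows.append({
            "subset": name,
            "n": n,
            "ratio": n / total,
            "z_tmrca": z_tmrca,
            "p_tmrca": p_tmrca,
        })
    return rows


def placeholder_estimate(haps):
    # NOT a real TMRCA calculation: arbitrary numbers so the sketch runs.
    return 10.0 * len(haps), 3.0 * len(haps)


toy_groups = {"A": [[13, 10]] * 3, "B": [[13, 11]] * 1, "C": [[14, 10]] * 4}
table = compare_table(toy_groups, placeholder_estimate)
# Sort for descending Z-TMRCA, as in the demonstrations.
table_sorted = sorted(table, key=lambda row: row["z_tmrca"], reverse=True)
for row in table_sorted:
    print(row)
```

The ratio is taken against the whole dataset, including subsets that fall below the threshold, which follows the wording of the third-column description; the app may differ in details this post does not spell out.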
To demonstrate, take a total of N=1391 67-marker E1b1b haplotypes copied from the FTDNA public pages, using the 'Country' column as the filter.
For a threshold of N = 25 and the full marker list, sorted for descending Z-TMRCA:
For a threshold of N = 25 and the Zhivotovsky marker list, sorted for descending Z-TMRCA:
UPDATE (03/25/2014)
Instead of using the 0.00069 rate for all the markers in the calculator that were not found in the Zhivotovsky publication, I have normalized the rates for those markers using an average of all the pedigree rates, normalized by the effective rate's ratio. The specific procedure I used can be seen in the spreadsheet below.
https://docs.google.com/spreadsheets/d/1D6FU4fpB6vwAnle2-oiEvCV7_50QQhdhk6irSSVQtDE/edit#gid=467401003
See Also:
http://ethiohelix.blogspot.com/2012/06/finding-tmrca-of-ethiopian-ydna.html
http://ethiohelix.blogspot.com/2012/11/extensive-doctoral-thesis-on-ethiopian.html
http://ethiohelix.blogspot.com/2013/01/tmrca-calculations-from-plaster-nry.html
http://ethiohelix.blogspot.com/2013/02/the-zhivotovsky-multiplier.html
http://ethiohelix.blogspot.com/2013/03/african-sahel-ydna.html
http://ethiohelix.blogspot.com/2013/04/source-code-for-asd-based-tmrca.html
http://ethiohelix.blogspot.com/2013/05/analyzing-ydna-j-lineages-in-ethiopian.html
http://ethiohelix.blogspot.com/2013/05/analyzing-ydna-m13-lineages-in.html