Monday, January 27, 2014

Y TMRCA Calculator as a Web App

The Y DNA (STR) TMRCA calculator can now be accessed as a web application with full functionality here:

http://ehelix.pythonanywhere.com/

It is also embedded in this blog as a new page (above).

UPDATE (02/11/2014)

Another series of updates for the calculator:

  • Users can now use the previously idle first column of the csv file to group haplotypes together and compute the TMRCA for a specified group (see the example below).
  • The application now also accepts locus names in NIST format.
  • It now automatically drops any haplotype that has a non-integer value for any locus in the *.csv file, instead of producing an error.
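
The grouping and clean-up behaviour described above can be sketched roughly as follows. This is only an illustration, not the app's actual code: the column layout (group label in the first column, one locus per remaining column, a header row of locus names), the function name load_groups and the sample values are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict


def load_groups(csv_text):
    """Group haplotypes by the label in the first column.

    Rows containing any non-integer repeat value are skipped
    rather than raising an error. (Hypothetical helper; the
    real app's file layout may differ.)
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    loci = header[1:]  # assumed: column 0 holds the group label
    groups = defaultdict(list)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        label, values = row[0], row[1:]
        try:
            repeats = [int(v) for v in values]
        except ValueError:
            continue  # drop haplotypes with non-integer values
        groups[label].append(dict(zip(loci, repeats)))
    return dict(groups)


# Made-up sample data, for illustration only.
sample = """Filter,19,393,391
Bulgaria/Central,13,13,10
Bulgaria/Central,13,12.2,10
Bulgaria/East,14,13,10
"""

groups = load_groups(sample)
for label, haplotypes in groups.items():
    print(label, len(haplotypes))
```

In this sketch the second Bulgaria/Central row is dropped because of the value 12.2, so each group ends up with one haplotype.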



To demonstrate the filtering utility, consider the YDNA E-V13 portion of the Bulgarian dataset from “Y-Chromosome Diversity in Modern Bulgarians: New Clues about Their Ancestry”. The *.csv file for this dataset can be downloaded from the link below:
http://ehelix.pythonanywhere.com/init/default/Example_Files.

When the file is uploaded, the application offers the option to analyze any available subset of haplotypes with N>1, i.e., in this case Bulgaria/Central, Bulgaria/East and Bulgaria/West, in addition to all of the haplotypes.

Picking each of the above subsets from the “Choose a filter” listbox and running them separately produces the following results:

****************************************************************************************************
Active Y-str file: Bulgaria_EV13.csv
Active Markers file: Full_marker_list.txt
Dataset: Bulgaria_EV13
Sample size: 49
****************************************************************************************************
*Marker Details*
8 requested markers included in analysis (41 excluded):-

Markers not found in the Dataset: ['458', '590', '578', '594', '450', '572', '557', '570', '454', '455', '456', '388', '490', '492', '641', '406s1', '472', '520', '426', '568', '449', '448', '438', '460', '442', '447', '565', '617', '436', '446', '487', '444', '481', '537', '640', '534', '576', '531', '511', '437', 'gatah4']
Markers used in Analysis: ['19', '393', '392', '391', '390', '439', '389-1', '389-2']
****************************************************************************************************
*Coalescent Details*
Ballantyne--Generations(Median)--74.17 Generations(Modal)--74.17

Burgarella_Navascues--Generations(Median)--89.57 Generations(Modal)--89.57

Chandler--Generations(Median)--101.78 Generations(Modal)--101.78

Stafford--Generations(Median)--87.07 Generations(Modal)--87.07

Zhivotovsky--Generations(Median)--317.95 Generations(Modal)--317.95

****************************************************************************************************
*Pedigree/Familial Rates Summary*
Years/Generation: 28 - 33

TMRCA Range: 2076 - 3358

Mean TMRCA: 2688

Median TMRCA: 2678

****************************************************************************************************
Active Y-str file: Bulgaria_EV13.csv
Active Markers file: Full_marker_list.txt
Dataset: Bulgaria_EV13, Filter = Bulgaria/Central
Sample size: 17
****************************************************************************************************
*Marker Details*
8 requested markers included in analysis (41 excluded):-

Markers not found in the Dataset: ['458', '590', '578', '594', '450', '572', '557', '570', '454', '455', '456', '388', '490', '492', '641', '406s1', '472', '520', '426', '568', '449', '448', '438', '460', '442', '447', '565', '617', '436', '446', '487', '444', '481', '537', '640', '534', '576', '531', '511', '437', 'gatah4']
Markers used in Analysis: ['19', '393', '392', '391', '390', '439', '389-1', '389-2']
****************************************************************************************************
*Coalescent Details*
Ballantyne--Generations(Median)--50.30 Generations(Modal)--50.30

Burgarella_Navascues--Generations(Median)--61.98 Generations(Modal)--61.98

Chandler--Generations(Median)--64.37 Generations(Modal)--64.37

Stafford--Generations(Median)--53.36 Generations(Modal)--53.36

Zhivotovsky--Generations(Median)--245.10 Generations(Modal)--245.10

****************************************************************************************************
*Pedigree/Familial Rates Summary*
Years/Generation: 28 - 33

TMRCA Range: 1408 - 2124

Mean TMRCA: 1753

Median TMRCA: 1748

****************************************************************************************************
Active Y-str file: Bulgaria_EV13.csv
Active Markers file: Full_marker_list.txt
Dataset: Bulgaria_EV13, Filter = Bulgaria/East
Sample size: 16
****************************************************************************************************
*Marker Details*
8 requested markers included in analysis (41 excluded):-

Markers not found in the Dataset: ['458', '590', '578', '594', '450', '572', '557', '570', '454', '455', '456', '388', '490', '492', '641', '406s1', '472', '520', '426', '568', '449', '448', '438', '460', '442', '447', '565', '617', '436', '446', '487', '444', '481', '537', '640', '534', '576', '531', '511', '437', 'gatah4']
Markers used in Analysis: ['19', '393', '392', '391', '390', '439', '389-1', '389-2']
****************************************************************************************************
*Coalescent Details*
Ballantyne--Generations(Median)--85.28 Generations(Modal)--85.28

Burgarella_Navascues--Generations(Median)--100.69 Generations(Modal)--100.69

Chandler--Generations(Median)--116.98 Generations(Modal)--116.98

Stafford--Generations(Median)--102.94 Generations(Modal)--102.94

Zhivotovsky--Generations(Median)--339.67 Generations(Modal)--339.67

****************************************************************************************************
*Pedigree/Familial Rates Summary*
Years/Generation: 28 - 33

TMRCA Range: 2387 - 3860

Mean TMRCA: 3094

Median TMRCA: 3078

****************************************************************************************************
Active Y-str file: Bulgaria_EV13.csv
Active Markers file: Full_marker_list.txt
Dataset: Bulgaria_EV13, Filter = Bulgaria/West
Sample size: 16
****************************************************************************************************
*Marker Details*
8 requested markers included in analysis (41 excluded):-

Markers not found in the Dataset: ['458', '590', '578', '594', '450', '572', '557', '570', '454', '455', '456', '388', '490', '492', '641', '406s1', '472', '520', '426', '568', '449', '448', '438', '460', '442', '447', '565', '617', '436', '446', '487', '444', '481', '537', '640', '534', '576', '531', '511', '437', 'gatah4']
Markers used in Analysis: ['19', '393', '392', '391', '390', '439', '389-1', '389-2']
****************************************************************************************************
*Coalescent Details*
Ballantyne--Generations(Median)--88.42 Generations(Modal)--88.42

Burgarella_Navascues--Generations(Median)--107.76 Generations(Modal)--107.76

Chandler--Generations(Median)--126.34 Generations(Modal)--126.34

Stafford--Generations(Median)--107.00 Generations(Modal)--107.00

Zhivotovsky--Generations(Median)--373.64 Generations(Modal)--373.64

****************************************************************************************************
*Pedigree/Familial Rates Summary*
Years/Generation: 28 - 33

TMRCA Range: 2475 - 4169

Mean TMRCA: 3275

Median TMRCA: 3274
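
The Pedigree/Familial Rates Summary figures in the four outputs above can be reproduced from the coalescent lines. The sketch below is an inference from the printed numbers, not something taken from the app's source: it assumes that the four pedigree-based estimates (everything except Zhivotovsky) are each multiplied by both 28 and 33 years/generation, and that results are truncated to whole years. The helper name pedigree_summary is hypothetical.

```python
from statistics import mean, median


def pedigree_summary(generations, years_per_gen=(28, 33)):
    """Summarize pedigree-rate TMRCAs in years.

    Assumption: every estimate in generations is converted with
    both ends of the years/generation range, and the summary
    statistics are taken over all resulting values.
    """
    years = [g * y for g in generations for y in years_per_gen]
    return {
        "range": (int(min(years)), int(max(years))),
        "mean": int(mean(years)),
        "median": int(median(years)),
    }


# Ballantyne, Burgarella_Navascues, Chandler, Stafford (all haplotypes)
all_haplotypes = [74.17, 89.57, 101.78, 87.07]
# Same four rates for the Bulgaria/Central filter
central = [50.30, 61.98, 64.37, 53.36]

summary_all = pedigree_summary(all_haplotypes)
summary_central = pedigree_summary(central)
print(summary_all)
print(summary_central)
```

Under these assumptions the all-haplotypes run gives a range of 2076 - 3358, a mean of 2688 and a median of 2678, which matches the printed summary; the Bulgaria/Central figures match as well.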

Notice that if a filter is chosen, the applied filter's name is appended to the Dataset field of the results with “Filter =”. If all haplotypes are requested, on the other hand, the Dataset field contains just the dataset's name as specified by the user.

To double-check that the ASD computation has been carried out correctly for the central TMRCA estimates, we can compare the publication's own results for the 3 regions (Table S7 in the supporting information) against the Zhivotovsky results computed by the app (the Zhivotovsky lines in the outputs above), multiplied by 25 (for 25 years/generation according to Zhivotovsky).

                    (Publication)    app. Zhivotovsky results
Bulgaria/Central    6,100 YA         6,127.5 YA
Bulgaria/East       8,400 YA         8,491.75 YA
Bulgaria/West       9,300 YA         9,341.00 YA

Close Enough!
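
For anyone who wants to repeat the comparison, the arithmetic can be sketched as below. The generation counts are the Zhivotovsky values from the three filtered outputs and the published figures are those quoted in the table; the variable names are just for illustration.

```python
YEARS_PER_GENERATION = 25  # Zhivotovsky's generation length

# Zhivotovsky generations reported by the app for each filter
app_generations = {
    "Bulgaria/Central": 245.10,
    "Bulgaria/East": 339.67,
    "Bulgaria/West": 373.64,
}

# Values quoted from the publication (years ago)
published_years = {
    "Bulgaria/Central": 6100,
    "Bulgaria/East": 8400,
    "Bulgaria/West": 9300,
}

differences = {}
for region, gens in app_generations.items():
    app_years = gens * YEARS_PER_GENERATION
    published = published_years[region]
    # Relative difference between app and publication, in percent
    diff_pct = 100 * (app_years - published) / published
    differences[region] = diff_pct
    print(f"{region}: app {app_years:.2f} YA, "
          f"published {published} YA, diff {diff_pct:+.2f}%")
```

All three app values come out within about 1.1% of the published ones.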




UPDATE (03/16/2014)
 
It is now possible to copy and paste FTDNA-type haplotype repeats into the app, instead of only being able to upload csv files.

UPDATE (03/21/2014)
Added another 'mode' of analysis, the 'compare mode':

Compare Mode - If there is more than one unique subset in the dataset, TMRCA computations are carried out on each unique subset simultaneously, and summarized results are printed for each subset in tabular format. A "Sample Size Threshold" can be assigned in this mode to require a minimum sample size for each subset to be analyzed; if this field is left empty, the application uses N = 2 as the minimum sample size. Note: Computation times will be slightly longer than for analysis carried out in single mode, depending on sample size and the total number of markers.

The Compare mode results area below the submission form is divided into three parts:
  1. The first part establishes baseline information for the entire dataset, including the active STR and marker files, the sample size, the DYS#'s used and the mean TMRCAs.

  2. The second part tabulates the results for each unique subset. The first column of the table simply shows the name of the subset as assigned in the filter column of the Y-STR file. The second column is the sample size of the subset. The third column shows the ratio of the number of haplotypes in the subset to the total number of haplotypes in the entire dataset. The fourth column, Z-TMRCA, shows the mean TMRCA in generations using the Zhivotovsky rates. The last column, P-TMRCA, shows the mean TMRCA in generations using all the available pedigree rates. All of the columns are sortable in ascending or descending order.

  3. A link is given in case the user needs to open the results of the compare mode analysis in a separate tab. This may be useful if one wants to drill into a detailed analysis of any of the subsets found in the table using the single mode analysis, without closing the results of the compare mode analysis for the active dataset.
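
The first three columns of the compare mode table (subset name, sample size, share of the dataset) together with the sample size threshold can be sketched as follows. This is a simplified illustration rather than the app's code: the function name subset_table is hypothetical, the TMRCA columns are left out, and the labels reuse the subset sizes from the Bulgarian example above.

```python
from collections import Counter


def subset_table(labels, threshold=2):
    """Return (subset, sample size, ratio) rows for a compare-style table.

    labels: one filter label per haplotype in the dataset.
    threshold: minimum sample size a subset needs to be listed
               (2 when nothing else is specified).
    """
    total = len(labels)
    counts = Counter(labels)
    rows = [
        (name, n, n / total)  # ratio is relative to the entire dataset
        for name, n in counts.items()
        if n >= threshold
    ]
    # Sort by sample size, largest first
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Subset sizes from the Bulgarian E-V13 example: 17 + 16 + 16 = 49
labels = (["Bulgaria/Central"] * 17
          + ["Bulgaria/East"] * 16
          + ["Bulgaria/West"] * 16)

rows = subset_table(labels)
for name, n, ratio in rows:
    print(f"{name:18s} {n:3d} {ratio:.3f}")
```

Raising the threshold, for example subset_table(labels, threshold=17), would leave only Bulgaria/Central in the table.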





To demonstrate, consider a total of N=1391 67-marker E1b1b haplotypes copied from the FTDNA public pages, using the 'Country' column as the filter.
For a threshold of N = 25 and the full marker list, sorted by descending Z-TMRCA:



For a threshold of N = 25 and the Zhivotovsky marker list, sorted by descending Z-TMRCA:

UPDATE (03/25/2014)
Instead of using the 0.00069 rate for all the markers in the calculator that were not found in the Zhivotovsky publication, I have normalized the rates for those markers using an average of all the pedigree rates, normalized with the effective rate's ratio. The specific procedure I used to do this can be seen in the spreadsheet below.


https://docs.google.com/spreadsheets/d/1D6FU4fpB6vwAnle2-oiEvCV7_50QQhdhk6irSSVQtDE/edit#gid=467401003

See Also:
http://ethiohelix.blogspot.com/2012/06/finding-tmrca-of-ethiopian-ydna.html
http://ethiohelix.blogspot.com/2012/11/extensive-doctoral-thesis-on-ethiopian.html
http://ethiohelix.blogspot.com/2013/01/tmrca-calculations-from-plaster-nry.html
http://ethiohelix.blogspot.com/2013/02/the-zhivotovsky-multiplier.html
http://ethiohelix.blogspot.com/2013/03/african-sahel-ydna.html
http://ethiohelix.blogspot.com/2013/04/source-code-for-asd-based-tmrca.html
http://ethiohelix.blogspot.com/2013/05/analyzing-ydna-j-lineages-in-ethiopian.html
http://ethiohelix.blogspot.com/2013/05/analyzing-ydna-m13-lineages-in.html
