Tuesday, October 8, 2013

TMRCA calculator for Python

I have ported the TMRCA calculator from Octave to Python, so it now runs on both; see here for the Octave version.
It is written for Python 2.7, and I have not had a chance to test it on other versions. The script needs no libraries beyond the standard library that ships with 2.7. Some advantages of the Python version: fewer steps to run the program, easier (future) web app deployment, and more users have access to Python than to Octave.

The zip file can be downloaded here: https://dl.dropboxusercontent.com/u/42082352/TMRCA.zip
--------------------------------------------------------------------------------------------------------
TMRCA Calculator Instructions - for Python 2.7

To check that the TMRCA program works correctly on your system, first run it with the dataset
provided here before trying other datasets. To do so:

(1) Make sure you have Python 2.7 installed on your system (either Windows or Linux will work) and start the interpreter.
(2) In the interpreter, change your working directory to the directory where you saved the unzipped folder:
(i) import os
(ii) os.chdir('~PATH/TMRCA/')
- where ~PATH is the full path to the location of the TMRCA folder on your system.
If you are unsure of your current working directory, type: os.getcwd()
(3) Import the tmrca module by typing: import tmrca
(4) Run the script by typing: tmrca.Analysis('EM35_Example.csv','all')
(5) If this produces results with no errors in the interpreter, the program is correctly installed and you can proceed to reading and analysing other datasets.
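
The check above can also be wrapped in a small script. This is only a sketch: enter_tmrca_folder and run_example are hypothetical helper names that are not part of the zip, and the only calls taken from the instructions are os.chdir, os.getcwd, import tmrca and tmrca.Analysis. Note that ~PATH is a placeholder; os.chdir does not expand a leading "~" by itself, so the helper passes the path through os.path.expanduser first.

```python
import os


def enter_tmrca_folder(path):
    """Change into the unzipped TMRCA folder and return the new working directory.

    Hypothetical helper, not shipped with the calculator.
    """
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(full):
        raise IOError("Not a directory: %s" % full)
    os.chdir(full)
    return os.getcwd()


def run_example(path):
    """Steps (2) to (4): change directory, import tmrca, run the example dataset."""
    enter_tmrca_folder(path)
    import tmrca  # the module from the zip; only importable once inside the folder
    return tmrca.Analysis('EM35_Example.csv', 'all')
```

Calling run_example with the location of your own TMRCA folder should behave like typing steps (2) to (4) by hand.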

Reading and analysing new data

After correctly executing the above steps, read and analyse new data by using the following steps:
(1) Examine the example STR data file in the "TMRCA/" folder, named "EM35_Example.csv".
(2) Any STR data file to be analysed must first be put in the same format as the "EM35_Example.csv" file, specifically:
(a) DYS names in the first row must use exactly the same nomenclature (their order may differ, however).
(b) Each row (except the first) should represent one sample.
(c) Each column (except the first) should hold the repeat counts for one marker/DYS#.
(d) The first column should hold sample identifiers, e.g. Kit#, sample ID, ...
(e) The cell in the first row and first column should hold the dataset's name; this name is used throughout the analysis.
(f) No cell may contain a null value, and avoid cells whose text contains spaces.
(g) The file MUST be a *.csv file with commas as field delimiters.
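
Some of these format rules can be checked before running the analysis. The following is a rough sketch, not part of the calculator: check_rows and check_str_csv are hypothetical helpers, and they assume that a "null value" means an empty cell. They only test rules (b) to (g) structurally (equal row lengths, no empty cells, no spaces inside cells); they do not verify that the DYS names match the expected nomenclature.

```python
import csv


def check_rows(rows):
    """Return a list of format problems for an STR table given as a list of rows."""
    if not rows:
        return ['file is empty']
    problems = []
    width = len(rows[0])
    if width < 2:
        problems.append('header needs a dataset name plus at least one marker')
    for r, row in enumerate(rows, 1):
        if len(row) != width:
            problems.append('row %d has %d cells, expected %d' % (r, len(row), width))
        for c, cell in enumerate(row, 1):
            text = cell.strip()
            if text == '':
                # Assumption: an empty cell is what the instructions call a null value.
                problems.append('row %d, column %d is empty' % (r, c))
            elif ' ' in text:
                problems.append('row %d, column %d contains a space' % (r, c))
    return problems


def check_str_csv(path):
    """Read a comma-delimited file and report its format problems."""
    with open(path) as handle:
        return check_rows(list(csv.reader(handle)))
```

An empty list from check_str_csv('EM35_Example.csv') would mean none of these structural problems were found; it does not guarantee that the analysis itself will accept the file.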
(3) Place the *.csv file directly in the "TMRCA/" folder (i.e. in your working directory)
(4) Start the interpreter, change the working directory to '~PATH/TMRCA/', as per the instructions above and import tmrca.
(5) If you want to analyse a specific set of markers from your dataset, go to step 6; otherwise go to step 7.
(6) Open the file "/TMRCA/Markerlist/49markerlist.txt" and pick the markers you want to use for the analysis. Save your chosen markers in a new *.txt file in the same "/TMRCA/Markerlist/" folder. Look at any of the other marker list text files in the folder for an example of how a marker list should look. Note that all marker list files must be *.txt files.
(7) If you are specifying a set of markers for the analysis, for example "8_Chiaronimarkerlist.txt", run the program by typing: tmrca.Analysis('EM35_Example.csv','8_Chiaronimarkerlist.txt'); otherwise, just type: tmrca.Analysis('EM35_Example.csv','all').
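
A mistyped marker list name is easy to catch before the analysis starts. The sketch below is an illustration only: marker_argument is a hypothetical helper, and it assumes the working directory is the TMRCA folder, so that the marker lists sit in a "Markerlist" subfolder as described in step 6.

```python
import os


def marker_argument(choice, marker_dir='Markerlist'):
    """Return 'all' or a marker list file name that exists in marker_dir.

    Hypothetical helper, not shipped with the calculator.
    """
    if choice == 'all':
        return choice
    if not choice.lower().endswith('.txt'):
        raise ValueError('marker list must be a *.txt file: %s' % choice)
    if not os.path.isfile(os.path.join(marker_dir, choice)):
        raise ValueError('%s not found in %s' % (choice, marker_dir))
    return choice


# Possible use once tmrca has been imported (step 7):
# tmrca.Analysis('EM35_Example.csv', marker_argument('8_Chiaronimarkerlist.txt'))
```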
