Thursday, April 26, 2012

Converting 23andMe raw data into PLINK format


A commenter requested that I post the script I use to convert 23andMe raw data into the PLINK format required for ADMIXTURE computations. There are several ways to do this using different kinds of scripts, but since the script I use and am most familiar with is one written for GNU Octave, that is the one I will post here.

I am assuming readers will be using a Linux platform, e.g. Ubuntu.
In addition, the script requires PLINK to already be installed on your machine.

  1. Download and install GNU Octave. You can do this from Ubuntu's 'Software Centre' by simply searching for Octave; it takes less than 5 minutes to download and install.
  2. Create a new folder that you will use for converting raw data. For instance, create a folder on your desktop and name it “Convert_23andME”.
  3. Download this file, then copy and paste it into the “Convert_23andME” folder that you just created.
  4. Download your raw data from 23andMe, unzip it, and copy the .txt file into the “Convert_23andME” folder you just created. You should now have only 2 files in that folder.
  5. Start a Terminal window in Ubuntu. Change directory to the Desktop/Convert_23andME folder you created by typing the following at the terminal's command line: cd Desktop/Convert_23andME/
  6. Start Octave by typing “octave” at the command line of the terminal window.
  7. Next, type: Raw_Convert ("My_Rawdata.txt"), where the string argument passed between the quotation marks, i.e. My_Rawdata.txt, should be EXACTLY the name of the raw-data file you placed in the Convert_23andME folder in step 4.
  8. Answer the questions* the program asks (avoid any spaces in your answers), press enter, and allow the program to process your raw data; a sample session is sketched after the notes below. V2 data takes about 22 minutes on my machine, and V3 data will obviously take longer. The speed will depend on your machine.
  9. When it is done, you will see 3 additional folders created within your “Convert_23andME” folder. The first folder (_conversion) contains three files with the extensions .tped, .tfam and .nocall; these are the files converted by the script. The .tped and .tfam files are the PLINK-formatted transposed pedigree files of your raw data, while the .nocall file lists the chromosome #, assigned reference SNP ID and position of each raw-data point that was not successfully genotyped, and is just for your records. The second folder (_binaryPED) contains the files with the extensions .bed, .bim and .fam, which are created by PLINK and are the binary PED and associated files of your raw data; these can then be merged with other data-sets to perform ADMIXTURE, MDS and various other genome-wide analyses. The last folder (_misc) contains miscellaneous files created by PLINK as a result of the conversion from transposed pedigree to binary PED; these may include files containing lists of heterozygous haploid genotypes and so forth, so consult the PLINK manual for details.
  10. Exit Octave by typing 'exit'.
*Notes on the questions the conversion program asks you in Octave:

Output File Name?

This is the name you want to give your converted raw-data files. The name you enter here will have the necessary extensions automatically appended to it, so there is no need to include any extension; enter just the name.

Family ID?

This will be the family ID that PLINK identifies your raw data with.

Individual ID?

This will be the individual ID that PLINK identifies your raw data with; the combination of a family ID and individual ID should uniquely identify a person.

Paternal ID (Default=0)?

You can just leave this at 0.

Maternal ID (Default=0)?

You can also leave this at 0.

SEX (1=male; 2=female; other=unknown)?

Enter 1 for male and 2 for female.
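
Putting steps 5 through 10 together, a typical terminal session looks roughly like the following. This is a sketch only; "My_Rawdata.txt" and the example answers are placeholders for your own file name and details.

  cd Desktop/Convert_23andME/
  octave
  # the remaining lines are typed at the Octave prompt:
  Raw_Convert ("My_Rawdata.txt")
  # example answers to the prompts (no spaces):
  #   Output File Name?                        my_rawdata
  #   Family ID?                               FAM1
  #   Individual ID?                           IND1
  #   Paternal ID (Default=0)?                 0
  #   Maternal ID (Default=0)?                 0
  #   SEX (1=male; 2=female; other=unknown)?   1
  # when the conversion finishes:
  exit
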
--------------------------------------------------------
Edit_Rev2: Converted Program into function, included Chromosome # and Position fields in No call list.

Edit_Rev3: Segregated No calls between Mitochondria, X and Y, included total-passed SNPs for PLINK in summary.

Edit_Rev4: Automated binary PED file creation.  

6 comments:

  1. Hey, thanks a lot for this, I just used it!

    This guide is great, and I can confirm it worked flawlessly. I have a quad-core Xeon; if you ever need to crunch a big dataset, let me know and I can give you SSH access or run it for you.

    Took me 5.8 minutes to convert a V2 23andme file.
    Time to Process : 352.626282

    Also, out of curiosity, when you use PLINK, does PLINK ask for double-extension filenames? PLINK only works for me when I copy or rename the files to match; like, it'll say rawdata.tped.tped file missing, so I have to rename rawdata.tped to rawdata.tped.tped and rawdata.tfam to rawdata.tped.tfam.

    Are you also having this?

    Another note.
    GNU Octave is using only 1 core; if you have a multi-core machine, the time may be cut down if you can find the multi-threaded library for GNU Octave. I'm looking for it.

    The 23andMe data conversion would have taken me about 1.45 minutes if it were multi-threaded.

    1. Wow, less than 6 minutes; I really need to update my machine :)

      "Also out of curiosity, when you use plink does plink ask for double extension filenames? Plink only works for me when i copy or rename the files to match. like it'll say rawdata.tped.tped file missing, so i have to rename rawdata.tped to rawdata.tped.tped and rawdata.tfam to rawdata.tped.tfam"

      When you type the name you want to give your raw data in Octave, enter the name only, without the extension (.tped); the program will automatically create the necessary extensions. I have updated the post to clarify this. So, say the name you give your raw data in Octave is my_rawdata; the program will then create the files my_rawdata.tped, my_rawdata.tfam and my_rawdata.nocall.

      Next, with PLINK, all you need to type is:
      plink --tfile "my_rawdata" --make-bed --out "Rawdata"

      You don't even need to type the extension; actually, you don't even need the quotation marks. The following will have the same outcome as the above:
      plink --tfile my_rawdata --make-bed --out Rawdata

      There is no need to type the extension because the --tfile flag lets PLINK know which file extensions and associated files to look for, in this case a transposed pedigree fileset. Similarly, when using the --bfile flag, you don't need to type .bed because PLINK knows to look for a binary PED file and the other files associated with it.
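
      For example (hypothetical file names, and just a sketch assuming PLINK 1.07's --bmerge syntax), merging your converted binary PED with a reference data-set before running ADMIXTURE could look something like:
      plink --bfile Rawdata --bmerge Reference.bed Reference.bim Reference.fam --make-bed --out Merged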

      On your other note: yes, I have a dual-core machine but it is very old, and I'm also not sure how to set up Octave to run using both cores.

  2. Yeah, if you ever need to crunch a big project/dataset, let me know; my machine is available since I'm not doing 24/7 runs. I just ran ADMIXTURE, and I'm going to try to go by the .fam files to see which clusters formed. I was just playing around and I forgot to do the Fst distances to see which K's are "noisier", but I did a K=15; I'll let you know what happens. I have your Afro dataset + all the Amerinds. There is no Euro yet; I'm going to see which Amerinds are admixed, the Euro should be proxied by North African, and I'll prune all mixed Natives. For Euros I'm thinking of using Basques as the center point; hopefully they form their own cluster. I might just not put in other Euros so they can do this, and then add them as individuals.

    By the way, that African dataset + Amerinds + my own raw data took about 1 hour and 40 minutes to run.

    1. OK, thanks for the additional computing space offer; it may come in handy for larger data-sets.
      As for the Euros, you can use the Basques or the Lithuanians, with the caveat that the former have slightly higher 'MENA' affinities while the latter have slightly higher Siberian plus South-Central Asian affinities.

    2. So I've been doing a couple of runs; some interesting notes.

      I got to see the error at the K's, and when I combine that African dataset from K=1 to K=16, the least-error K's are K=12 and K=14.

      Interestingly, when the North African cluster is MISSING (K=12) my Native American gets inflated by like 2%. Something is similar about North Africans and Natives, and since I have substantial North African ancestry, in its absence it inflates my Native.

      I was actually able to get ALL the African clusters you have as low as K=14, including my Euro, Native, North African and Arabian clusters. But I wasn't able to reproduce the West-Central African Bantu cluster until K=16, which had a higher error rate, although not crazy high.

      I added my GF to compare, and our African ancestries differ depending on the K.

      In some K's I am all E. Bantu, in others I am all West African (Dogon).

      In some she is all E. Bantu and in some she is all West African.

      I wonder why the big difference?

      This seems to be the only thing not stable so far. But the native, euro and other scores are pretty consistent for me and my GF.

      The thing is, the Bantu scores don't change for the populations, but they change from all to nothing for me and my GF; have you experienced things like this?

      One thing too: is there a script to make finding out the K's more automated? I've been doing it manually using Excel =).

    3. Interesting run, Lemba.

      “Something is similar about North Africans and Natives, and since I have substantial North African ancestry, in its absence it inflates my Native.”

      Likely; since Natives are ultimately Eurasian, and since North Africans have substantial Eurasian as well as Eurasian-like ancestry, the program is finding similarities in the patterns of allele frequencies in individuals of these 2 groups....

      “I added my GF to compare, and our African ancestries differ depending on the K.

      In some K's I am all E. Bantu, in others I am all West African (Dogon).

      In some she is all E. Bantu and in some she is all West African.

      I wonder why the big difference?”

      Probably because Eastern Bantus are a modified version of West African (Niger-Kordofanian) groups, modified in terms of additional Rift Valley, Afroasiatic and Nilo-Saharan genes. However, the discernment is also fuzzy because the SNPs are Eurocentric; unfortunately, that is all we have to work with for now, until we get another diverse set of African samples genotyped with more Afrocentric SNPs.

      “One thing too: is there a script to make finding out the K's more automated? I've been doing it manually using Excel =).”

      I do indeed have a script in Octave that automates the output from ADMIXTURE. The issue is that it relies on a text file listing the samples, with the population names in the correct order in which ADMIXTURE reports the output in its .Q file. In turn, I have another script that correlates the .fam files with a correctly ordered .txt file, but currently I am the only one who can operate these scripts since I wrote them; they are messy and it would take some modification to make them user-friendly. I don't mind posting the scripts at all, but it may take me some time to modify them, as I'm considerably busy with other things this week.

      Perhaps in the meantime you can also take a look at the software R and its capabilities; apparently it can also meaningfully sort and graph these results from ADMIXTURE, although I am not very familiar with R. Another thing with my scripts in Octave is that I hold all the actual data (numbers) in population-specific sub-matrices, which gives me greater flexibility in analyzing each population group's data and thereby performing different statistical analyses like studentization, ANOVA and the like. I'm not sure R would provide you with that flexibility unless one custom-programs it.
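
      In the meantime, a quick and dirty alternative at the command line (just a sketch, with hypothetical file names, and assuming the rows of ADMIXTURE's .Q file are in the same order as the rows of the .fam file used for the run) would be something like:
      # pull the family and individual IDs out of the .fam file
      awk '{print $1, $2}' merged.fam > ids.txt
      # glue the IDs onto the ancestry proportions for, say, K=15
      paste -d' ' ids.txt merged.15.Q > labeled_K15.txt
      # sort by family ID so individuals from the same population sit together
      sort -k1,1 labeled_K15.txt > labeled_K15_sorted.txt
      This only labels and sorts the rows; the per-population sub-matrices and statistics my Octave script handles would still need something more elaborate.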
