There are 116213 rows and 150 columns in this tab-delimited text file.
Column 1 (starting in Row 2) is the SNP name, Column 2 is the rsID (like rs*******), Column 3 is the chromosome, and Column 4 is the position. Columns 5 to 100 are the case group (96 cases), and Columns 101 to 150 are the control group (50 controls). Within each group, the columns are sorted by name.
The first row contains the name of each column.
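As a rough illustration of the layout described above, the following Python sketch reads the tab-delimited file and splits each data row into its annotation, case, and control parts. The function names split_row and read_genotypes are made up for this example, and the code assumes every row has exactly 150 tab-separated fields.

```python
import csv

N_COLUMNS = 150  # total number of columns stated in this description


def split_row(fields):
    """Split one data row into (annotation, cases, controls).

    Assumes the layout described in this file: 4 annotation columns,
    then 96 case columns, then 50 control columns.
    """
    if len(fields) != N_COLUMNS:
        raise ValueError(f"expected {N_COLUMNS} columns, got {len(fields)}")
    annotation = fields[:4]      # SNP name, rsID, chromosome, position
    cases = fields[4:100]        # Columns 5 to 100
    controls = fields[100:150]   # Columns 101 to 150
    return annotation, cases, controls


def read_genotypes(handle):
    """Yield (annotation, cases, controls) for every row after the header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # Row 1 holds the column names
    for fields in reader:
        yield split_row(fields)
```

To use it, open the data file in text mode and pass the file object to read_genotypes.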
Row 2 to 9209 is chromosome 1
Row 9210 to 19561 is chromosome 2
Row 19562 to 27374 is chromosome 3
Row 27375 to 35973 is chromosome 4
Row 35974 to 44344 is chromosome 5
Row 44345 to 52434 is chromosome 6
Row 52435 to 59503 is chromosome 7
Row 59504 to 66478 is chromosome 8
Row 66479 to 71280 is chromosome 9
Row 71281 to 76979 is chromosome 10
Row 76980 to 82346 is chromosome 11
Row 82347 to 87609 is chromosome 12
Row 87610 to 92850 is chromosome 13
Row 92851 to 96865 is chromosome 14
Row 96866 to 99897 is chromosome 15
Row 99898 to 102277 is chromosome 16
Row 102278 to 104240 is chromosome 17
Row 104241 to 107810 is chromosome 18
Row 107811 to 108498 is chromosome 19
Row 108499 to 110588 is chromosome 20
Row 110589 to 112501 is chromosome 21
Row 112502 to 113262 is chromosome 22
Row 113263 to 115599 is X chromosome
The rows of the same chromosome were sorted by their positions (from small to large).
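The row ranges above can be turned into a small lookup. This is only a sketch: the name chromosome_for_row is invented for the example, row numbers are 1-based file rows (Row 1 is the header), and rows outside the listed ranges return None because this description does not say what they contain.

```python
from bisect import bisect_right

# First row of each chromosome block, copied from the ranges listed above.
CHROM_START_ROWS = [
    2, 9210, 19562, 27375, 35974, 44345, 52435, 59504,
    66479, 71281, 76980, 82347, 87610, 92851, 96866, 99898,
    102278, 104241, 107811, 108499, 110589, 112502, 113263,
]
CHROM_LABELS = [str(n) for n in range(1, 23)] + ["X"]
LAST_LISTED_ROW = 115599  # last row of the X chromosome block


def chromosome_for_row(row):
    """Return the chromosome label for a 1-based file row, or None.

    None is returned for the header row and for rows beyond the
    listed ranges.
    """
    if row < CHROM_START_ROWS[0] or row > LAST_LISTED_ROW:
        return None
    # bisect_right counts how many blocks start at or before this row.
    return CHROM_LABELS[bisect_right(CHROM_START_ROWS, row) - 1]
```

The lookup reflects only the listed ranges; the chromosome column in the file itself remains the authoritative label for each row.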
In the case and control columns, 1 denotes the genotype AA, 2 denotes the genotype AB, 3 denotes the genotype BB, and 0 denotes a missing value or NoCall. In the position column, 0 denotes a missing value. In the other columns, NA denotes a missing value.
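The three missing-value conventions are easy to mix up, so here is a minimal Python sketch that decodes each kind of field. The helper names are hypothetical, and the code assumes fields arrive as strings exactly as written in the file.

```python
GENOTYPE_LABELS = {"1": "AA", "2": "AB", "3": "BB"}


def decode_genotype(code):
    """Map a case/control field to 'AA', 'AB', 'BB', or None if missing."""
    code = code.strip()
    if code == "0":  # 0 means missing value or NoCall
        return None
    # A KeyError here signals a code this description does not cover.
    return GENOTYPE_LABELS[code]


def decode_position(value):
    """Return the position as an int, or None when it is 0 (missing)."""
    position = int(value)
    return None if position == 0 else position


def decode_annotation(value):
    """Return the field unchanged, or None when it is NA (missing)."""
    return None if value == "NA" else value
```

Using None for every kind of missing value keeps a genotype code of 0 from being mistaken for a real number in later calculations.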
Please pay attention to the chromosome column. There are two labels (“1” and “1_random”) for chromosome 1. You can also find “9_random”, “X_random”, and other labels in the same format in the dataset. Fortunately, there are only 40 “*_random” entries in the data, and rsIDs are still available for some of them.
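One possible way to deal with the “*_random” labels is sketched below in Python. The function names are invented for this example, and whether to drop such entries or fold them into the base chromosome is an analysis decision that this description does not make.

```python
RANDOM_SUFFIX = "_random"


def is_random_label(chrom):
    """True for chromosome labels such as '1_random' or 'X_random'."""
    return chrom.endswith(RANDOM_SUFFIX)


def base_chromosome(chrom):
    """Strip a trailing '_random', so 'X_random' becomes 'X'."""
    if chrom.endswith(RANDOM_SUFFIX):
        return chrom[: -len(RANDOM_SUFFIX)]
    return chrom
```

For instance, filtering rows with is_random_label on Column 3 would set the “*_random” entries aside before a per-chromosome analysis.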
If you don't need the first 4 columns, the dataset can be found here.
Gender information for each person can be found in "newpheno.txt".
^_^