The raw data for the full 71 million Wellcome Trust SNPs can be found
at the Mouse Genomes Project at Wellcome Trust.
The imputed genotypes are available for download
at 64 million (64,623,363) SNPs, which correspond to the high-confidence homozygous SNPs.
We also host a set of 68,163,509 SNPs corresponding to both high-confidence homozygous SNPs and
low-confidence or missing SNPs.
Missing SNPs were imputed using EMINIM.
Imputed SNPs in 78 mouse strains
Each file is compressed with gzip. Once uncompressed, each file contains a header describing each column. The genotypes are represented in the following format.
- All the genotyped SNPs (by Wellcome Trust or mouse HapMap) are represented with capital letters (A,C,G,T).
- In the posterior probability file, the imputed alleles of ungenotyped SNPs are represented as the posterior probability of allele 1. The probability of allele 2 is simply 1 - Pr(allele 1).
- In the files containing imputed calls, the genotype with the maximum probability is chosen. The imputed genotype calls are represented in lowercase letters (a,c,g,t) to distinguish them from the actual genotypes, which are encoded with capital letters (A,C,G,T).
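The encoding described above can be handled with a few small helpers. This is a minimal sketch, not the code used to produce the files: the function names are hypothetical, and it assumes only what is stated above (gzip compression, a header as the first line, capital letters for genotyped SNPs, lowercase letters for imputed calls, and a single posterior probability for allele 1). The column delimiter is not specified here, so the header is returned as a raw line.

```python
import gzip
from typing import Tuple


def read_header(path: str) -> str:
    """Return the first line of a gzip-compressed file (assumed to be the header)."""
    with gzip.open(path, "rt") as handle:
        return handle.readline().rstrip("\n")


def classify_call(genotype: str) -> str:
    """Label a single-letter call as 'genotyped' (A,C,G,T) or 'imputed' (a,c,g,t)."""
    if len(genotype) == 1 and genotype in "ACGT":
        return "genotyped"
    if len(genotype) == 1 and genotype in "acgt":
        return "imputed"
    # Anything else (for example a missing-value code) is outside the documented encoding.
    return "unknown"


def allele_probabilities(p_allele1: float) -> Tuple[float, float]:
    """Return (Pr(allele 1), Pr(allele 2)), where Pr(allele 2) = 1 - Pr(allele 1)."""
    if not 0.0 <= p_allele1 <= 1.0:
        raise ValueError("posterior probability must lie in [0, 1]")
    return p_allele1, 1.0 - p_allele1
```

For example, `classify_call("g")` labels the call as imputed, and `allele_probabilities(0.25)` gives the pair for allele 1 and allele 2.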
Original Wellcome Trust genotypes with missing SNPs imputed
For this set, missing genotypes in each of the 17 sequenced strains were imputed using a leave-one-out approach. For each strain, we imputed missing genotypes by removing the strain from the full set and using the remaining strains as a reference population. On average, we were able to confidently call 25% of the previously missing genotypes.
Each file is compressed with gzip. Once uncompressed, each file contains a header describing each column. The genotypes are represented in the following format.
- In the posterior probability file, each strain has two columns. The first column represents the probability of calling the genotype as missing. The second column represents the probability of calling the genotype as the reference allele. The probability of calling the genotype as the alternative allele can then be computed as 1 minus the sum of the first and second columns for each strain.
- In the files containing imputed calls, the missing genotypes were called if the maximum probability was greater than 95%; otherwise, the genotype was left as missing.
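The two-column layout and the 95% calling rule can be illustrated as follows. This is a sketch under stated assumptions rather than the pipeline that generated the files: the function names and the `CALL_THRESHOLD` constant are hypothetical, a missing call is represented as `None` because the missing-value code used in the files is not given here, and "maximum probability" is read as the largest of the three probabilities (missing, reference, alternative).

```python
from typing import Optional, Tuple

# Assumption: "greater than 95%" is applied as a strict inequality on a 0-1 scale.
CALL_THRESHOLD = 0.95


def strain_probabilities(p_missing: float, p_reference: float) -> Tuple[float, float, float]:
    """Expand the two stored columns into (missing, reference, alternative)."""
    p_alternative = 1.0 - (p_missing + p_reference)
    # Clamp tiny negative values that can arise from rounding in the stored columns.
    return p_missing, p_reference, max(p_alternative, 0.0)


def call_genotype(
    p_missing: float,
    p_reference: float,
    ref_allele: str,
    alt_allele: str,
    threshold: float = CALL_THRESHOLD,
) -> Optional[str]:
    """Return the called allele, or None if the genotype stays missing."""
    p_miss, p_ref, p_alt = strain_probabilities(p_missing, p_reference)
    best = max(p_miss, p_ref, p_alt)
    if best <= threshold:
        return None  # no state is confident enough
    if best == p_ref:
        return ref_allele
    if best == p_alt:
        return alt_allele
    return None  # the most probable state is "missing"
```

With these definitions, a strain whose columns are 0.01 and 0.97 would be called as the reference allele, while columns of 0.5 and 0.3 would leave the genotype missing.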