Stat/Biostat 550 (2007): Lab 3: PHASE and MORGAN/kin
Due: Thursday Nov 8, 2007
This week's lab will cover two programs.
Note that some write-up is requested for each program.
The program PHASE is due to Matthew Stephens, and produces
estimates of haplotypes of individuals given their genotypes at mutiple loci.
It uses a model for similarities among haplotypes in a population, and samples
phasings of the genotypic data under this model.
The second program is a MORGAN program,
kin, which computes the kinship coefficient between specified pairs
of individuals, and the inbreeding coefficient of specified individuals.
Remember that if you did not discover how to put the
source ~statgen/.statgen.cshrc
into your own .cshrc (or equiv.)
file, you will need to give this command each
time you log on, to access the statgen programs.
A: PHASE Assignment
A1) Try applying Clark's algorithm (by hand) to estimate the
haplotypes of each individual given the following genotype data on 5
SNPs in 6 individuals. I have reformatted the data, so each pair of
digits is the genotype at a SNP locus, and each row is an individual.
Does the algorithm give a unique solution?
| 10, 10, 10, 10, 10
|
| 00, 00, 00, 11, 10
|
| 00, 00, 00, 10, 00
|
| 10, 10, 10, 11, 00
|
| 11, 11, 11, 11, 00
|
| 00, 00, 00, 10, 11
|
A2) Use the PHASE program
to estimate the haplotype
frequencies and haplotypes for each individual in the problem above.
The data set in the file phase.inp is the same data as above,
but in Matthew Stephens' format!
Compare
your estimates with the estimates you get from Clark's algorithm.
To run PHASE:
1. Go to the small PHASE example input data file
phase.inp.
Copy the PHASE example file
phase.inp, into your b550 working directory on abacus
-- or to wherever you want
to run PHASE.
2. run PHASE by typing, for example :
% PHASE phase.inp phase.out
The program prints some information to the screen and also produces
several output files with names starting phase.out_... .
3. examine the output in the file
phase.out_pairs (which lists the most probable haplotype pairs
for each individual, along with their estimated probabilities), and
the file phase.out_freqs which gives estimates of
the population frequencies of each haplotype.
Write a couple of brief paragraphs explaining the output, and comparing
the results with Clark's algorithm.
Here
are the full
instructions for the PHASE software, if you need
them, or are interested in understanding the input format etc.
However, if you have problems, the first thing to do is to email me!
(The PHASE instructions are still here as of Oct 23, 2007, but
may go now that Matthew Stephens has left UW--
but they will still be somewhere on the web -- try Google if interested.
Also, if you are into this area, you may wish to try the more modern
FastPhase, which is available through
Mathew Stephens page at Chicago.)
B: MORGAN/kin assignment
B1 Practising Using Kin:
Again, we will use the example in the
MORGAN Tutorial and downloaded examples for
practice. You can find out more about kin in
Chapter 4 of the
online tutorial.
In the MORGAN_Examples directory you downloaded for Lab-1,
there is a subdirectory IBD. Go into ("cd") this
subdirectory. There is a parameter file
jv_rep_kin.par which specifies the pedigree file
jv_rep.ped.
Take a look at jv_rep.ped.
As the name suggests, this pedigree is just 2 copies of our usual
JV pedigree.
The first two parameter statements specify that there are 30
individuals in the
pedigree, each of whom has the usual trio of names, followed by 2 integers.
The other two parameter statements are self-explanatory, and are really
just there for information, since these are the default options.
Now take a look at jv_rep_kin.par.
The first line tells the program where to find the pedigree
file.
The next line requests the program to compute kinship coefficients
between two pairs of individuals, the final individual (531) and her dad,
and between her two parents. The next line asks for a couple of inbreeding
coefficients on the second copy ("component") of the JV pedigree.
You can specify as many pairs (for kinship) and individuals
(for inbreeding) as you want -- the program should figure it out.
However, if there is more than one component pedigree in the data file,
you do need to tell it which component pedigree the individuals are
in. (This should not be a problem for your own single-component example
pedigrees.)
Finally the program asks for a couple of
two-locus inbreeding coefficients, which may
or may not be talked about in class. It is just the probability that the
maternal and paternal haplotypes of an
individual are IBD at both of two linked loci. This probability will
depend on the recombination rate between the loci, so the final line
specifies the recombination values at which this two-locus inbreeding
is to be computed. (This paragraph is just here for info: you can ignore
this bit if you wish.)
To run the program kin on the example file, type:
% kin jv_rep_kin.par
or, if you would like to send your output to a file, such as
kin.out, type
% kin jv_rep_kin.par > kin.out
or, if you decided to call the pedigree file something different
such as my-pedfile
% kin kin.par ped my-pedfile
(The ped key on the command line overrides whatever is in the
parameter file.)
Look at your output, or output file. It should be self-explanatory.
Practice modifying the jv_rep_kin.par
file to calculate an inbreeding coefficient
or kinship coefficient for another individual or pair of individuals.
Apparently, kin thinks it is an error to try to compute a kinship
of an individual with itself: I need to fix this, but it is not yet fixed!!
.
B2 Your MORGAN/kin Assignment
Create a parameter file to run kin on a copy of your own pedigree file
that you made for pedcheck in Lab 1.
(It is simplest to
leave the pedigree size and pedigree input record ...
statements in your pedigree file from when you ran pedcheck on it,
as in the copy here of jv_rep.ped, and to put the other statements
into your own version of a parameter file
kin.par, although in fact MORGAN does not
care which statements are in which file.)
Also note that for your single-component pedigrees you just leave
out the component 1 bit of the statement. You can ask for any
number of pairs for kinship in one statement (as the two pairs in the
example), and any number of individuals for inbreeding (as the two in the
example). You should have (for each component if you have more than 1),
just one kinship request statement and just one inbreeding request statement
in your parameter file.
Your output should include the following:
(i) Calculations of the kinship coefficients between at least two pairs
of individuals in your pedigree. One of these pairs should be a pair of
bilateral relatives other than siblings.
(ii) Calculations of the inbreeding coefficients for at least two
individuals in your pedigree, at least one of whom is inbred.
Turn in a sheet of paper with your results, with brief explanation of the
relationships between the individuals you have chosen (for kinship) or
between their parents (for inbreeding).