1. This very small example is unrealistic, but it illustrates a problem that
can arise when using the EM algorithm to estimate haplotype frequencies.
We have a sample of just one individual who is heterozygous at both of
two loci: A1A2, B1B2.
Denote the frequencies of the four haplotypes
A1B1, A1B2, A2B1, and A2B2
by q11, q12, q21, and q22, respectively.
Show that there are two maximum likelihood estimates (of equal likelihood):
(i) q11 = q22 = 1/2, q12 = q21 = 0, and
(ii) q12 = q21 = 1/2, q11 = q22 = 0.
What will the EM algorithm do?
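To experiment with this question numerically, the following is a minimal sketch of the EM iteration for the single double heterozygote. It assumes random union of haplotypes, so that the likelihood is 2(q11 q22 + q12 q21); the function names em_step, run_em and likelihood are illustrative, not from any package.

```python
def likelihood(q):
    """Probability of the phenotype A1A2, B1B2 given q = (q11, q12, q21, q22).

    Assumes random union of haplotypes: the individual is either
    A1B1/A2B2 or A1B2/A2B1.
    """
    q11, q12, q21, q22 = q
    return 2 * (q11 * q22 + q12 * q21)


def em_step(q):
    """One EM update for a sample consisting of one double heterozygote."""
    q11, q12, q21, q22 = q
    cis = q11 * q22      # proportional to P(haplotype pair A1B1/A2B2)
    trans = q12 * q21    # proportional to P(haplotype pair A1B2/A2B1)
    total = cis + trans
    if total == 0:
        raise ValueError("phenotype has probability 0 under these frequencies")
    # E-step: posterior probability of the A1B1/A2B2 phase.
    w = cis / total
    # M-step: expected haplotype counts divided by the 2 haplotypes sampled.
    return (w / 2, (1 - w) / 2, (1 - w) / 2, w / 2)


def run_em(q, n_iter=50):
    """Apply em_step n_iter times starting from q."""
    for _ in range(n_iter):
        q = em_step(q)
    return q
```

Trying several starting values, for example run_em((0.25, 0.25, 0.25, 0.25)) and run_em((0.3, 0.2, 0.2, 0.3)), and comparing likelihood at the results, shows how the outcome depends on the starting point.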
2. A pair of brothers, A and B, marry a pair of sisters, A* and B*.
A and A* have a son D, and B and B* have a daughter E. Thus D and E
are double first cousins.
We consider identity-by-descent (IBD) at a single locus.
You may assume the kinship coefficient between a pair of sibs is 1/4.
(a) What is the probability that D and E receive IBD genes from their fathers?
(b) What is the probability that D and E share neither their maternal genes
nor their paternal genes?
(c) What is the kinship coefficient between D and E ?
(d) D and E now produce a child H.
What is the inbreeding coefficient of D?
What is the inbreeding coefficient of H?
Ans: 1/4, 9/16, 1/8 for (a), (b) and (c).
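As a check on the kinship and inbreeding answers, here is a minimal sketch of the standard recursive kinship computation applied to this pedigree. It assumes the two sibships have unrelated, non-inbred founder parents, which are given the hypothetical labels GF1, GM1, GF2 and GM2 (they are not named in the problem); the sib kinship of 1/4 then follows from the recursion rather than being supplied.

```python
from fractions import Fraction
from functools import lru_cache

# name -> (father, mother), or None for a founder.
# Parents must be listed before their children.
PEDIGREE = {
    "GF1": None, "GM1": None,      # hypothetical parents of brothers A, B
    "GF2": None, "GM2": None,      # hypothetical parents of sisters A*, B*
    "A": ("GF1", "GM1"), "B": ("GF1", "GM1"),
    "A*": ("GF2", "GM2"), "B*": ("GF2", "GM2"),
    "D": ("A", "A*"), "E": ("B", "B*"),
    "H": ("D", "E"),
}
ORDER = {name: k for k, name in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(i, j):
    """Kinship coefficient: P(random gene from i is IBD to random gene from j)."""
    if ORDER[i] < ORDER[j]:
        i, j = j, i                # make i the later-listed individual
    parents = PEDIGREE[i]
    if i == j:
        if parents is None:
            return Fraction(1, 2)  # non-inbred founder
        return (1 + kinship(*parents)) / 2
    if parents is None:
        return Fraction(0)         # founder i is unrelated to earlier-listed j
    # i is not an ancestor of j, so recurse through i's parents.
    father, mother = parents
    return (kinship(father, j) + kinship(mother, j)) / 2


def inbreeding(name):
    """Inbreeding coefficient: kinship between the individual's parents."""
    parents = PEDIGREE[name]
    return Fraction(0) if parents is None else kinship(*parents)
```

With this setup, kinship("D", "E") corresponds to part (c), and inbreeding("D") and inbreeding("H") correspond to part (d).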
3. (based on Lange Ch 5 #3)
A brother-sister full-sib pair, A and B (mice, not humans),
produce two offspring, D and E.
(The kinship coefficient of full sibs is 1/4.)
(a) What is the inbreeding coefficient of D ?
(b) What is the kinship coefficient between D and B ?
(c) What is the kinship coefficient between D and E ?
(d) What is the probability that D and E together carry 4 IBD genes at a locus?
(e) What is the probability that D and E each carry two IBD genes at a locus,
but these are different genes?
(For (d) and (e), it is easiest to just draw a picture and count the ways.)
Answers (?): 1/4, 3/8, 3/8, 1/16, 1/32
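The "draw a picture and count the ways" approach can also be done by brute force. The sketch below enumerates all 2^8 equally likely outcomes of the eight meioses, assuming that A and B are offspring of two unrelated, non-inbred founders whose four genes are labeled 0, 1 (father) and 2, 3 (mother). The labels and the function name are illustrative only.

```python
from fractions import Fraction
from itertools import product


def count_sib_mating_states():
    """Enumerate gene descent from two founders through sibs A, B to D, E.

    Returns the probabilities of three events as exact fractions.
    """
    d_autozygous = four_ibd = two_distinct_pairs = 0
    total = 0
    for a_p, a_m, b_p, b_m, d_a, d_b, e_a, e_b in product((0, 1), repeat=8):
        # Founder father carries genes 0, 1; founder mother carries 2, 3.
        A = (a_p, 2 + a_m)
        B = (b_p, 2 + b_m)
        # Each offspring receives one gene from A and one from B.
        D = (A[d_a], B[d_b])
        E = (A[e_a], B[e_b])
        total += 1
        if D[0] == D[1]:
            d_autozygous += 1                  # part (a)
        if len(set(D + E)) == 1:
            four_ibd += 1                      # part (d)
        if D[0] == D[1] and E[0] == E[1] and D[0] != E[0]:
            two_distinct_pairs += 1            # part (e)
    return {
        "D autozygous": Fraction(d_autozygous, total),
        "all four IBD": Fraction(four_ibd, total),
        "two distinct IBD pairs": Fraction(two_distinct_pairs, total),
    }
```

Because the enumeration uses exact fractions, the results can be compared directly with the stated answers for (a), (d) and (e).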
4. The data for this question come from a study by Dr. Arno Motulsky and
coworkers, and are published in Thompson et al. (1988; Am. J. Hum. Genet. 42, 113-124).
There were three population samples, all from around Seattle
(Caucasian, African American, and Japanese American), and three tightly
linked diallelic loci, designated M, P and S.
In the Caucasian sample of 205 individuals typed at the
P and M loci, the counts of the nine two-locus phenotypes were:

            M1M1   M1M2   M2M2
    P1P1     143     35      3
    P1P2      17      5      0
    P2P2       2      0      0
Use the EM algorithm to estimate the 4 haplotype frequencies.
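One possible way to set up the computation is sketched below. It assumes random union of haplotypes (Hardy-Weinberg proportions for haplotype pairs), so that only the P1P2,M1M2 double heterozygotes have uncertain phase. The array layout, the starting value and the function name em_haplotypes are choices made for this illustration.

```python
# COUNTS[i][j]: i = number of P2 alleles (0, 1, 2), j = number of M2 alleles.
COUNTS = [[143, 35, 3],
          [17, 5, 0],
          [2, 0, 0]]

# Alleles carried (0 = allele 1, 1 = allele 2) for 0, 1 or 2 copies of allele 2.
ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}


def em_haplotypes(counts, tol=1e-12, max_iter=1000):
    """EM estimates of (P1M1, P1M2, P2M1, P2M2) haplotype frequencies."""
    n = sum(map(sum, counts))          # number of individuals
    n_dh = counts[1][1]                # double heterozygotes (phase unknown)
    # Haplotype counts from phenotypes whose phase is unambiguous.
    base = [0.0] * 4                   # index = 2 * (P allele) + (M allele)
    for i in range(3):
        for j in range(3):
            if i == 1 and j == 1:
                continue
            for p, m in zip(ALLELES[i], ALLELES[j]):
                base[2 * p + m] += counts[i][j]
    q = [0.25] * 4                     # arbitrary interior starting point
    for _ in range(max_iter):
        # E-step: probability a double heterozygote is P1M1/P2M2.
        cis, trans = q[0] * q[3], q[1] * q[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: expected haplotype counts over 2n haplotypes.
        new = [(base[0] + n_dh * w) / (2 * n),
               (base[1] + n_dh * (1 - w)) / (2 * n),
               (base[2] + n_dh * (1 - w)) / (2 * n),
               (base[3] + n_dh * w) / (2 * n)]
        converged = max(abs(a - b) for a, b in zip(new, q)) < tol
        q = new
        if converged:
            break
    return q
```

Calling em_haplotypes(COUNTS) returns the four estimated frequencies. A useful sanity check is that the implied allele frequencies, such as q[0] + q[1] for P1, do not change across iterations and equal the directly counted allele frequencies.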