STAT/BIOSTAT 550, 2007: Homework 4: Due October 25


1. This very small example is unrealistic, but it shows a problem that can arise when using the EM algorithm to estimate haplotype frequencies. We have a sample of just one individual, who is heterozygous at both of two loci: A1A2, B1B2. Denote the frequencies of the four haplotypes A1B1, A1B2, A2B1, and A2B2 by q11, q12, q21, and q22, respectively.
Show that there are two maximum likelihood estimates (of equal likelihood):
q11 = q22 = 1/2, q12 = q21 = 0 and q12 = q21 = 1/2, q11 = q22 = 0.
What will the EM algorithm do?
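
One way to explore the last question numerically is the minimal Python sketch below. It assumes the usual EM update for this one-individual sample: the E-step computes the posterior probability w that the individual carries the haplotype pair A1B1/A2B2 rather than A1B2/A2B1, and the M-step sets q11 = q22 = w/2 and q12 = q21 = (1 - w)/2. The function names `em_step` and `run_em`, the iteration count, and the starting values are illustrative choices, not part of the assignment.

```python
def em_step(q11, q12, q21, q22):
    """One EM iteration for a single A1A2, B1B2 double heterozygote."""
    cis = q11 * q22      # relative weight of the pair A1B1 / A2B2
    trans = q12 * q21    # relative weight of the pair A1B2 / A2B1
    if cis + trans == 0:
        raise ValueError("starting values give the observed genotype probability 0")
    w = cis / (cis + trans)          # E-step: posterior probability of A1B1 / A2B2
    # M-step: each of the individual's two haplotypes counts for half the sample.
    return w / 2, (1 - w) / 2, (1 - w) / 2, w / 2


def run_em(start, n_iter=50):
    """Apply em_step n_iter times from the tuple start = (q11, q12, q21, q22)."""
    q = tuple(start)
    for _ in range(n_iter):
        q = em_step(*q)
    return q


if __name__ == "__main__":
    # Illustrative starting points: one on each side of q11*q22 = q12*q21, and one on it.
    for start in [(0.3, 0.2, 0.2, 0.3), (0.2, 0.3, 0.3, 0.2), (0.25, 0.25, 0.25, 0.25)]:
        print(start, "->", run_em(start))
```

Comparing the three runs shows how the limit depends on whether the starting values have q11 q22 greater than, less than, or equal to q12 q21.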

2. A pair of brothers, A and B, marry a pair of sisters, A* and B*. A and A* have a son D, and B and B* have a daughter E. Thus D and E are double first cousins.
We consider identity-by-descent (IBD) at a single locus.
You may assume the kinship coefficient between a pair of sibs is 1/4.
(a) What is the probability that D and E receive IBD genes from their fathers?
(b) What is the probability that D and E share neither their maternal genes nor their paternal genes?
(c) What is the kinship coefficient between D and E?
(d) D and E now produce a child H. What is the inbreeding coefficient of D? What is the inbreeding coefficient of H?

Ans: 1/4, 9/16, 1/8 for (a), (b) and (c).
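
The kinship calculations in this problem can be checked with the standard recursive rule for kinship coefficients. The sketch below is one way to do that in Python. The grandparents G1 to G4 are not named in the problem; they are assumed here to be unrelated and non-inbred, which is what gives a sib kinship of 1/4. The names `PEDIGREE`, `ORDER`, `kinship`, and `inbreeding` are illustrative.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical encoding: name -> (father, mother); founders map to None.
# Parents must be listed before their children.
PEDIGREE = {
    "G1": None, "G2": None, "G3": None, "G4": None,   # assumed unrelated founders
    "A": ("G1", "G2"), "B": ("G1", "G2"),             # the two brothers
    "A*": ("G3", "G4"), "B*": ("G3", "G4"),           # the two sisters
    "D": ("A", "A*"), "E": ("B", "B*"),               # double first cousins
    "H": ("D", "E"),
}
ORDER = {name: k for k, name in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(i, j):
    """Probability that a random gene from i and a random gene from j are IBD."""
    if ORDER[i] < ORDER[j]:
        i, j = j, i                  # make i the later-listed one, so i is not an ancestor of j
    parents = PEDIGREE[i]
    if i == j:
        if parents is None:
            return Fraction(1, 2)    # non-inbred founder compared with itself
        return Fraction(1, 2) * (1 + kinship(*parents))
    if parents is None:
        return Fraction(0)           # a founder is unrelated to anyone listed earlier
    father, mother = parents
    return Fraction(1, 2) * (kinship(father, j) + kinship(mother, j))


def inbreeding(i):
    """Inbreeding coefficient of i: the kinship coefficient of i's parents."""
    parents = PEDIGREE[i]
    return Fraction(0) if parents is None else kinship(*parents)
```

For example, `kinship("A", "B")` returns `Fraction(1, 4)` and `kinship("D", "E")` returns `Fraction(1, 8)`, in agreement with the answers to (a) and (c); `inbreeding("D")` and `inbreeding("H")` can be evaluated in the same way for part (d).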

3. (based on Lange Ch 5 #3)
A brother-sister full-sib pair, A and B (mice, not humans), produce two offspring, D and E.
(The kinship coefficient of full sibs is 1/4.)
(a) What is the inbreeding coefficient of D?
(b) What is the kinship coefficient between D and B?
(c) What is the kinship coefficient between D and E?
(d) What is the probability that D and E carry 4 IBD genes at a locus?
(e) What is the probability that D and E each carry two IBD genes at a locus, but that the genes carried by D are not IBD to those carried by E?
(For (d) and (e), it is easiest to just draw a picture and count the ways.)

Answers (?): 1/4, 3/8, 3/8, 1/16, 1/32 for (a), (b), (c), (d) and (e).
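
The "draw a picture and count the ways" approach can also be carried out by brute-force enumeration. The Python sketch below assumes, as the sib kinship of 1/4 implies but the problem does not state, that the parents of A and B are unrelated, non-inbred founders. Their four genes are labeled 0, 1 (father) and 2, 3 (mother), and all 256 equally likely transmission patterns are listed. The names `enumerate_outcomes`, `prob`, and `kinship` are illustrative.

```python
from fractions import Fraction
from itertools import product


def enumerate_outcomes():
    """List all 256 equally likely (B, D, E) genotype outcomes."""
    outcomes = []
    # A and B each receive one of genes 0/1 from their father and one of 2/3 from their mother.
    for a_pat, a_mat, b_pat, b_mat in product((0, 1), (2, 3), (0, 1), (2, 3)):
        a, b = (a_pat, a_mat), (b_pat, b_mat)
        # D and E each receive one gene from A and one gene from B.
        for d_a, d_b, e_a, e_b in product(a, b, a, b):
            outcomes.append((b, (d_a, d_b), (e_a, e_b)))
    return outcomes


OUTCOMES = enumerate_outcomes()


def prob(event):
    """Probability of an event defined on the genotypes (b, d, e)."""
    hits = sum(1 for b, d, e in OUTCOMES if event(b, d, e))
    return Fraction(hits, len(OUTCOMES))


def kinship(pick):
    """Probability that random genes from two individuals are IBD.

    `pick` maps (b, d, e) to the pair of genotypes being compared.
    """
    total = Fraction(0)
    for b, d, e in OUTCOMES:
        x, y = pick(b, d, e)
        total += Fraction(sum(g == h for g in x for h in y), 4)
    return total / len(OUTCOMES)


f_D = prob(lambda b, d, e: d[0] == d[1])                       # part (a)
psi_DB = kinship(lambda b, d, e: (d, b))                       # part (b)
psi_DE = kinship(lambda b, d, e: (d, e))                       # part (c)
p_four_ibd = prob(lambda b, d, e: len(set(d + e)) == 1)        # part (d)
p_two_pairs = prob(lambda b, d, e: d[0] == d[1] and e[0] == e[1]
                   and d[0] != e[0])                           # part (e)
```

Under the stated assumption, the enumeration gives 1/4, 3/8, 3/8, 1/16, and 1/32, the same values as the answers listed for (a) through (e).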

4. The data for this question come from a study by Dr. Arno Motulsky and coworkers, and are published in Thompson et al. (1988; Am. J. Hum. Genet. 42, 113-124).
There were three population samples, all from around Seattle (Caucasian, African American, and Japanese American), and three tightly linked diallelic loci, designated M, P, and S.

In the Caucasian sample of 205 individuals typed at the P and M loci, the counts of the nine two-locus phenotypes were:

          M1M1   M1M2   M2M2
P1P1       143     35      3
P1P2        17      5      0
P2P2         2      0      0

Use the EM algorithm to estimate the 4 haplotype frequencies.
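
A minimal Python sketch of one way to set up this calculation follows. Only the double heterozygotes (P1P2, M1M2) are ambiguous in phase, so the E-step splits them between the two possible haplotype pairs in proportion to the current frequency products, and the M-step recomputes the frequencies from the expected haplotype counts. The function name `em_haplotypes`, the uniform starting values, and the convergence tolerance are assumptions made for illustration, not part of the assignment.

```python
# Rows: P1P1, P1P2, P2P2; columns: M1M1, M1M2, M2M2 (counts from the text).
COUNTS = [[143, 35, 3],
          [17, 5, 0],
          [2, 0, 0]]

# Alleles carried (0 = allele 1, 1 = allele 2) for a genotype with 0, 1, or 2 copies of allele 2.
ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}


def em_haplotypes(counts, start=(0.25, 0.25, 0.25, 0.25), tol=1e-12, max_iter=1000):
    """Return estimated frequencies of haplotypes (P1M1, P1M2, P2M1, P2M2)."""
    n_hap = 2 * sum(sum(row) for row in counts)
    n_dh = counts[1][1]                       # double heterozygotes: phase unknown
    # Haplotype counts determined without ambiguity (at least one locus homozygous).
    fixed = [0.0, 0.0, 0.0, 0.0]
    for i in range(3):
        for j in range(3):
            if i == 1 and j == 1:
                continue
            for p, m in zip(ALLELES[i], ALLELES[j]):
                fixed[2 * p + m] += counts[i][j]
    q = list(start)
    for _ in range(max_iter):
        # E-step: posterior probability that a double heterozygote is P1M1 / P2M2.
        cis, trans = q[0] * q[3], q[1] * q[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = [fixed[0] + n_dh * w, fixed[1] + n_dh * (1 - w),
                    fixed[2] + n_dh * (1 - w), fixed[3] + n_dh * w]
        # M-step: frequencies are expected haplotype counts over total haplotypes.
        new_q = [c / n_hap for c in expected]
        converged = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    return q


if __name__ == "__main__":
    print(em_haplotypes(COUNTS))
```

A useful sanity check on any run is that the estimated haplotype frequencies sum to 1 and reproduce the allele frequencies obtained by direct counting (384/410 for P1 and 364/410 for M1), since the phase-ambiguous individuals do not affect the allele counts.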