Jan 29

3:30 pm

## Missing Data in Morphometrics

### Fred Bookstein

Seminar

University of Michigan

Morphometric data sets have not only the usual parameter structures (mean shape, sample covariance) but also other geometric functions of the mean form that can structure prior knowledge. When information from data is absent or weak, these auxiliary formalisms can supply reasonable "expectations" in a context similar to the classic EM alternating algorithm. On odd-numbered steps, population parameters are estimated by least-squares or ML; on even-numbered steps, individual missing data are estimated. This casewise part can itself be ML, in an adaptation of the usual EM algorithm to the context of shape, or it may use any of the other available geometric structures instead. One such approach begins with the ordinary Procrustes mean of the complete cases. Next, individual missing landmarks are estimated by thin-plate spline deformation of the mean using the non-missing landmarks case by case; the mean shape is then recomputed by the ordinary Procrustes formulation (ML on the offset spherical normal model), and so on to convergence. Landmarks that come in left-right pairs generate a preferred basis for Procrustes shape space, and in this context there is a special version of either of these techniques that can apply to sample sizes as low as 1.
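The thin-plate-spline variant of this alternation can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the speaker's implementation: landmarks are taken to be 2D, missing landmarks are encoded as NaN rows, every case keeps at least three non-collinear landmarks, and the function names (`fit_to`, `procrustes_mean`, `tps_warp`, `impute_missing`) are hypothetical. The spline uses the usual 2D kernel U(r) = r² log r², and the Procrustes mean is computed by a fixed number of generalized Procrustes passes.

```python
import numpy as np

def center_scale(X):
    """Center a k x 2 configuration and scale it to unit centroid size."""
    X = X - X.mean(axis=0)
    return X / np.linalg.norm(X)

def fit_to(X, ref):
    """Least-squares similarity fit (translate, rotate, scale) of X onto ref."""
    Xc, Rc = X - X.mean(axis=0), ref - ref.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Rc)
    d = np.sign(np.linalg.det(U @ Vt))      # -1 would mean a reflection
    R = U @ np.diag([1.0, d]) @ Vt          # proper rotation only
    scale = (s * np.array([1.0, d])).sum() / (Xc ** 2).sum()
    return scale * Xc @ R + ref.mean(axis=0)

def procrustes_mean(shapes, n_iter=20):
    """Generalized Procrustes mean (assumption: fixed number of passes)."""
    mean = center_scale(shapes[0])
    for _ in range(n_iter):
        aligned = np.array([fit_to(S, mean) for S in shapes])
        mean = center_scale(aligned.mean(axis=0))
    return mean

def tps_kernel(r2):
    """2D thin-plate kernel U = r^2 log r^2, given squared distances."""
    out = np.zeros_like(r2)
    nz = r2 > 0
    out[nz] = r2[nz] * np.log(r2[nz])
    return out

def tps_warp(src, dst, pts):
    """Fit the thin-plate spline taking src to dst and evaluate it at pts."""
    k = len(src)
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)
    P = np.hstack([np.ones((k, 1)), src])
    L = np.zeros((k + 3, k + 3))
    L[:k, :k], L[:k, k:], L[k:, :k] = tps_kernel(d2), P, P.T
    coef = np.linalg.solve(L, np.vstack([dst, np.zeros((3, 2))]))
    w, a = coef[:k], coef[k:]               # bending weights, affine part
    d2p = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)
    return tps_kernel(d2p) @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ a

def impute_missing(data, max_iter=50, tol=1e-10):
    """Alternate between casewise TPS imputation and the Procrustes mean.

    data: (n, k, 2) array, NaN rows mark missing landmarks (assumption).
    """
    data = np.array(data, dtype=float)
    missing = np.isnan(data).any(axis=2)            # (n, k) mask
    complete = ~missing.any(axis=1)
    mean = procrustes_mean(data[complete])          # start from complete cases
    for _ in range(max_iter):
        for i in np.flatnonzero(~complete):
            obs = ~missing[i]
            # Warp the mean onto this case using its observed landmarks,
            # then read off where the missing landmarks land.
            data[i, missing[i]] = tps_warp(mean[obs], data[i, obs],
                                           mean[missing[i]])
        new_mean = procrustes_mean(data)            # re-estimate the mean
        change = np.linalg.norm(fit_to(new_mean, mean) - mean)
        mean = new_mean
        if change < tol:
            break
    return data, mean
```

Calling `impute_missing(data)` returns the filled-in array together with the final mean shape. Because a thin-plate spline reproduces any affine map exactly, a case that differs from the mean only by a similarity transform has its missing landmarks recovered exactly; real data will of course not behave this cleanly.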

In a situation with few analogues for more traditional data types, morphometric data need not be "entirely missing." For instance, just as a shape is, formally, an equivalence class of pointsets under the action of the similarity group, so an "outline shape" can be treated as an equivalence class under the action of the Cartesian product of the similarity group and a relabeling group. An alternating algorithm can produce linearized shape coordinates in this context just as it does in the usual (landmark point) case: this is the Procrustes analysis of semilandmarks. This approach drives the most important application yet of modern morphometrics: the work (joint with Paul Sampson) on the fetal-alcohol-affected brain right here at U-W. Another hybrid equivalencing approach applies in the paleontological reconstruction of specimens that were damaged in the course of fossilization.

My presentation will review these algorithms, show worked examples of each, and speculate about how useful it is to treat _any_ morphometric data set as if its information structures were mostly "missing."