May 4

3:30 pm

## Analysis of Sequence-Tagged-Connector Strategies for DNA Sequencing

### Andy Siegel

Seminar

University of Washington - Management Science & Finance

We derive properties of Sequence-Tagged-Connector (STC) sequencing strategies in the presence of false matches, in order to understand how clone-library decisions affect the incidence of problem clones and the cost of the sequencing project. The analysis uses a mathematical model of a random target with homologous repeats and imperfect sequencing technology. When a minimum-overlap extension method is used, we find the expected number of problem clones, that is, clones for which either (a) there is no identifiable overlapping STC to extend the sequence in a particular direction or (b) the identified STC with minimum overlap actually comes from a nonoverlapping clone, due either to random false matches or to repeat-family homology. Based on the expected minimum overlap, we estimate the number of clones to be entirely sequenced and then, using cost estimates, identify the decision rule that minimizes overall sequencing cost. For a target of 3 gigabases containing 838 megabases of repeats with 85-90% similarity, we find that with 15 times coverage by 150,000-base clones, only 9.4 problem clones are expected, and we estimate total sequencing cost at $1.5 billion.
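To give a sense of scale for the example in the abstract, the short sketch below works out the clone-library size implied by the stated figures. It is only back-of-the-envelope arithmetic, not the model from the talk: it assumes coverage is defined as (number of clones × clone length) / target length, that every clone has the same length, and that each clone contributes two STCs (one per end). The function and variable names are illustrative.

```python
# Back-of-the-envelope library arithmetic for the abstract's example.
# Assumptions (not from the talk): coverage = n_clones * clone_len / target_len,
# all clones have equal length, and each clone yields two end STCs.

def library_size(target_bases: int, clone_bases: int, coverage: int) -> int:
    """Number of equal-length clones needed to reach the given coverage."""
    return coverage * target_bases // clone_bases

target_bases = 3_000_000_000   # 3 gigabase target
clone_bases = 150_000          # 150,000-base clones
coverage = 15                  # 15 times coverage
repeat_bases = 838_000_000     # 838 megabases of repeats

n_clones = library_size(target_bases, clone_bases, coverage)
n_stcs = 2 * n_clones                          # assumed: two STCs per clone
mean_spacing = target_bases // n_clones        # average distance between clone starts
repeat_fraction = repeat_bases / target_bases  # share of the target that is repeat

print(f"clones in library:     {n_clones:,}")
print(f"STCs (two per clone):  {n_stcs:,}")
print(f"mean clone spacing:    {mean_spacing:,} bases")
print(f"repeat fraction:       {repeat_fraction:.1%}")
```

Under these assumptions the library holds 300,000 clones, so the 9.4 expected problem clones quoted in the abstract would be a very small share of the library.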