Jan 15

3:30 pm

## Compound Poisson Approximation for Occurrences of Multiple Words in DNA Sequences

### Gesine Reinert

Seminar

University of California, Los Angeles

A compound Poisson process approximation for the number of occurrences of multiple words in a DNA sequence is derived, assuming that the letters in the sequence are generated by a stationary Markov chain. Using the Chen-Stein method, a bound on the error in the approximation is given, and thus conservative confidence intervals for test statistics can be constructed. Moreover, for rare words, the error in the approximation tends to zero as the length of the sequence tends to infinity. As an example, it is shown how the approximation leads to conservative confidence intervals for the number of occurrences of stem-loop motifs. Other possible applications include multiple protein profile analysis.
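
To make the setting concrete, the following is a minimal sketch, not the speaker's method: it simulates letters from a first-order Markov chain and groups overlapping word occurrences into clumps, so that the total count is a sum of clump sizes, which is the structure a compound Poisson law is meant to describe. The transition probabilities, the word list, and the function names `simulate_markov_dna` and `clump_sizes` are illustrative assumptions. The chain is started from a fixed letter rather than from its stationary distribution, and no Chen-Stein error bound is computed.

```python
import random

ALPHABET = "ACGT"

# Hypothetical transition probabilities (each row sums to 1); not from the talk.
TRANSITION = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2},
    "G": {"A": 0.2, "C": 0.2, "G": 0.4, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}


def simulate_markov_dna(n, transition, start, rng):
    """Generate n letters from a first-order Markov chain started at `start`."""
    seq = [start]
    for _ in range(n - 1):
        row = transition[seq[-1]]
        weights = [row[c] for c in ALPHABET]
        seq.append(rng.choices(ALPHABET, weights=weights)[0])
    return "".join(seq)


def clump_sizes(seq, words):
    """Group overlapping occurrences of any word into clumps.

    Returns a list with the number of occurrences in each clump,
    in left-to-right order. Two occurrences share a clump when the
    later one starts before the current clump ends.
    """
    # Each occurrence is a half-open interval [start, end).
    occurrences = sorted(
        (i, i + len(w))
        for w in words
        for i in range(len(seq) - len(w) + 1)
        if seq.startswith(w, i)
    )
    sizes = []
    clump_end = -1
    for start, end in occurrences:
        if sizes and start < clump_end:
            sizes[-1] += 1                  # overlaps the current clump
            clump_end = max(clump_end, end)
        else:
            sizes.append(1)                 # starts a new clump
            clump_end = end
    return sizes


if __name__ == "__main__":
    rng = random.Random(0)
    seq = simulate_markov_dna(10_000, TRANSITION, "A", rng)
    sizes = clump_sizes(seq, ["AAAAAA", "ACGACG"])
    print("clumps:", len(sizes), "total occurrences:", sum(sizes))
```

In this sketch the number of clumps plays the role of the Poisson-distributed count and the clump sizes play the role of the compounding distribution; the total number of occurrences is `sum(sizes)`.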