Many experiments in clinical trials, biology, biochemistry, and related fields are, of necessity, conducted in two stages. A first-stage experiment (a pilot study) is often used to gain information about the feasibility of the experiment or to provide preliminary data for grant applications. We study the theoretical statistical implications of using the first-stage data (1) to design the second-stage experiment and (2) in combination with the second-stage data for the analysis.
To illuminate the issues, we consider an experiment in which pilot study data are used to estimate an optimal design for the second-stage experiment under a non-linear regression model with normal errors. We show how the dependency between data in the different stages affects the distribution of parameter estimates. Data analysis methods are commonly based on the assumption that the sample sizes in both stages go to infinity, in which case maximum likelihood estimates are asymptotically normally distributed, as is the case for independent observations. But when the first-stage sample size is fixed and finite, maximum likelihood estimates are found to have a mixed normal distribution.
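The contrast between the two asymptotic regimes can be sketched schematically. The display below is only an illustration under assumed regularity conditions, not a statement of the exact result; the symbols $n_1$, $n_2$, $\hat\theta_1$, $\xi$, $M$, and $Z$ are notation introduced here for the sketch.

```latex
% Assumed notation (hypothetical, for illustration only):
%   n_1          fixed first-stage sample size
%   n_2          second-stage sample size, n_2 -> infinity
%   \hat\theta_1 first-stage estimate of the parameter \theta
%   \xi(\cdot)   second-stage design chosen from the first-stage estimate
%   M(\xi,\theta) per-observation Fisher information under design \xi
\[
  \sqrt{n_2}\,\bigl(\hat\theta - \theta\bigr)
  \;\xrightarrow{d}\;
  M\bigl(\xi(\hat\theta_1),\,\theta\bigr)^{-1/2} Z,
  \qquad Z \sim N(0, I), \quad Z \text{ independent of } \hat\theta_1 .
\]
```

Under these assumptions, $\hat\theta_1$ remains random when $n_1$ is fixed, so the limit is a normal with a random covariance, that is, a scale mixture of normals. If $n_1$ also grew without bound, $\hat\theta_1$ would converge to $\theta$ and the limit would reduce to a single normal distribution.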