University of Washington - Department of Statistics
Bagging is a machine learning technique originally designed to improve the performance of tree-based learners such as CART (Classification and Regression Trees). The basic idea, due to Leo Breiman, is simple: draw bootstrap samples from the training sample, construct a prediction rule from each resample, and average the rules. Experiments have shown bagging to be remarkably effective, and the question is why this is so.
Bagging can in principle be applied to any statistic, not just to CART trees. We present asymptotic results for the effects of bagging in simple situations and study variations on the basic recipe, such as choosing a different sample size for the resamples and resampling without replacement. Some of the theoretical results correctly predict the effects of bagging on CART trees, and some do not. It appears that there is no single reason for the success of bagging CART trees, but that several distinct factors contribute to the improvement.
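To make the recipe and its variations concrete, here is a minimal sketch (not the speakers' code) that bags a one-split regression stump as the base learner; the stump stands in for a CART tree, and the `m` and `replace` parameters correspond to the variations mentioned above: choosing a different resample size and resampling without replacement.

```python
import random

def fit_stump(xs, ys):
    # Base learner: a one-split regression stump. Find the split on x
    # that minimizes squared error, predicting the mean of y on each side.
    best = None
    pairs = sorted(zip(xs, ys))
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            split = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (err, split, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x <= split else rm

def bag(xs, ys, n_boot=50, m=None, replace=True, seed=0):
    # Breiman's recipe: draw resamples, fit the base learner to each,
    # and average the resulting prediction rules. m sets the resample
    # size; replace=False gives resampling without replacement.
    rng = random.Random(seed)
    n = len(xs)
    m = m if m is not None else n
    rules = []
    for _ in range(n_boot):
        if replace:
            idx = [rng.randrange(n) for _ in range(m)]
        else:
            idx = rng.sample(range(n), m)
        rules.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(f(x) for f in rules) / len(rules)
```

Averaging many stumps smooths the hard jump of a single stump into a gradual transition near the split point, which is one way to see how bagging reduces the variance of an unstable, discontinuous predictor.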
This is joint work with Andreas Buja (University of Pennsylvania).