Recently there has been a fair amount of interest in combining several classification trees so as to obtain better decision rules. Techniques such as bagging, boosting, and randomized trees are particularly popular in statistics and computer science.
The best theoretical PAC-learning bounds on the classification error rates achieved by these techniques offer little insight into how the classifiers should be combined so as to reduce the error rate.
In this talk I will present the notion of weakly dependent classifiers, and show that when the dependence between the classifiers is low and the expected margins (a measure of confidence in the classifiers) are large, low classification error rates can be achieved.
In particular, experiments with several data sets indicate a trade-off between weak dependence and expected margins: low expected margins can be compensated for by low mutual dependence between the classifiers.
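As an illustration of the two quantities involved (not the talk's exact definitions), the sketch below takes an ensemble of binary classifiers voting in {-1, +1}, computes the normalized margin of the majority vote on each example (the average of true label times vote, so a value in [-1, 1] that is positive when the majority is correct), and measures pairwise dependence by the Pearson correlation of the classifiers' prediction vectors. The toy predictions and labels are invented for the example.

```python
from itertools import combinations

def margin(votes, y):
    """Normalized margin: average of y * vote; > 0 means the majority vote is correct."""
    return sum(y * v for v in votes) / len(votes)

def correlation(a, b):
    """Pearson correlation of two +/-1 prediction vectors (a simple dependence proxy)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((z - mb) ** 2 for z in b) / n
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

# Three toy classifiers' predictions on five examples with true labels y.
y = [+1, +1, -1, -1, +1]
preds = [
    [+1, +1, -1, +1, +1],   # classifier 1
    [+1, -1, -1, -1, +1],   # classifier 2
    [-1, +1, -1, -1, +1],   # classifier 3
]

# Expected margin over the sample.
margins = [margin([p[i] for p in preds], y[i]) for i in range(len(y))]
avg_margin = sum(margins) / len(margins)

# Average pairwise correlation (mutual dependence) between the classifiers.
pairs = list(combinations(preds, 2))
avg_corr = sum(correlation(a, b) for a, b in pairs) / len(pairs)
```

On this toy data the expected margin is 0.6 while the average pairwise correlation is about 0.33; the trade-off described above says that when the former is small, keeping the latter small helps maintain a low error rate.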