Consider the regression model Y = g0(X) + E, where E is the error term and g0 : R^k -> R is the unknown regression function to be estimated from independent observations of (X,Y). Furthermore, we are given a countable collection of models (classes of candidate regression functions of finite VC dimension) of growing complexity. The larger the model, the smaller the approximation error but the larger the estimation error. To balance the two, we propose to estimate g0 by penalised least squares, where the penalty is proportional to the VC dimension of the model. We shall obtain a finite-sample upper bound for the squared L2 distance between the estimate and g0 in terms of a purely empirical quantity, and we shall stress the differences with the recent work by Barron, Birgé and Massart (Probability Theory and Related Fields, 1999).
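The selection rule described above can be sketched numerically. The following is a minimal illustration only, not the estimator analysed in the abstract: it uses nested polynomial models with the parameter count d + 1 as a stand-in for VC dimension, an arbitrary penalty constant C, and a specific choice g0(x) = sin(3x); all of these are assumptions for the example.

```python
import numpy as np

# Hypothetical sketch of penalised least squares model selection.
# Models: polynomials of degree d; complexity proxy: d + 1 parameters
# (standing in for the VC dimension); penalty: C * (d + 1) / n.

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-1, 1, n)
Y = np.sin(3 * X) + 0.3 * rng.standard_normal(n)  # g0(x) = sin(3x), assumed

C = 2.0  # penalty constant, chosen arbitrarily for illustration
best_d, best_score = None, np.inf
for d in range(10):
    coeffs = np.polyfit(X, Y, d)        # least squares fit within model d
    resid = Y - np.polyval(coeffs, X)
    emp_risk = np.mean(resid ** 2)      # empirical squared loss
    score = emp_risk + C * (d + 1) / n  # penalised criterion
    if score < best_score:
        best_d, best_score = d, score

print(best_d)  # degree minimising the penalised empirical risk
```

The penalised criterion grows linearly in the model's complexity, so a larger model is selected only when its drop in empirical risk exceeds the added penalty, which is how the approximation/estimation trade-off is balanced in practice.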