Mathematics – Statistics Theory
Scientific paper
2008-11-05
The performance of cross-validation (CV) is analyzed in two contexts: (i) risk estimation and (ii) model selection in the density estimation framework. The main focus is on one CV algorithm called leave-$p$-out (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators, which makes V-fold cross-validation completely useless in this setting. From a theoretical point of view, these closed-form expressions make it possible to study the performance of Lpo in terms of risk estimation. For instance, the optimality of leave-one-out (Loo), that is, Lpo with $p=1$, is proved among CV procedures. Two model selection frameworks are also considered: estimation and identification. In contrast to the risk estimation setting, Loo is proved to be suboptimal as a model selection procedure. In the estimation framework with finite sample size $n$, optimality is achieved for $p$ large enough (with $p/n = o(1)$) to balance overfitting. A link is also identified between the optimal $p$ and the structure of the model collection. These theoretical results are strongly supported by simulation experiments. When performing identification, model consistency is also proved for Lpo with $p/n\to 1$ as $n\to +\infty$.
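To make the idea of a closed-form Lpo estimator concrete, here is a minimal Python sketch for one particular projection estimator: a regular histogram on $[0,1]$ with `n_bins` bins. It is an illustration under stated assumptions, not the paper's own code. The assumptions are the least-squares contrast $\gamma(t,x) = \|t\|^2 - 2t(x)$, data lying in $[0,1]$, and the orthonormal basis $\varphi_\lambda = \sqrt{D}\,\mathbf{1}_{I_\lambda}$ with $D$ equal-width bins. Averaging the test-set contrast over all splits under these assumptions gives $\frac{1}{n(n-p)}\sum_\lambda\big[\sum_i \varphi_\lambda(X_i)^2 - \frac{n-p+1}{n-1}\sum_{i\neq j}\varphi_\lambda(X_i)\varphi_\lambda(X_j)\big]$; this expression is derived here from those assumptions rather than quoted from the paper. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def bin_index(x, n_bins):
    """Index of the equal-width bin of [0, 1] containing x (1.0 goes to the last bin)."""
    return min(int(x * n_bins), n_bins - 1)


def lpo_brute_force(data, n_bins, p):
    """Leave-p-out risk estimate obtained by enumerating every test set of size p.

    Assumes the least-squares contrast ||s_hat||^2 - 2 * mean(s_hat(test points)).
    Cost grows like C(n, p), so this is only usable for tiny samples.
    """
    n = len(data)
    m = n - p  # training-set size
    bins = [bin_index(x, n_bins) for x in data]
    total = 0.0
    for test in combinations(range(n), p):
        held_out = set(test)
        counts = [0] * n_bins
        for i in range(n):
            if i not in held_out:
                counts[bins[i]] += 1
        # Histogram height in bin k is n_bins * counts[k] / m.
        norm_sq = n_bins * sum(c * c for c in counts) / m**2
        test_fit = sum(n_bins * counts[bins[j]] / m for j in test) / p
        total += norm_sq - 2.0 * test_fit
    return total / comb(n, p)


def lpo_closed_form(data, n_bins, p):
    """Same quantity computed from bin counts only, with no enumeration of splits."""
    n = len(data)
    counts = [0] * n_bins
    for x in data:
        counts[bin_index(x, n_bins)] += 1
    diag = n_bins * n  # sum over bins and i of phi(X_i)^2
    off_diag = n_bins * sum(c * (c - 1) for c in counts)  # sum over bins and i != j
    return (diag - (n - p + 1) / (n - 1) * off_diag) / (n * (n - p))
```

On a small sample the two functions should agree up to floating-point error, while the closed form needs only one pass over the data. That is the practical point of a closed-form expression: the exhaustive Lpo criterion becomes as cheap to evaluate as a single-split estimate, for any $p$ between 1 and $n-1$.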
Celisse Alain
Optimal cross-validation in density estimation