Multivariate multinomial mixtures: a data-driven penalized criterion for variable selection and clustering

Mathematics – Statistics Theory

Scientific paper

We consider the problem of estimating the number of components and the relevant variables in a multivariate multinomial mixture. Models of this kind arise in particular when dealing with multilocus genotypic data. A new penalized maximum likelihood criterion is proposed, and a non-asymptotic oracle inequality is obtained. Further, under weak assumptions on the true probability underlying the observations, the selected model is asymptotically consistent. In practice, the shape of the proposed penalty function is defined up to a multiplicative parameter, which is calibrated using the slope heuristics in an automatic data-driven procedure. Using simulated data, we found that this procedure improves the performance of the selection procedure with respect to classical criteria such as BIC and AIC. The new criterion gives an answer to the question "Which criterion for which sample size?".
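
To make the calibration step concrete, here is a minimal Python sketch of a generic slope-heuristics selection, not the paper's own implementation. It rests on assumptions the abstract does not state: the penalty shape is taken to be D/n (model dimension over sample size), the slope is estimated by an ordinary least-squares fit over the most complex half of the candidate models, and the final penalty is twice the estimated minimal penalty. The function name slope_heuristic_select and the parameter frac_complex are hypothetical.

```python
import numpy as np

def slope_heuristic_select(log_liks, dims, n, frac_complex=0.5):
    """Pick a model with a penalty calibrated by the slope heuristics.

    log_liks : maximized log-likelihood of each candidate model
    dims     : number of free parameters of each candidate model
    n        : sample size
    Assumption (not from the abstract): penalty shape is dims / n.
    """
    log_liks = np.asarray(log_liks, dtype=float)
    dims = np.asarray(dims, dtype=float)
    shape = dims / n              # assumed penalty shape
    contrast = -log_liks / n      # empirical contrast to minimize

    # For complex models the bias is assumed negligible, so the
    # normalized log-likelihood is roughly linear in the penalty shape.
    order = np.argsort(dims)
    k = max(2, int(np.ceil(frac_complex * len(dims))))
    complex_idx = order[-k:]
    slope, _intercept = np.polyfit(shape[complex_idx],
                                   -contrast[complex_idx], 1)

    # The fitted slope estimates the minimal penalty constant;
    # the heuristics doubles it to get the final constant.
    kappa = 2.0 * slope
    criterion = contrast + kappa * shape
    return int(np.argmin(criterion)), kappa


# Synthetic illustration: the log-likelihood gains 1 per extra parameter
# once the dimension reaches 5, and is strongly biased below that.
n = 100
dims = np.arange(1, 21)
log_liks = -200.0 + dims - 10.0 * np.maximum(5 - dims, 0)
best, kappa = slope_heuristic_select(log_liks, dims, n)
```

Fixed criteria such as AIC and BIC correspond to setting the multiplicative constant in advance, whereas in this sketch it is read off the data. In practice the fraction of models used for the fit affects the estimate and should be checked for stability.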

Profile ID: LFWR-SCP-O-535109
