Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem

Statistics – Machine Learning

Scientific paper

Details

This paper is devoted to regret lower bounds in the classical model of the stochastic multi-armed bandit. A well-known result of Lai and Robbins, later extended by Burnetas and Katehakis, established a logarithmic lower bound on the regret of all consistent policies. We relax the notion of consistency and exhibit a generalisation of the logarithmic bound. We also show that no logarithmic bound exists in the general case of Hannan consistency. To obtain these results, we study variants of popular Upper Confidence Bounds (UCB) policies. As a by-product, we prove that it is impossible to design an adaptive policy that would select the best of two algorithms by taking advantage of the properties of the environment.
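
For readers unfamiliar with UCB policies, the following is a minimal sketch of the standard UCB1 index policy of Auer, Cesa-Bianchi and Fischer on Bernoulli arms. It is not one of the variants studied in the paper, whose definitions the abstract does not give. The function name `ucb1`, its parameters, the Bernoulli reward model, and the exploration bonus sqrt(2 ln t / n) are assumptions chosen for illustration only.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms (illustrative assumption).

    Returns (pull counts per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # total reward collected from each arm
    best = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so that each index is well defined.
            arm = t - 1
        else:
            # Index = empirical mean + exploration bonus sqrt(2 ln t / n).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best mean and the chosen arm's mean.
        regret += best - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.8], horizon=2000)
    print("pulls per arm:", counts)
    print("pseudo-regret:", regret)
```

In this sketch the suboptimal arm keeps being pulled occasionally because its bonus grows with ln t, which is the behaviour behind the logarithmic regret that the lower bounds above address.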
