Mathematics – Statistics Theory
Scientific paper
2012-02-10
Mathematics
Statistics Theory
Scientific paper
In this paper we consider stochastic multiarmed bandit problems. Recently a policy, DMED, is proposed and proved to achieve the asymptotic bound for the model that each reward distribution is supported in a known bounded interval, e.g. [0,1]. However, the derived regret bound is described in an asymptotic form and the performance in finite time has been unknown. We inspect this policy and derive a finite-time regret bound by refining large deviation probabilities to a simple finite form. Further, this observation reveals that the assumption on the lower-boundedness of the support is not essential and can be replaced with a weaker one, the existence of the moment generating function.
Honda Junya
Takemura Akimichi
No associations
LandOfFree
Finite-time Regret Bound of a Bandit Algorithm for the Semi-bounded Support Model does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.
If you have personal experience with Finite-time Regret Bound of a Bandit Algorithm for the Semi-bounded Support Model, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Finite-time Regret Bound of a Bandit Algorithm for the Semi-bounded Support Model will most certainly appreciate the feedback.
Profile ID: LFWR-SCP-O-275935