Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

Mathematics – Optimization and Control

Scientific paper

Rate now

[ 0.00 ] – not rated yet Voters 0 Comments 0

Details Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

: 2011-06-30
: arxiv.org/abs/1106.6104v2
: Mathematics
: Optimization and Control

: 18 pages, 2 figures
: Scientific paper
: In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward distributions. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. The essence of the problem is the tradeoff between exploration and exploitation: playing a less explored arm to learn its reward statistics for future benefit or playing the arm with the largest sample mean at the current time. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. Under this approach, the key issue is the cardinality of the exploration sequence, which balances the accuracy and the cost of learning and defines a lower bound on the regret. It is show that when the moment-generating functions of the arm reward distributions are properly bounded around zero, the optimal logarithmic order of the regret can be achieved by DSEE. The condition on the reward distributions can be gradually relaxed at a cost of a higher (nevertheless, still sublinear) regret order: for any positive integer p, O(T^{1/p}) regret can be achieved by DSEE when the moments of the reward distributions only exist up to the pth order. The proposed DSEE approach complements existing work on MAB by providing corresponding results under a set of relaxed conditions on the reward distributions. Furthermore, with a clearly defined tunable parameter-the cardinality of the exploration sequence, the DSEE approach is more easily extendable to variations of MAB, as demonstrated by its generalization to a decentralized MAB with multiple players under corrupted reward observations. Potential applications include dynamic spectrum access, multi-agent systems, Internet advertising and Web search.

Affiliated with

Liu Keqin

Mathematics – Rings and Algebras

Scientist

[ 0.00 ] – not rated yet Voters 0 Comments 0

Zhao Qing

Mathematics – Optimization and Control

Scientist

[ 0.00 ] – not rated yet Voters 0 Comments 0

Also associated with

No associations

LandOfFree

Say what you really think

Search LandOfFree.com for scientists and scientific papers. Rate them and share your experience with other people.

Rating

Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.
If you have personal experience with Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems will most certainly appreciate the feedback.

Rate now

Comments { 0 }

Profile ID: LFWR-SCP-O-34462

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.

Canada

Charities
Companies
MP Candidates
Patents
Employee Salary Disclosure

World

Places of the World
Scientific Papers

United States

Banks
Companies
Counties
Patents
Employee Salary Disclosure