Computer Science – Artificial Intelligence
Scientific paper
2011-08-18
Computer Science
Artificial Intelligence
8 pages, 6 figures
Scientific paper
UCT, a state-of-the art algorithm for Monte Carlo tree sampling (MCTS), is based on UCB, a sampling policy for the Multi-armed Bandit Problem (MAB) that minimizes the accumulated regret. However, MCTS differs from MAB in that only the final choice, rather than all arm pulls, brings a reward, that is, the simple regret, as opposite to the cumulative regret, must be minimized. This ongoing work aims at applying meta-reasoning techniques to MCTS, which is non-trivial. We begin by introducing policies for multi-armed bandits with lower simple regret than UCB, and an algorithm for MCTS which combines cumulative and simple regret minimization and outperforms UCT. We also develop a sampling scheme loosely based on a myopic version of perfect value of information. Finite-time and asymptotic analysis of the policies is provided, and the algorithms are compared empirically.
Shimony Solomon Eyal
Tolpin David
No associations
LandOfFree
Doing Better Than UCT: Rational Monte Carlo Sampling in Trees does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.
If you have personal experience with Doing Better Than UCT: Rational Monte Carlo Sampling in Trees, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Doing Better Than UCT: Rational Monte Carlo Sampling in Trees will most certainly appreciate the feedback.
Profile ID: LFWR-SCP-O-181113