Doing Better Than UCT: Rational Monte Carlo Sampling in Trees

Computer Science – Artificial Intelligence

Scientific paper


Details

8 pages, 6 figures

UCT, a state-of-the-art algorithm for Monte Carlo tree sampling (MCTS), is based on UCB, a sampling policy for the Multi-armed Bandit Problem (MAB) that minimizes the cumulative regret. However, MCTS differs from MAB in that only the final choice, rather than every arm pull, brings a reward; it is therefore the simple regret, as opposed to the cumulative regret, that must be minimized. This ongoing work aims at applying meta-reasoning techniques to MCTS, which is non-trivial. We begin by introducing policies for multi-armed bandits with lower simple regret than UCB, and an algorithm for MCTS that combines cumulative and simple regret minimization and outperforms UCT. We also develop a sampling scheme loosely based on a myopic version of perfect value of information. Finite-time and asymptotic analysis of the policies is provided, and the algorithms are compared empirically.
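To illustrate the distinction the abstract draws between the two objectives (this is a generic sketch, not the paper's actual algorithms), the following contrasts UCB1, which targets cumulative regret, with a simple ε-greedy policy of the kind often used when only the final recommendation matters. The Bernoulli arm means and the `run` harness are hypothetical choices for the example.

```python
import math
import random

def ucb1_choose(counts, sums, t, rng=None):
    """UCB1: pick the arm maximizing mean + sqrt(2 ln t / n_i).
    This is the cumulative-regret policy that UCT inherits from UCB."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # play each arm once before using the index
    return max(range(len(counts)),
               key=lambda i: sums[i] / counts[i]
                             + math.sqrt(2.0 * math.log(t) / counts[i]))

def eps_greedy_choose(counts, sums, t, rng, eps=0.5):
    """eps-greedy: with probability eps explore uniformly, else exploit.
    Over-exploring hurts cumulative regret but can help simple regret,
    since only the final recommendation is rewarded."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    if rng.random() < eps:
        return rng.randrange(len(counts))
    return max(range(len(counts)), key=lambda i: sums[i] / counts[i])

def run(policy, means, budget, seed=0):
    """Spend `budget` pulls on Bernoulli arms, then recommend the
    empirically best arm. Returns the simple regret: the best true
    mean minus the recommended arm's true mean."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    for t in range(1, budget + 1):
        i = policy(counts, sums, t, rng)
        counts[i] += 1
        sums[i] += 1.0 if rng.random() < means[i] else 0.0
    best = max(range(len(means)), key=lambda i: sums[i] / counts[i])
    return max(means) - means[best]
```

For example, `run(ucb1_choose, [0.9, 0.6, 0.5], 200)` and `run(eps_greedy_choose, [0.9, 0.6, 0.5], 200)` both return a simple regret in [0, 0.4]; only the final recommendation, not the rewards accumulated along the way, enters that score.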


Profile ID: LFWR-SCP-O-181113
