Gaussian Process Bandits for Tree Search: Theory and Application to Planning in Discounted MDPs

Computer Science – Learning

Scientific paper

Details

Second draft. Tried to follow the JMLR formatting guidelines. Made corrections to the section on planning in MDPs.

We motivate and analyse a new Tree Search algorithm, GPTS, based on recent theoretical advances in the use of Gaussian Processes (GPs) for bandit problems. We treat tree paths as arms and assume that the target (reward) function is drawn from a GP distribution. After observing data, the posterior mean and variance are used to define confidence intervals for the function values, and we sequentially play the arm with the highest upper confidence bound. We give an efficient implementation of GPTS, and we adapt previous regret bounds by determining the decay rate of the eigenvalues of the kernel matrix on the whole set of tree paths. We consider two kernels, linear and Gaussian, in the feature space of binary vectors indexed by the nodes of the tree. The regret grows as the square root of the number of iterations T, up to a logarithmic factor, with a constant that improves as the Gaussian kernel width increases. We focus on practical values of T, smaller than the number of arms. Finally, we apply GPTS to Open Loop Planning in discounted Markov Decision Processes by modelling the reward as a discounted sum of independent Gaussian Processes. We report regret bounds similar to those of the OLOP algorithm.
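The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the paper's efficient implementation of GPTS: it is a naive GP-UCB loop over all paths of a small complete tree, using the linear kernel on binary node-indicator features. The function names (path_features, gp_ucb_tree_search), the reward function reward_fn, the fixed confidence multiplier beta, and the noise variance noise_var are all assumptions made for illustration; in particular, the theory typically uses a confidence parameter that grows with the iteration count rather than a constant.

```python
import itertools
import numpy as np


def path_features(branching, depth):
    """Enumerate root-to-leaf paths and encode each as a binary vector
    indexed by the non-root nodes of a complete tree (hypothetical helper)."""
    paths = list(itertools.product(range(branching), repeat=depth))
    # A node is identified by the sequence of actions leading to it.
    nodes = {}
    for d in range(1, depth + 1):
        for prefix in itertools.product(range(branching), repeat=d):
            nodes[prefix] = len(nodes)
    X = np.zeros((len(paths), len(nodes)))
    for i, path in enumerate(paths):
        for d in range(1, depth + 1):
            X[i, nodes[path[:d]]] = 1.0  # the path visits this node
    return paths, X


def gp_ucb_tree_search(reward_fn, branching=2, depth=3, T=20,
                       noise_var=0.01, beta=4.0, seed=0):
    """Naive GP-UCB over tree paths with a linear kernel.

    beta is held constant here for simplicity (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    paths, X = path_features(branching, depth)
    K = X @ X.T  # linear kernel: number of nodes two paths share
    prior_var = np.diag(K).copy()
    played, rewards = [], []
    for _ in range(T):
        if not played:
            mean, var = np.zeros(len(paths)), prior_var
        else:
            idx = np.array(played)
            K_tt = K[np.ix_(idx, idx)] + noise_var * np.eye(len(idx))
            K_st = K[:, idx]
            # Standard GP posterior mean and variance at every path.
            mean = K_st @ np.linalg.solve(K_tt, np.array(rewards))
            solved = np.linalg.solve(K_tt, K_st.T)
            var = prior_var - np.einsum("ij,ji->i", K_st, solved)
        ucb = mean + np.sqrt(beta * np.maximum(var, 0.0))
        arm = int(np.argmax(ucb))  # play the highest upper confidence bound
        noisy = reward_fn(paths[arm]) + rng.normal(0.0, np.sqrt(noise_var))
        played.append(arm)
        rewards.append(float(noisy))
    return paths, played, rewards
```

A call such as gp_ucb_tree_search(lambda p: sum(p) / len(p), T=10) runs ten iterations on a binary tree of depth 3 with a made-up reward. The sketch refits the posterior from scratch at every step and stores the full kernel matrix over all paths, which is only feasible for very small trees.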
