Infinite-Horizon Policy-Gradient Estimation

Computer Science – Artificial Intelligence

Scientific paper

Details

Date: 2011-06-03
arXiv: arxiv.org/abs/1106.0665v1
Journal reference: Journal of Artificial Intelligence Research, Volume 15, pages 319-350, 2001
DOI: 10.1613/jair.806
Abstract: Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura et al. (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter, beta (which has a natural interpretation in terms of a bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter beta is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter et al., this volume) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward.
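
The storage claim and the role of beta in the abstract can be illustrated with a minimal Python sketch. It is not taken from this page: the update rule below is the commonly cited form of the GPOMDP estimator (an eligibility trace z discounted by beta, plus a running average delta of reward times trace) and is stated here as an assumption rather than quoted. The two-action toy environment, the softmax policy, and the names `softmax` and `gpomdp_estimate` are hypothetical and chosen only for illustration. The only per-parameter state the loop keeps is z and delta, which is consistent with "twice the number of policy parameters".

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gpomdp_estimate(theta, beta, steps, seed=0):
    """Sketch of a GPOMDP-style gradient estimate on a hypothetical toy problem.

    Assumptions (not from the source text):
      - two actions, one logit per action in `theta`;
      - the controller sees a constant observation, so the hidden state
        is never revealed to the policy;
      - the hidden next state equals the chosen action with probability 0.9;
      - the reward is 1.0 in state 1 and 0.0 in state 0;
      - 0 <= beta < 1.
    """
    rng = random.Random(seed)
    n = len(theta)
    z = [0.0] * n      # eligibility trace, one entry per parameter
    delta = [0.0] * n  # running gradient estimate, one entry per parameter
    probs = softmax(theta)  # fixed, because the observation never changes

    for t in range(steps):
        # Sample an action from the stochastic policy.
        action = 1 if rng.random() < probs[1] else 0

        # Discount the trace and add grad log pi(action) for a softmax policy,
        # which is (indicator of the chosen action) minus (action probability).
        for k in range(n):
            score = (1.0 if k == action else 0.0) - probs[k]
            z[k] = beta * z[k] + score

        # Hidden state transition and reward; the policy never sees `state`.
        state = action if rng.random() < 0.9 else 1 - action
        reward = float(state)

        # Incremental average of reward * trace.
        for k in range(n):
            delta[k] += (reward * z[k] - delta[k]) / (t + 1)

    return delta
```

For this toy problem the exact gradient of the average reward at theta = (0, 0) works out to (-0.2, 0.2), so a call such as `gpomdp_estimate([0.0, 0.0], beta=0.0, steps=50000)` should land near those values. Because the next state here depends only on the latest action, beta = 0 already gives an unbiased estimate; in problems where actions have delayed effects, a larger beta reduces bias at the cost of higher variance.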

Affiliated with

Bartlett Peter L.

Computer Science – Learning

Scientist

Baxter Jonathan

Computer Science – Learning

Scientist
