Experiments with Infinite-Horizon, Policy-Gradient Estimation

Computer Science – Artificial Intelligence

Scientific paper

Date: 2011-06-03
Source: arxiv.org/abs/1106.0666v1
Journal reference: Journal of Artificial Intelligence Research, Volume 15, pages 351-381, 2001
Subject: Computer Science – Artificial Intelligence
Type: Scientific paper
DOI: 10.1613/jair.807
Abstract: In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter and Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter, beta, which has a natural interpretation in terms of a bias-variance trade-off; it requires no knowledge of the underlying state; and it can be applied to infinite state, control, and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm and with an algorithm based on conjugate gradients that uses gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of Baxter and Bartlett (this volume) on a toy problem and practical aspects of the algorithms on a number of more realistic problems.
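As a rough illustration of the kind of estimator the abstract describes, the sketch below keeps an eligibility trace discounted by beta and a running average of reward times trace, which is the commonly cited form of the GPOMDP update from the companion paper. Everything else is an assumption made for illustration only: the two-action softmax policy, the hypothetical `toy_step` environment (which has no hidden state or observations), the horizon `T`, and the fixed step size. It is not the authors' experimental setup, and it uses plain stochastic-gradient ascent rather than the conjugate-gradient variant.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of action preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def gpomdp_estimate(theta, step, beta, T, rng):
    """Return a GPOMDP-style estimate of the average-reward gradient.

    step(action, rng) -> reward is a hypothetical environment callback.
    beta in [0, 1) discounts the eligibility trace.
    """
    z = [0.0] * len(theta)      # eligibility trace
    delta = [0.0] * len(theta)  # running average of reward * trace
    for t in range(T):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = step(action, rng)
        for i in range(len(theta)):
            # Gradient of log softmax: indicator(action) - probability.
            grad_log = (1.0 if i == action else 0.0) - probs[i]
            z[i] = beta * z[i] + grad_log
            delta[i] += (reward * z[i] - delta[i]) / (t + 1)
    return delta


def toy_step(action, rng):
    """Assumed toy environment: action 1 pays reward 1, action 0 pays 0."""
    return 1.0 if action == 1 else 0.0


rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(50):
    # Rewards here depend only on the current action, so beta = 0 suffices.
    grad = gpomdp_estimate(theta, toy_step, beta=0.0, T=500, rng=rng)
    theta = [th + 0.5 * g for th, g in zip(theta, grad)]  # assumed step size
```

In this toy setting the policy shifts its probability mass toward the rewarded action. In problems where rewards depend on earlier actions, beta would need to be larger, trading lower bias for higher variance as the abstract notes.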

Affiliated with

Bartlett Peter L.

Computer Science – Learning

Scientist

Baxter Jonathan

Computer Science – Learning

Scientist

Weaver Lex

Computer Science – Distributed – Parallel – and Cluster Computing

Scientist
