Computer Science – Learning
Scientific paper
2001-05-17
COLT 2001: The Fourteenth Annual Conference on Computational Learning Theory
14 pages
Reinforcement learning means finding the optimal course of action in Markovian environments without knowledge of the environment's dynamics. Stochastic optimization algorithms used in the field rely on estimates of the value of a policy. Typically, the value of a policy is estimated from the results of simulating that very policy in the environment. This approach requires a large amount of simulation, since each point considered in the policy space needs its own runs. In this paper, we develop value estimators that use data gathered while following one policy to estimate the value of another policy, resulting in much more data-efficient algorithms. We consider the question of accumulating sufficient experience and give PAC-style bounds.
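The abstract does not spell out the estimator, so the following is only an illustrative sketch of one standard way to reuse data across policies: importance sampling (likelihood-ratio weighting). Each recorded episode's return is reweighted by the ratio of its probability under the target policy to its probability under the behavior policy. The toy environment, the function names (`sample_episode`, `action_prob`, `importance_sampling_value`), the horizon, and the policy parameters are all assumptions made for this example and are not taken from the paper.

```python
import random

HORIZON = 3  # assumed episode length for this toy example


def action_prob(policy, action):
    """Probability that a policy (a single number p = P(action 1)) picks `action`."""
    return policy if action == 1 else 1.0 - policy


def sample_episode(policy, rng):
    """Simulate one episode in a hypothetical toy environment.

    At each step the agent picks action 1 with probability `policy`;
    action 1 yields reward 1.0 and action 0 yields reward 0.0.
    Returns a list of (action, reward) pairs.
    """
    episode = []
    for _ in range(HORIZON):
        action = 1 if rng.random() < policy else 0
        episode.append((action, float(action)))
    return episode


def importance_sampling_value(episodes, behavior, target):
    """Estimate the target policy's value from behavior-policy episodes.

    Each episode's return is weighted by the product of per-step
    probability ratios target / behavior, then the results are averaged.
    """
    total = 0.0
    for episode in episodes:
        weight = 1.0
        episode_return = 0.0
        for action, reward in episode:
            weight *= action_prob(target, action) / action_prob(behavior, action)
            episode_return += reward
        total += weight * episode_return
    return total / len(episodes)


rng = random.Random(0)
behavior, target = 0.5, 0.8  # assumed policies for illustration
episodes = [sample_episode(behavior, rng) for _ in range(20000)]
estimate = importance_sampling_value(episodes, behavior, target)
true_value = HORIZON * target  # expected return of the target policy in this toy setup
print(f"estimate={estimate:.3f}, true value={true_value:.3f}")
```

In this sketch, no episode is ever simulated under the target policy, yet the weighted average approaches that policy's expected return as the number of episodes grows. How many episodes are enough is the kind of question a sample-size bound addresses; the variance of the weights grows as the two policies diverge, which is why the amount of experience matters.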
Mukherjee Sayan
Peshkin Leonid
Bounds on sample size for policy evaluation in Markov environments