On the Performance of Maximum Likelihood Inverse Reinforcement Learning

Computer Science – Learning

Scientific paper


Inverse reinforcement learning (IRL) addresses the problem of recovering a task description given a demonstration of the optimal policy used to solve that task. The optimal policy is usually provided by an expert or teacher, making IRL especially suitable for the problem of apprenticeship learning. The task description is encoded in the form of a reward function of a Markov decision process (MDP). Several algorithms have been proposed to find the reward function corresponding to a set of demonstrations. One of the algorithms that has provided the best results in different applications is a gradient method that optimizes a policy squared-error criterion. In a parallel line of research, other authors have recently presented a gradient approximation of the maximum likelihood estimate of the reward signal. In general, both approaches approximate the gradient estimate and the criteria at different stages to make the algorithm tractable and efficient. In this work, we provide a detailed description of the different methods to highlight differences in terms of reward estimation, policy similarity and computational costs. We also provide experimental results to evaluate the differences in performance of the methods.
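
To make the two criteria named in the abstract concrete, the sketch below evaluates a demonstration log-likelihood and a policy squared error for a candidate reward on a toy two-state MDP. It is an illustration under stated assumptions, not the paper's algorithm: the Boltzmann (softmax) policy model, the temperature `eta`, the toy transition matrices and all function names are hypothetical, and no gradient step is shown.

```python
import numpy as np

def boltzmann_policy(P, r, gamma=0.9, eta=5.0, iters=200):
    """Policy pi(a|s) proportional to exp(eta * Q(s, a)) for a candidate reward.

    P: (A, S, S) transition probabilities, r: (S,) state reward.
    ASSUMPTION: the softmax demonstrator model is illustrative only.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        # Q[s, a] = r[s] + gamma * sum_t P[a, s, t] * V[t]
        Q = r[:, None] + gamma * np.einsum("ast,t->sa", P, V)
        V = Q.max(axis=1)
    logits = eta * (Q - Q.max(axis=1, keepdims=True))  # shift for stability
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

def log_likelihood(pi, demos):
    """Log-likelihood of demonstrated (state, action) pairs under pi."""
    return float(sum(np.log(pi[s, a]) for s, a in demos))

def policy_squared_error(pi, demos):
    """Squared error between pi and the empirical demonstrated policy,
    summed over the states that appear in the demonstrations."""
    counts = np.zeros_like(pi)
    for s, a in demos:
        counts[s, a] += 1
    visited = counts.sum(axis=1) > 0
    expert = counts[visited] / counts[visited].sum(axis=1, keepdims=True)
    return float(np.sum((pi[visited] - expert) ** 2))

# Toy MDP (hypothetical): action 0 stays in place, action 1 swaps states.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
demos = [(0, 1), (1, 0)]  # demonstrator moves to state 1, then stays there

pi_true = boltzmann_policy(P, np.array([0.0, 1.0]))   # reward favours state 1
pi_wrong = boltzmann_policy(P, np.array([1.0, 0.0]))  # reward favours state 0

ll_true, ll_wrong = log_likelihood(pi_true, demos), log_likelihood(pi_wrong, demos)
err_true, err_wrong = policy_squared_error(pi_true, demos), policy_squared_error(pi_wrong, demos)
```

In this toy setting the reward that agrees with the demonstrations scores a higher log-likelihood and a lower policy squared error than the reversed reward. A gradient method of the kind the abstract describes would adjust the reward to improve one of these criteria.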
