Simpler near-optimal controllers through direct supervision

Mathematics – Optimization and Control

Scientific paper

The method of generalized Hamilton-Jacobi-Bellman equations (GHJB) is a powerful way of creating near-optimal controllers by learning. It is based on the fact that if we have a feedback controller, and we learn to compute the gradient grad-J of its cost-to-go function, then we can use that gradient to define a better controller. We can then use the new controller's grad-J to define a still-better controller, and so on. Here I point out that GHJB works indirectly, in the sense that it does not learn the best approximation to grad-J but instead learns the time derivative dJ/dt and infers grad-J from that. I show that we can get simpler and lower-cost controllers by learning grad-J directly. To do this, we need teaching signals that report grad-J(x) for a varied set of states x. I show how to obtain these signals, using the GHJB equation to calculate one component of grad-J(x), the one parallel with dx/dt, and computing all the other components by backward-in-time integration, using a formula similar to the Euler-Lagrange equation. I then compare this direct algorithm with GHJB on two test problems.
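As a rough illustration of the two ingredients named in the abstract, the sketch below works through a toy closed-loop system and is not the paper's algorithm. It assumes a fixed controller has already been folded into linear dynamics dx/dt = F(x) = A x, with running cost l(x) = x . x and cost-to-go J(x) equal to the integral of l along the trajectory from x. Under those assumptions the GHJB relation reads grad-J(x) . F(x) + l(x) = 0, which pins down only the component of grad-J parallel with dx/dt. The backward pass uses the standard closed-loop adjoint relation d(grad-J)/dt = -(dF/dx)^T grad-J - grad-l; this is assumed here to play the role of the "formula similar to the Euler-Lagrange equation" and may differ from the paper's formula. The matrix A, the step size h, the horizon T, and all function names are hypothetical choices made for the example.

```python
# Sketch only: A, the cost l(x) = x . x, h, and T are illustrative assumptions,
# not values taken from the paper.
A = [[0.0, 1.0], [-2.0, -3.0]]  # closed-loop dynamics dx/dt = A x (eigenvalues -1, -2)


def F(x):
    """Closed-loop vector field dx/dt = F(x) = A x."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]


def cost_rate(x):
    """Assumed running cost l(x) = x . x."""
    return x[0] ** 2 + x[1] ** 2


def grad_j_dot(x, g):
    """d(grad-J)/dt along a trajectory: -(dF/dx)^T g - grad-l, with dF/dx = A."""
    return [-(A[0][0] * g[0] + A[1][0] * g[1]) - 2.0 * x[0],
            -(A[0][1] * g[0] + A[1][1] * g[1]) - 2.0 * x[1]]


def rk4_step(x, h):
    """One classical Runge-Kutta step of dx/dt = F(x)."""
    k1 = F(x)
    k2 = F([x[i] + 0.5 * h * k1[i] for i in range(2)])
    k3 = F([x[i] + 0.5 * h * k2[i] for i in range(2)])
    k4 = F([x[i] + h * k3[i] for i in range(2)])
    return [x[i] + h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i])
            for i in range(2)]


def grad_j_by_backward_integration(x0, h=1e-3, T=15.0):
    """Estimate grad-J(x0) by a forward rollout followed by a backward pass."""
    n = int(round(T / h))
    xs = [list(x0)]
    for _ in range(n):
        xs.append(rk4_step(xs[-1], h))
    # Terminal condition: the state has decayed close to the origin,
    # where J is minimal, so grad-J is taken to be zero there.
    g = [0.0, 0.0]
    for k in range(n, 0, -1):
        # Heun (trapezoidal) step from time k*h back to (k-1)*h.
        d1 = grad_j_dot(xs[k], g)
        pred = [g[i] - h * d1[i] for i in range(2)]
        d0 = grad_j_dot(xs[k - 1], pred)
        g = [g[i] - 0.5 * h * (d1[i] + d0[i]) for i in range(2)]
    return g


if __name__ == "__main__":
    x0 = [1.0, 0.5]
    g = grad_j_by_backward_integration(x0)
    f = F(x0)
    # GHJB residual: should be close to zero if g approximates grad-J(x0).
    residual = g[0] * f[0] + g[1] * f[1] + cost_rate(x0)
    print("grad-J estimate:", g)
    print("GHJB residual:", residual)
```

For this linear-quadratic choice the gradient is also available in closed form as grad-J(x) = 2 P x, where P solves the Lyapunov equation A^T P + P A + I = 0, so the backward-integrated value can be checked independently. In a learning setting, pairs (x, grad-J(x)) gathered this way from many start states would serve as the teaching signals the abstract describes.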

Profile ID: LFWR-SCP-O-525958