Simpler near-optimal controllers through direct supervision
Mathematics – Optimization and Control
Scientific paper
2009-08-20
The method of generalized Hamilton-Jacobi-Bellman (GHJB) equations is a powerful way of creating near-optimal controllers by learning. It rests on the fact that if we have a feedback controller and can learn to compute the gradient grad-J of its cost-to-go function, then we can use that gradient to define a better controller. The new controller's grad-J in turn defines a still better controller, and so on. Here I point out that GHJB works indirectly, in the sense that it does not learn the best approximation to grad-J but instead learns the time derivative dJ/dt and infers grad-J from that. I show that we can get simpler and lower-cost controllers by learning grad-J directly. To do this, we need teaching signals that report grad-J(x) for a varied set of states x. I show how to obtain these signals, using the GHJB equation to calculate one component of grad-J(x) -- the one parallel to dx/dt -- and computing all the other components by backward-in-time integration, using a formula similar to the Euler-Lagrange equation. I then compare this direct algorithm with GHJB on two test problems.
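The improvement loop the abstract describes can be made concrete in the linear-quadratic special case, where each step has a closed form: the cost-to-go of a linear feedback u = -Kx is quadratic, J(x) = x'Px, so grad-J = 2Px follows from a Lyapunov equation (the GHJB equation specializes to this), and that gradient defines the improved gain. The sketch below is an illustration of the general idea, not the paper's algorithm; the plant (a double integrator) and all matrix names (A, B, Q, R, K) are my own choices.

```python
# Illustrative GHJB-style policy iteration on a linear-quadratic problem.
# The plant and cost matrices here are assumptions, not from the paper.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Double-integrator dynamics xdot = A x + B u, running cost x'Qx + u'Ru.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.array([[1.0, 1.0]])  # any stabilizing initial feedback u = -K x

for _ in range(10):
    Acl = A - B @ K
    # "Evaluation" step: the current controller's cost-to-go J(x) = x'Px
    # satisfies the Lyapunov (here: GHJB) equation
    #   Acl' P + P Acl + Q + K' R K = 0,   so grad-J(x) = 2 P x.
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # "Improvement" step: the gradient defines a better controller.
    K = np.linalg.solve(R, B.T @ P)

# The iteration converges to the optimal LQR gain from the Riccati equation.
P_opt = solve_continuous_are(A, B, Q, R)
K_opt = np.linalg.solve(R, B.T @ P_opt)
print(np.allclose(K, K_opt, atol=1e-6))
```

In this linear case the evaluation step is exact; the paper's setting is the general nonlinear one, where grad-J must be learned from data rather than solved for in closed form.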