Computer Science – Artificial Intelligence
Scientific paper
2002-04-17
Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT-2002) 364-379
15 pages
The problem of making sequential decisions in unknown probabilistic environments is studied. In cycle $t$, action $y_t$ results in perception $x_t$ and reward $r_t$, where all quantities may in general depend on the complete history. The perception $x_t$ and reward $r_t$ are sampled from the (reactive) environmental probability distribution $\mu$. This very general setting includes, but is not limited to, (partially observable, $k$-th order) Markov decision processes. Sequential decision theory tells us how to act in order to maximize the total expected reward, called value, if $\mu$ is known. Reinforcement learning is usually used if $\mu$ is unknown. In the Bayesian approach one defines a mixture distribution $\xi$ as a weighted sum of distributions $\nu\in\mathcal{M}$, where $\mathcal{M}$ is any class of distributions including the true environment $\mu$. We show that the Bayes-optimal policy $p^\xi$ based on the mixture $\xi$ is self-optimizing in the sense that the average value converges asymptotically for all $\mu\in\mathcal{M}$ to the optimal value achieved by the (infeasible) Bayes-optimal policy $p^\mu$, which knows $\mu$ in advance. We show that the necessary condition that $\mathcal{M}$ admits self-optimizing policies at all is also sufficient. No other structural assumptions are made on $\mathcal{M}$. As an example application, we discuss ergodic Markov decision processes, which allow for self-optimizing policies. Furthermore, we show that $p^\xi$ is Pareto-optimal in the sense that there is no other policy yielding higher or equal value in \emph{all} environments $\nu\in\mathcal{M}$ and a strictly higher value in at least one.
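To make the mixture construction concrete, the following is a minimal sketch, not taken from the paper, of a Bayes mixture $\xi$ formed as a weighted sum over a small class of environments, together with the posterior reweighting that follows from Bayes' rule. It assumes a hypothetical finite class of three Bernoulli environments whose perceptions ignore the agent's actions and the history, which is far narrower than the reactive setting of the abstract. The names `mixture_prob`, `posterior`, `models`, and `weights` are illustrative only, and the sketch does not compute the policy $p^\xi$.

```python
from fractions import Fraction


def likelihood(p, x):
    """Probability that a Bernoulli(p) environment emits the binary perception x."""
    return p if x == 1 else 1 - p


def mixture_prob(weights, models, x):
    """Mixture probability xi(x) = sum over nu of w_nu * nu(x)."""
    return sum(w * likelihood(p, x) for w, p in zip(weights, models))


def posterior(weights, models, x):
    """Bayes update: new w_nu is proportional to w_nu * nu(x), normalised by xi(x)."""
    z = mixture_prob(weights, models, x)
    return [w * likelihood(p, x) / z for w, p in zip(weights, models)]


# Hypothetical class of environments (an assumption for illustration):
# three Bernoulli sources with success probabilities 1/4, 1/2 and 3/4.
models = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

# Uniform prior weights over the class.
prior = [Fraction(1, 3)] * 3

# Reweight the mixture after each observed perception.
weights = list(prior)
for x in [1, 1, 0, 1]:
    weights = posterior(weights, models, x)
```

Exact fractions are used so that the weights always sum to one without rounding error. After a few observations the weight shifts toward the environments that explain the data best, which is the mechanism that lets a mixture-based agent behave more and more as if it knew the true environment.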
Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures