Computer Science – Learning
Scientific paper
2011-04-29
A full version of an ICML 2011 paper
We consider finite-horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that computing a policy that maximizes the mean reward under a variance constraint is NP-hard in some cases and strongly NP-hard in others. Finally, we offer pseudopolynomial exact and approximation algorithms.
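The claim that randomization can help under a variance constraint can be illustrated with a small sketch. The toy model below is hypothetical and is not taken from the paper: a one-step problem with a safe action (reward 0) and a risky action (reward 0 or 2 with equal probability), together with an assumed variance cap of 0.5. The helper `reward_moments` and the names `P`, `R`, and `policy_with` are illustrative assumptions; the function propagates the first and second moments of the reward-to-go backward through the horizon for a fixed Markov policy.

```python
import math

def reward_moments(P, R, policy, horizon, start):
    """Mean and variance of the cumulative reward under a fixed Markov policy.

    P[s][a][s2]     : transition probability (hypothetical toy model)
    R[s][a][s2]     : reward collected on that transition
    policy[t][s][a] : probability of choosing action a in state s at time t
    """
    n = len(P)
    mean = [0.0] * n    # E[reward-to-go], zero at the end of the horizon
    second = [0.0] * n  # E[(reward-to-go)^2]
    for t in reversed(range(horizon)):
        new_mean, new_second = [0.0] * n, [0.0] * n
        for s in range(n):
            for a, pa in enumerate(policy[t][s]):
                for s2, p in enumerate(P[s][a]):
                    r = R[s][a][s2]
                    w = pa * p
                    new_mean[s] += w * (r + mean[s2])
                    # (r + W)^2 = r^2 + 2 r W + W^2, taken in expectation
                    new_second[s] += w * (r * r + 2 * r * mean[s2] + second[s2])
        mean, second = new_mean, new_second
    return mean[start], second[start] - mean[start] ** 2

# Assumed toy model: state 0 is the start; action 0 is safe, action 1 is risky.
P = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.0, 1.0]]]
R = [[[0.0, 0.0], [0.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]]

def policy_with(p):
    """One-step policy that picks the risky action with probability p in state 0."""
    return [[[1.0 - p, p], [1.0, 0.0]]]

safe = reward_moments(P, R, policy_with(0.0), 1, 0)
risky = reward_moments(P, R, policy_with(1.0), 1, 0)
# Largest p whose variance 2p - p^2 stays within the assumed cap of 0.5.
mixed = reward_moments(P, R, policy_with(1.0 - math.sqrt(0.5)), 1, 0)
print(safe, risky, mixed)
```

In this toy model the only deterministic policy that satisfies the assumed cap is the safe one, with mean reward 0, while the randomized policy stays within the cap and earns a strictly positive mean. This is a sketch of the phenomenon only; it is not one of the exact or approximation algorithms mentioned in the abstract.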
Mannor Shie
Tsitsiklis John
Mean-Variance Optimization in Markov Decision Processes