Abstract
Discrete-time, infinite-horizon stochastic decision processes with various reward criteria are addressed. Sufficient conditions are obtained for the value of a class of strategies to equal the value of the subclass of nonrandomized strategies from this class. Two different methods for proving that nonrandomized strategies are as good as arbitrary ones are considered. The first method is based on the fact that the strategic measure of any strategy, i.e., the measure on the set of trajectories generated by that strategy and an initial distribution, may be represented as a linear combination (or a linear operator) of strategic measures generated by nonrandomized strategies and the same initial distribution. This method is applicable to various criteria and classes of strategies. The second method applies to Markov decision processes with the expected total reward criterion. It is based on linearity properties of the optimality equations, on the approximation of dynamic programming models by negative dynamic programming models, and on the replacement of the initial model by another one whose states represent information about the past in the initial model.
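The first method can be illustrated in a minimal one-step model (a toy sketch, not the paper's construction: the states, rewards, initial distribution, and randomized strategy below are all illustrative assumptions). The value of a randomized strategy decomposes as a convex combination of the values of nonrandomized (deterministic) strategies, with each deterministic strategy `f` weighted by the probability that the randomized strategy agrees with it in every state; hence some nonrandomized strategy is at least as good.

```python
import itertools

# Hypothetical one-step decision model; all numbers are illustrative.
states = [0, 1]
actions = [0, 1]
mu = {0: 0.4, 1: 0.6}                       # initial distribution mu(s)
r = {(0, 0): 1.0, (0, 1): 3.0,
     (1, 0): 2.0, (1, 1): 0.5}              # reward r(s, a)
pi = {0: {0: 0.25, 1: 0.75},                # randomized strategy pi(a | s)
      1: {0: 0.5, 1: 0.5}}

def value_randomized(pi):
    """Expected reward of the randomized strategy under mu."""
    return sum(mu[s] * pi[s][a] * r[(s, a)] for s in states for a in actions)

def value_deterministic(f):
    """Expected reward of a nonrandomized strategy f (action per state)."""
    return sum(mu[s] * r[(s, f[s])] for s in states)

# Every deterministic strategy f carries mixture weight prod_s pi(f(s) | s);
# these weights sum to 1, so the randomized value is a convex combination.
mixture = 0.0
best = float("-inf")
for f in itertools.product(actions, repeat=len(states)):
    w = 1.0
    for s in states:
        w *= pi[s][f[s]]
    v = value_deterministic(f)
    mixture += w * v
    best = max(best, v)

print(value_randomized(pi))   # 1.75
print(mixture)                # 1.75 -- same value, via the mixture
print(best)                   # 2.4  -- a nonrandomized strategy does better
```

In the infinite-horizon setting of the paper the mixture runs over measures on trajectory space rather than a finite product of per-state choices, but the same convexity argument shows the supremum over nonrandomized strategies is not smaller.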
| Original language | English |
|---|---|
| Pages (from-to) | 2149-2154 |
| Number of pages | 6 |
| Journal | Proceedings of the IEEE Conference on Decision and Control |
| Volume | 4 |
| State | Published - 1990 |
| Event | 29th IEEE Conference on Decision and Control, Part 6 (of 6), Honolulu, HI, USA, Dec 5 1990 → Dec 7 1990 |
| Title | Optimality of pure strategies in stochastic decision processes |