Abstract
This note shows that, for a broad class of optimistic policy iteration algorithms, the number of arithmetic operations required to solve a deterministic discounted dynamic programming problem with three states and four actions can be made arbitrarily large. Consequently, no algorithm in this class is strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial.
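To make the algorithm under discussion concrete, here is a minimal sketch of modified policy iteration on a tiny deterministic discounted MDP. The MDP below is purely illustrative (it is not the three-state, four-action counterexample constructed in the paper), and the step count `m` and cost values are arbitrary assumptions.

```python
import numpy as np

# Illustrative deterministic discounted MDP (NOT the paper's counterexample).
gamma = 0.9                      # discount factor
n_states, n_actions = 3, 2

# cost[s, a]: immediate cost of action a in state s
# nxt[s, a]:  deterministic successor state
cost = np.array([[1.0, 2.0],
                 [0.5, 1.5],
                 [2.0, 0.2]])
nxt = np.array([[1, 2],
                [0, 2],
                [1, 0]])

def modified_policy_iteration(m=5, iters=100, tol=1e-10):
    """Modified policy iteration: greedy improvement followed by
    m steps of partial policy evaluation (m=0 gives value iteration;
    m -> infinity gives classical policy iteration)."""
    v = np.zeros(n_states)
    for _ in range(iters):
        # Policy improvement: greedy w.r.t. the current value estimate.
        q = cost + gamma * v[nxt]            # q[s, a]
        policy = q.argmin(axis=1)
        # Partial evaluation: m value-iteration sweeps under the fixed policy.
        s = np.arange(n_states)
        v_new = v
        for _ in range(m):
            v_new = cost[s, policy] + gamma * v_new[nxt[s, policy]]
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return policy, v

policy, v = modified_policy_iteration()
print(policy, v)
```

The paper's point is about worst-case complexity, not correctness: on fixed instances like this one the iteration converges, but the number of arithmetic operations needed over the class of such algorithms is not bounded by any polynomial in the problem size alone.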
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 429-431 |
| Number of pages | 3 |
| Journal | Operations Research Letters |
| Volume | 42 |
| Issue number | 6-7 |
| DOIs | |
| State | Published - Sep 2014 |
Keywords
- Algorithm
- Markov decision process
- Modified policy iteration
- Policy
- Strongly polynomial