Abstract
We present a simulation-based algorithm called "Simulated Annealing Multiplicative Weights" (SAMW) for solving large finite-horizon stochastic dynamic programming problems. At each iteration of the algorithm, a probability distribution over candidate policies is updated by a simple multiplicative weight rule, and with proper annealing of a control parameter, the generated sequence of distributions converges to a distribution concentrated only on the best policies. The algorithm is "asymptotically efficient," in the sense that for the goal of estimating the value of an optimal policy, a provably convergent finite-time upper bound for the sample mean is obtained.
| Original language | English |
|---|---|
| Pages (from-to) | 89-94 |
| Number of pages | 6 |
| Journal | IEEE Transactions on Automatic Control |
| Volume | 52 |
| Issue number | 1 |
| DOIs | |
| State | Published - Jan 2007 |
Keywords
- Learning algorithms
- Markov decision processes
- Simulated annealing
- Simulation
- Stochastic dynamic programming
Fingerprint
Dive into the research topics of 'An asymptotically efficient simulation-based algorithm for finite horizon stochastic dynamic programming'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver