Skip to main navigation Skip to search Skip to main content

Recursive learning automata approach to Markov decision processes

  • Hyeong Soo Chang
  • , Michael C. Fu
  • , Jiaqiao Hu
  • , Steven I. Marcus
  • Sogang University
  • University of Maryland, College Park

Research output: Contribution to journalArticlepeer-review

11 Scopus citations

Abstract

In this note, we present a sampling algorithm, called recursive automata sampling algorithm (RASA), for control of finite-horizon Markov decision processes (MDPs). By extending in a recursive manner Sastry's learning automata pursuit algorithm designed for solving nonsequential stochastic optimization problems, RASA returns an estimate of both the optimal action from a given state and the corresponding optimal value. Based on the finite-time analysis of the pursuit algorithm by Rajaraman and Sastry, we provide an analysis for the finite-time behavior of RASA. Specifically, for a given initial state, we derive the following probability bounds as a function of the number of samples: 1) a lower bound on the probability that RASA will sample the optimal action and 2) an upper bound on the probability that the deviation between the true optimal value and the RASA estimate exceeds a given error.

Original languageEnglish
Pages (from-to)1349-1355
Number of pages7
JournalIEEE Transactions on Automatic Control
Volume52
Issue number7
DOIs
StatePublished - Jul 2007

Keywords

  • Learning automata
  • Markov decision process (MDP)
  • Sampling

Fingerprint

Dive into the research topics of 'Recursive learning automata approach to Markov decision processes'. Together they form a unique fingerprint.

Cite this