Kybernetika 51 no. 2, 276-292, 2015

Another set of verifiable conditions for average Markov decision processes with Borel spaces

Xiaolong Zou and Xianping Guo

DOI: 10.14736/kyb-2015-2-0276

Abstract:

In this paper we give a new set of verifiable conditions for the existence of average optimal stationary policies in discrete-time Markov decision processes with Borel spaces and unbounded reward/cost functions. More precisely, we provide another set of conditions consisting only of a Lyapunov-type condition and the common continuity-compactness conditions. These conditions are imposed on the primitive data of the Markov decision process model and are therefore easy to verify. We also give two examples for which all of our conditions are satisfied, while some of the conditions in the related literature fail to hold.
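
For orientation, a hedged sketch of the standard notions behind this abstract, in generic textbook notation that is not necessarily the paper's exact set of assumptions: the long-run expected average reward of a policy $\pi$ from initial state $x$ is typically defined as

\[
J(x,\pi) \;=\; \liminf_{n\to\infty} \frac{1}{n}\, \mathbb{E}_x^{\pi}\Big[\sum_{k=0}^{n-1} r(x_k, a_k)\Big],
\]

and a Lyapunov-type (drift) condition on the transition law $Q$ commonly takes the form

\[
\int_X w(y)\, Q(\mathrm{d}y \mid x, a) \;\le\; \rho\, w(x) + b\,\mathbf{1}_C(x) \qquad \text{for all } (x,a),
\]

where $w \ge 1$ is a weight function dominating the unbounded reward/cost $r$, $0 < \rho < 1$ and $b < \infty$ are constants, and $C$ is a suitable subset of the state space $X$. Compare the weighted-norm framework of Hernández-Lerma and Lasserre [7, 8] and the drift criteria in Meyn and Tweedie [11].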

Keywords:

discrete-time Markov decision processes, average reward criterion, optimal stationary policy, Lyapunov-type condition, unbounded reward/cost function

Classification:

90C40, 93E20

References:

  1. A. Arapostathis et al.: Discrete time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31 (1993), 282-344.   DOI:10.1137/0331018
  2. G. Casella and R. L. Berger: Statistical Inference. Second edition. Duxbury Thomson Learning 2002.
  3. E. B. Dynkin and A. A. Yushkevich: Controlled Markov Processes. Springer, New York 1979.
  4. E. Gordienko and O. Hernández-Lerma: Average cost Markov control processes with weighted norms: existence of canonical policies. Appl. Math. (Warsaw) 23 (1995), 2, 199-218.
  5. X. P. Guo and P. Shi: Limiting average criteria for nonstationary Markov decision processes. SIAM J. Optim. 11 (2001), 4, 1037-1053.   DOI:10.1137/S1052623499355235
  6. X. P. Guo and Q. X. Zhu: Average optimality for Markov decision processes in Borel spaces: A new condition and approach. J. Appl. Probab. 43 (2006), 318-334.   DOI:10.1239/jap/1152413725
  7. O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes. Springer, New York 1996.   DOI:10.1007/978-1-4612-0729-0
  8. O. Hernández-Lerma and J. B. Lasserre: Further Topics on Discrete-Time Markov Control Processes. Springer, New York 1999.   DOI:10.1007/978-1-4612-0561-6
  9. M. Kakumanu: Nondiscounted continuous time Markov decision process with countable state space. SIAM J. Control Optim. 10 (1972), 1, 210-220.   DOI:10.1137/0310016
  10. R. B. Lund and R. L. Tweedie: Geometric convergence rates for stochastically ordered Markov chains. Math. Oper. Res. 21 (1996), 1, 182-194.   DOI:10.1287/moor.21.1.182
  11. S. P. Meyn and R. L. Tweedie: Markov Chains and Stochastic Stability. Cambridge Univ. Press, New York 2009.   DOI:10.1017/cbo9780511626630
  12. M. L. Puterman: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New York 1994.   DOI:10.1002/9780470316887
  13. L. I. Sennott: Average reward optimization theory for denumerable state spaces. In: Handbook of Markov Decision Processes (Int. Ser. Operat. Res. Manag. Sci. 40) (E. A. Feinberg and A. Shwartz, eds.), Kluwer, Boston 2002, pp. 153-172.   DOI:10.1007/978-1-4615-0805-2_5
  14. L. I. Sennott: Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley, New York 1999.   DOI:10.1002/9780470317037
  15. Q. X. Zhu: Average optimality for continuous-time jump Markov decision processes with a policy iteration approach. J. Math. Anal. Appl. 339 (2008), 1, 691-704.   DOI:10.1016/j.jmaa.2007.06.071