Kybernetika 50 no. 6, 950-977, 2014

Strong average optimality criterion for continuous-time Markov decision processes

Qingda Wei and Xian Chen

DOI: 10.14736/kyb-2014-6-0950

Abstract:

This paper deals with continuous-time Markov decision processes with unbounded transition rates under the strong average cost criterion. The state and action spaces are Borel spaces, and the costs are allowed to be unbounded from above and from below. Under mild conditions, we first prove that, for uncountable state spaces and unbounded transition rates, the finite-horizon optimal value function is a solution to the optimality equation and that there exists an optimal deterministic Markov policy. Then, using the two average optimality inequalities, we show that the set of all strong average optimal policies coincides with the set of all average optimal policies, and thus obtain the existence of strong average optimal policies. Furthermore, employing skeleton chains of controlled continuous-time Markov chains and the Chapman-Kolmogorov equation, we give a new set of sufficient conditions, imposed on the primitive data of the model, for verifying the uniform exponential ergodicity of continuous-time Markov chains governed by stationary policies. Finally, we illustrate our main results with an example.
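The two central objects of the abstract, the finite-horizon optimal value V_T* (shown to solve the optimality equation) and strong average optimality, can be illustrated numerically on a toy model. The sketch below is not the paper's method: it assumes a hypothetical two-state, two-action chain with bounded transition rates (far simpler than the Borel-space, unbounded-rate setting treated in the paper) and solves the finite-horizon optimality equation dV_t(x)/dt = min_a [c(x,a) + sum_y q(y|x,a) V_t(y)], V_0 = 0, with an explicit Euler scheme, which for step h with h * rate <= 1 coincides with value iteration on the uniformized chain. It also assumes the usual definition in which a policy f* is strong average optimal when (1/T)[V_T(x,f*) - V_T*(x)] -> 0 as T -> infinity for every state x. The names COST, RATE, finite_horizon_values and average_cost are illustrative only.

```python
"""Toy illustration of finite-horizon optimality and strong average optimality.

Hypothetical two-state, two-action CTMDP with bounded rates (the paper allows
Borel spaces and unbounded rates; this sketch does not).
"""
from itertools import product

# Hypothetical model data: COST[x][a] = cost rate, RATE[x][a] = rate of leaving x
COST = [[1.0, 1.2], [4.0, 3.0]]
RATE = [[1.0, 0.5], [1.0, 3.0]]
STATES = (0, 1)
ACTIONS = (0, 1)


def drift(v, x, a):
    """c(x, a) + sum_y q(y|x, a) v(y) for the two-state chain."""
    y = 1 - x
    return COST[x][a] + RATE[x][a] * (v[y] - v[x])


def finite_horizon_values(T, h=0.01, policy=None):
    """Euler scheme for dV_t/dt = min_a [c + Q V_t], V_0 = 0 (t = time to go).

    If `policy` is given, the minimum is replaced by the fixed action policy[x],
    which yields the finite-horizon cost V_T(., f) of that stationary policy.
    """
    v = [0.0, 0.0]
    for _ in range(round(T / h)):
        if policy is None:
            rhs = [min(drift(v, x, a) for a in ACTIONS) for x in STATES]
        else:
            rhs = [drift(v, x, policy[x]) for x in STATES]
        v = [v[x] + h * rhs[x] for x in STATES]
    return v


def average_cost(policy):
    """Long-run average cost g_f from the stationary law of the 2-state chain."""
    r0, r1 = RATE[0][policy[0]], RATE[1][policy[1]]
    pi0, pi1 = r1 / (r0 + r1), r0 / (r0 + r1)
    return pi0 * COST[0][policy[0]] + pi1 * COST[1][policy[1]]


if __name__ == "__main__":
    T = 50.0
    gains = {f: average_cost(f) for f in product(ACTIONS, repeat=2)}
    f_star = min(gains, key=gains.get)  # average optimal stationary policy
    v_opt = finite_horizon_values(T)
    for f, g in sorted(gains.items()):
        v_f = finite_horizon_values(T, policy=f)
        gap = max((v_f[x] - v_opt[x]) / T for x in STATES)
        print(f, f"g_f={g:.4f}", f"max_x (V_T(x,f)-V_T*(x))/T={gap:.4f}")
    print("average optimal policy:", f_star)
```

With these made-up numbers the policy (1, 1) has the smallest long-run average cost, and its normalized gap to V_T* should be close to zero, whereas a policy such as (0, 1) keeps a gap near g_f - g* > 0. This mirrors, in a finite toy case only, the abstract's statement that strong average optimal and average optimal policies coincide.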

Keywords:

optimal policy, continuous-time Markov decision processes, strong average optimality criterion, finite-horizon expected total cost criterion, unbounded transition rates, optimal value function

Classification:

93E20, 90C40
