Kybernetika 57 no. 2, 295-311, 2021

Constrained optimality problem of Markov decision processes with Borel spaces and varying discount factors

Xiao Wu and Yanqiu Tang

DOI: 10.14736/kyb-2021-2-0295

Abstract:

This paper studies the constrained optimality problem for discrete-time Markov decision processes (DTMDPs) with state-dependent discount factors, a Borel state space, compact Borel action spaces, and possibly unbounded costs. Using properties of the so-called occupation measures of policies, and transforming the original constrained optimality problem for DTMDPs into a convex program, we prove the existence of an optimal randomized stationary policy under reasonable conditions.
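The key idea in the abstract is that expected discounted costs become linear functionals of occupation measures, so the constrained problem can be recast as a convex program. A toy finite model can illustrate this. The sketch below is a hypothetical example, not the paper's Borel-space construction. It assumes finitely many states and actions, a discount factor α(x) that depends only on the current state, and the cost criterion E_x[Σ_t α(x_0)⋯α(x_{t−1}) c(x_t, a_t)]. The functions `occupation_measure`, `discounted_cost` and `policy_from_measure`, along with all model data, are made up for illustration. The code computes the occupation measure η of a randomized stationary policy and checks that ⟨c, η⟩ matches the expected cost from the Bellman equation. It then recovers the policy from η by disintegration, π(a | x) = η(x, a)/η̂(x).

```python
# Hypothetical finite-state sketch (NOT the paper's Borel-space construction).
# Assumed criterion with state-dependent discount factors alpha(x):
#   V(x) = E_x[ sum_t alpha(x_0) * ... * alpha(x_{t-1}) * c(x_t, a_t) ].

def occupation_measure(states, actions, P, alpha, policy, nu, tol=1e-13, max_iter=100_000):
    """Iterate eta_hat(y) = nu(y) + sum_{x,a} alpha(x) eta_hat(x) policy[x][a] P[x][a][y]."""
    eta_hat = {x: 0.0 for x in states}
    for _ in range(max_iter):
        new = dict(nu)
        for x in states:
            for a in actions:
                w = alpha[x] * eta_hat[x] * policy[x][a]
                for y in states:
                    new[y] += w * P[x][a][y]
        diff = max(abs(new[x] - eta_hat[x]) for x in states)
        eta_hat = new
        if diff < tol:
            break
    # For a randomized stationary policy, eta(x, a) = eta_hat(x) * policy(a | x).
    return {(x, a): eta_hat[x] * policy[x][a] for x in states for a in actions}


def discounted_cost(states, actions, P, alpha, policy, c, tol=1e-13, max_iter=100_000):
    """Fixed-point iteration on V(x) = sum_a policy[x][a] (c(x,a) + alpha(x) sum_y P V(y))."""
    V = {x: 0.0 for x in states}
    for _ in range(max_iter):
        new = {
            x: sum(
                policy[x][a] * (c[x][a] + alpha[x] * sum(P[x][a][y] * V[y] for y in states))
                for a in actions
            )
            for x in states
        }
        diff = max(abs(new[x] - V[x]) for x in states)
        V = new
        if diff < tol:
            break
    return V


def policy_from_measure(states, actions, eta):
    """Disintegrate eta(x, a) = eta_hat(x) pi(a | x) to recover a stationary policy."""
    pi = {}
    for x in states:
        mass = sum(eta[x, a] for a in actions)
        pi[x] = {a: (eta[x, a] / mass if mass > 0 else 1.0 / len(actions)) for a in actions}
    return pi


# Hypothetical two-state, two-action model.
states, actions = [0, 1], ["stay", "move"]
P = {0: {"stay": {0: 0.9, 1: 0.1}, "move": {0: 0.2, 1: 0.8}},
     1: {"stay": {0: 0.1, 1: 0.9}, "move": {0: 0.7, 1: 0.3}}}
alpha = {0: 0.9, 1: 0.5}                                             # state-dependent discount factors
policy = {0: {"stay": 0.5, "move": 0.5}, 1: {"stay": 1.0, "move": 0.0}}
nu = {0: 1.0, 1: 0.0}                                                # initial distribution
c = {0: {"stay": 1.0, "move": 2.0}, 1: {"stay": 0.0, "move": 3.0}}  # objective cost
d = {0: {"stay": 0.0, "move": 1.0}, 1: {"stay": 1.0, "move": 0.0}}  # constraint cost

eta = occupation_measure(states, actions, P, alpha, policy, nu)
# Both criteria are linear in eta, which underlies the convex-program reformulation.
objective = sum(eta[x, a] * c[x][a] for x in states for a in actions)
constraint = sum(eta[x, a] * d[x][a] for x in states for a in actions)
V = discounted_cost(states, actions, P, alpha, policy, c)
print(objective, sum(nu[x] * V[x] for x in states), constraint)
```

In such a finite model, minimizing ⟨c, η⟩ over nonnegative η is a linear program. The feasible η must satisfy the balance equation η̂(y) = ν(y) + Σ_{x,a} α(x) P(y | x, a) η(x, a) and linear constraints of the form ⟨d, η⟩ ≤ k, where the bound k is a hypothetical symbol. The disintegration step then turns an optimal η back into a randomized stationary policy.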

Keywords:

discrete-time Markov decision processes, Borel state and action spaces, varying discount factors, unbounded costs, constrained optimality problem

Classification:

90C40, 60J27
