Kybernetika 55 no. 1, 166-182, 2019

Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs

Beatris A. Escobedo-Trujillo and Carmen G. Higuera-Chan

DOI: 10.14736/kyb-2019-1-0166

Abstract:

In this paper we are concerned with a class of time-varying discounted Markov decision models $\mathcal{M}_n$ with unbounded costs $c_n$ and state-action dependent discount factors. Specifically, we study controlled systems whose state process evolves according to the equation $x_{n+1}=G_n(x_n,a_n,\xi_n)$, $n=0,1,\ldots$, with state-action dependent discount factors of the form $\alpha_n(x_n,a_n)$, where $a_n$ and $\xi_n$ are the control and the random disturbance at time $n$, respectively. Assuming that the sequences of functions $\lbrace\alpha_n\rbrace$, $\lbrace c_n\rbrace$ and $\lbrace G_n\rbrace$ converge, in a certain sense, to $\alpha_\infty$, $c_\infty$ and $G_\infty$, our objective is to introduce a suitable control model for this class of systems and then to show the existence of optimal policies for the limit system $\mathcal{M}_\infty$ corresponding to $\alpha_\infty$, $c_\infty$ and $G_\infty$. Finally, we illustrate our results and their applicability to a class of semi-Markov control models.
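To make the setting concrete, the following is a minimal sketch of how a cost could be accumulated along one simulated path of $x_{n+1}=G_n(x_n,a_n,\xi_n)$ when the discount factor depends on the state and the action. It assumes the form usual in the literature on non-constant discount factors, in which the stage-$n$ cost $c_n(x_n,a_n)$ is weighted by the product $\prod_{k<n}\alpha_k(x_k,a_k)$; the abstract does not state the performance index, so this form is an assumption. The particular functions `G`, `c`, `alpha` and `policy`, the standard normal disturbance and the finite truncation `horizon` are all hypothetical choices made only for illustration.

```python
import random


def discounted_cost(G, c, alpha, policy, x0, horizon, rng):
    """Cost of one simulated path, truncated after `horizon` stages.

    Computes sum_{n<horizon} (prod_{k<n} alpha(k, x_k, a_k)) * c(n, x_n, a_n)
    with x_{n+1} = G(n, x_n, a_n, xi_n). This index is an assumed form.
    """
    x, weight, total = x0, 1.0, 0.0
    for n in range(horizon):
        a = policy(n, x)
        total += weight * c(n, x, a)   # stage cost weighted by past discounts
        weight *= alpha(n, x, a)       # accumulate state-action dependent discount
        xi = rng.gauss(0.0, 1.0)       # assumed disturbance: standard normal
        x = G(n, x, a, xi)             # state transition
    return total


# Hypothetical time-varying model whose components converge as n grows.
def G(n, x, a, xi):
    return (0.5 + 1.0 / (n + 2)) * x + a + xi        # tends to 0.5*x + a + xi


def c(n, x, a):
    return (1.0 + 1.0 / (n + 1)) * (x * x + a * a)   # unbounded; tends to x^2 + a^2


def alpha(n, x, a):
    # Stays in (0, 0.9); tends to 0.9 / (1 + 0.1*|a|).
    return (0.9 - 0.1 / (n + 2)) / (1.0 + 0.1 * abs(a))


def policy(n, x):
    return -0.25 * x                                 # arbitrary linear feedback


if __name__ == "__main__":
    rng = random.Random(0)
    print(discounted_cost(G, c, alpha, policy, x0=1.0, horizon=50, rng=rng))
```

Averaging `discounted_cost` over many independent paths would give a Monte Carlo estimate of the expected cost of the chosen policy under these illustrative assumptions; it says nothing about optimality, which is the subject of the paper.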

Keywords:

discounted optimality, non-constant discount factor, time-varying Markov decision processes

Classification:

93E20, 90C40
