Kybernetika 58 no. 6, 960-983, 2022

Partially observable Markov decision processes with partially observable random discount factors

E. Everardo Martinez-Garcia, J. Adolfo Minjárez-Sosa and Oscar Vega-Amaya

DOI: 10.14736/kyb-2022-6-0960

Abstract:

This paper deals with a class of partially observable discounted Markov decision processes defined on Borel state and action spaces, with unbounded one-stage costs. The discount rate is a stochastic process that evolves according to a difference equation and is itself assumed to be partially observable. By introducing a suitable control model and filtering processes, we prove the existence of optimal control policies. In addition, we illustrate our results with a class of GI/GI/1 queueing systems, for which we obtain the corresponding optimality equation and the filtering process explicitly.

Keywords:

queueing models, partially observable systems, discounted criterion, optimal policies, random discount factors

Classification:

90C39, 90B22
