Kybernetika 60 no. 3, 357-378, 2024

Minimizing risk probability for infinite discounted piecewise deterministic Markov decision processes

Haifeng Huo, Jinhua Cui and Xian Wen

DOI: 10.14736/kyb-2024-3-0357

Abstract:

The purpose of this paper is to study the risk probability problem for infinite horizon piecewise deterministic Markov decision processes (PDMDPs) with varying discount factors and unbounded transition rates. Unlike the usual expected total reward criterion, we aim to minimize the risk probability that the total rewards do not exceed a given target value. Under a non-explosion condition on the controlled state process that is slightly weaker than the corresponding conditions in the previous literature, we prove the existence and uniqueness of a solution to the optimality equation, as well as the existence of a risk probability optimal policy, by using the value iteration algorithm. Finally, we provide two examples to illustrate our results: one explains and verifies our conditions, and the other shows the computational results for the value function and the risk probability optimal policy.
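The risk probability criterion described above can be illustrated, very roughly, on a discrete-time, finite-state analogue (a toy model with hypothetical names, not the paper's PDMDP setting with unbounded transition rates): the target value is carried as an augmented state component, and value iteration is run on the recursion V(x, λ) = min_a Σ_y P(y|x,a) V(y, (λ − r(x,a))/β), with terminal condition V(x, λ) = 1{λ ≥ 0}.

```python
import numpy as np

def value_iteration(P, r, beta, lam_grid, n_iter=200):
    """Sketch of value iteration for a risk probability criterion.

    P: (A, S, S) transition probabilities, r: (S, A) one-step rewards,
    beta: discount factor in (0, 1), lam_grid: increasing grid of targets.
    Returns V[x, k] ~ minimal probability that the total discounted reward
    does not exceed lam_grid[k] from state x, and a greedy policy.
    """
    A, S, _ = P.shape
    L = len(lam_grid)
    # Terminal condition: with no reward left, P(reward <= lam) = 1{lam >= 0}.
    V = (lam_grid >= 0).astype(float)[None, :].repeat(S, axis=0)
    for _ in range(n_iter):
        Q = np.empty((S, A, L))
        for a in range(A):
            for x in range(S):
                # Shift and rescale the target after earning r(x, a) now.
                lam_next = (lam_grid - r[x, a]) / beta
                # Project the shifted targets back onto the grid (nearest-below).
                idx = np.searchsorted(lam_grid, lam_next).clip(0, L - 1)
                Q[x, a] = P[a, x] @ V[:, idx]
        V = Q.min(axis=1)  # minimize the risk probability over actions
    policy = Q.argmin(axis=1)
    return V, policy
```

Note that V is monotone nondecreasing in the target λ, which the grid projection preserves; a finer `lam_grid` reduces the discretization error of this sketch.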

Keywords:

optimal policy, risk probability criterion, piecewise deterministic Markov decision processes, the value iteration algorithm

Classification:

90C40, 60E20

References:

  1. A. Almudevar: A dynamic programming algorithm for the optimal control of piecewise deterministic Markov processes. SIAM J. Control Optim. 40 (2001), 525-539.   DOI:10.1137/S0363012999364474
  2. D. Bertsekas and S. Shreve: Stochastic Optimal Control: The Discrete-Time Case. Academic Press Inc, New York 1978.   CrossRef
  3. O. L. V. Costa and F. Dufour: The vanishing discount approach for the average continuous control of piecewise deterministic Markov processes. J. Appl. Probab. 46 (2009), 1157-1183.   DOI:10.1017/S0021900200006203
  4. O. L. V. Costa and F. Dufour: Continuous Average Control of Piecewise Deterministic Markov Processes. Springer-Verlag, New York 2013.   CrossRef
  5. N. Bäuerle and U. Rieder: Markov Decision Processes with Applications to Finance. Springer, Heidelberg 2011.   CrossRef
  6. K. Boda, J. A. Filar and Y. L. Lin: Stochastic target hitting time and the problem of early retirement. IEEE Trans. Automat. Control 49 (2004), 409-419.   DOI:10.1109/TAC.2004.824469
  7. M. H. A. Davis: Piecewise deterministic Markov processes: a general class of nondiffusion stochastic models. J. Roy. Statist. Soc. 46 (1984), 353-388.   DOI:10.1111/j.2517-6161.1984.tb01308.x
  8. M. H. A. Davis: Markov Models and Optimization. Chapman and Hall 1993.   DOI:10.1007/978-1-4899-4483-2
  9. F. Dufour, M. Horiguchi and A. Piunovskiy: Optimal impulsive control of piecewise deterministic Markov processes. Stochastics 88 (2016), 1073-1098.   DOI:10.1080/17442508.2016.1197925
  10. X. P. Guo and O. Hernández-Lerma: Continuous-Time Markov Decision Processes: Theory and Applications. Springer-Verlag, Berlin 2009.   CrossRef
  11. X. P. Guo and A. Piunovskiy: Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math. Oper. Res. 36 (2011), 105-132.   DOI:10.1287/moor.1100.0477
  12. X. P. Guo, X. Y. Song and Y. Zhang: First passage optimality for continuous time Markov decision processes with varying discount factors and history-dependent policies. IEEE Trans. Automat. Control 59 (2014), 163-174.   DOI:10.1109/TAC.2013.2281475
  13. O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996.   CrossRef
  14. J. P. Hespanha: A model for stochastic hybrid systems with applications to communication networks. Nonlinear Anal. 62 (2005), 1353-1383.   DOI:10.1016/j.na.2005.01.112
  15. Y. H. Huang and X. P. Guo: Finite-horizon piecewise deterministic Markov decision processes with unbounded transition rates. Stochastics 91 (2019), 67-95.   DOI:10.1080/17442508.2018.1518450
  16. Y. H. Huang, X. P. Guo and Z. F. Li: Minimum risk probability for finite horizon semi-Markov decision process. J. Math. Anal. Appl. 402 (2013), 378-391.   DOI:10.1016/j.jmaa.2013.01.021
  17. X. X. Huang, X. L. Zou and X. P. Guo: A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates. Sci. China Math. 58 (2015), 1923-1938.   DOI:10.1007/s11425-015-5029-x
  18. H. F. Huo and X. Wen: First passage risk probability optimality for continuous time Markov decision processes. Kybernetika 55 (2019), 114-133.   DOI:10.14736/kyb-2019-1-0114
  19. H. F. Huo, X. L. Zou and X. P. Guo: The risk probability criterion for discounted continuous-time Markov decision processes. Discrete Event Dynamic Systems: Theory Appl. 27 (2017), 675-699.   DOI:10.1007/s10626-017-0257-6
  20. J. Janssen and R. Manca: Semi-Markov Risk Models for Finance, Insurance, and Reliability. Springer-Verlag, New York 2006.   CrossRef
  21. Y. L. Lin, R. J. Tomkins and C. L. Wang: Optimal models for the first arrival time distribution function in continuous time with a special case. Acta Math. Appl. Sinica 10 (1994), 194-212.   DOI:10.1007/BF02006119
  22. Y. Ohtsubo and K. Toyonaga: Optimal policy for minimizing risk models in Markov decision processes. J. Math. Anal. Appl. 271 (2002), 66-81.   DOI:10.1016/s0022-247x(02)00097-5
  23. A. Piunovskiy and Y. Zhang: Continuous-Time Markov Decision Processes: Borel Space Models and General Control Strategies. Springer, 2020.   CrossRef
  24. X. Wen, H. F. Huo and X. P. Guo: First passage risk probability minimization for piecewise deterministic Markov decision processes. Acta Math. Appl. Sinica 38 (2022), 549-567.   DOI:10.1007/s10255-022-1098-0
  25. C. B. Wu and Y. L. Lin: Minimizing risk models in Markov decision processes with policies depending on target values. J. Math. Anal. Appl. 231 (1999), 47-57.   DOI:10.1006/jmaa.1998.6203
  26. X. Wu and X. P. Guo: First passage optimality and variance minimization of Markov decision processes with varying discount factors. J. Appl. Prob. 52 (2015), 441-456.   DOI:10.1017/S0021900200012560