This paper investigates the risk probability minimization problem for infinite-horizon semi-Markov decision processes (SMDPs) with varying discount factors. First, we establish a standard regularity condition that guarantees the state process is non-explosive. Then, relying only on the non-explosion of the state process, we use the value iteration technique to derive the optimality equation satisfied by the value function, prove that the value function is the unique solution of this equation, and establish the existence of a risk probability optimal policy. Our condition is weaker than the first-arrival condition commonly imposed in the existing literature. Finally, we develop a value iteration algorithm to compute the value function and an optimal policy, and we illustrate the feasibility and effectiveness of the algorithm with a numerical example.
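For orientation, value iteration here means computing the value function as the limit of successive approximations under the optimality operator; in schematic form (the notation $T$, $V_n$, and $V^{*}$ is chosen for illustration only, and the precise operator, built from the semi-Markov kernel and the varying discount factors, is specified in the body of the paper),
\[
V_{n+1} = T V_n, \qquad n = 0, 1, 2, \ldots, \qquad V^{*} = \lim_{n \to \infty} V_n,
\]
and an optimal policy is then obtained by selecting, at each state, an action attaining the minimum in $T V^{*}$.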
Keywords: optimal policy, semi-Markov decision processes, risk probability criterion, value function, value iteration algorithm
Mathematics Subject Classification: 90C40, 60E20