Kybernetika 61 no. 4, 447-466, 2025

The risk probability optimal problem for infinite discounted semi-Markov decision processes

Xian Wen, Jinhua Cui and Haifeng Huo

DOI: 10.14736/kyb-2025-4-0447

Abstract:

This paper investigates the risk probability minimization problem for infinite horizon semi-Markov decision processes (SMDPs) with varying discount factors. First, we establish a standard regularity condition that guarantees the state process is non-explosive. Then, relying only on the non-explosion of the state process, we use the value iteration technique to derive the optimality equation satisfied by the value function, prove the uniqueness of its solution, and establish the existence of a risk probability optimal policy. Our condition is weaker than the first arrival condition commonly used in the existing literature. Finally, we develop a value iteration algorithm to compute the value function and an optimal policy, and illustrate the feasibility and effectiveness of the algorithm with a numerical example.
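
To make the value iteration scheme mentioned above concrete, the sketch below runs a toy version of such an iteration in Python. It is only an illustration under simplifying assumptions, not the algorithm or the model of the paper: the state and action sets, cost rates, state-dependent discount factors, discrete sojourn-time support, randomly generated semi-Markov kernel, and the grid of thresholds are all hypothetical choices, and the recursion approximates the minimal probability that the total discounted cost exceeds a threshold, with the threshold rescaled after each sojourn.

```python
# Illustrative only: a toy value iteration for the probability that the total
# discounted cost of a finite SMDP exceeds a threshold lambda. All model data
# (states, actions, cost rates, discount factors, sojourn times, kernel Q and
# the lambda-grid) are hypothetical and chosen purely for demonstration.
import numpy as np

states = [0, 1]                    # finite state space (assumption)
actions = [0, 1]                   # finite action set (assumption)
alpha = {0: 0.9, 1: 1.1}           # state-dependent (varying) discount factors
cost = {(0, 0): 1.0, (0, 1): 2.0,  # cost rates c(x, a)
        (1, 0): 0.5, (1, 1): 1.5}
sojourn = [0.5, 1.0, 2.0]          # discrete sojourn-time support (assumption)

# Random semi-Markov kernel Q[x, a, y, t]: probability of jumping to state y
# after holding for sojourn[t], given the current state x and action a.
rng = np.random.default_rng(0)
Q = rng.dirichlet(np.ones(len(states) * len(sojourn)),
                  size=(len(states), len(actions)))
Q = Q.reshape(len(states), len(actions), len(states), len(sojourn))

lam_grid = np.linspace(0.0, 10.0, 101)   # grid of cost thresholds lambda

def next_threshold(lam, x, a, t):
    """Residual threshold after one sojourn of length t in state x under a:
    the first-sojourn discounted cost c(x,a)*(1 - e^{-alpha t})/alpha is paid,
    and the remaining level is rescaled by e^{alpha t}."""
    ax = alpha[x]
    paid = cost[(x, a)] * (1.0 - np.exp(-ax * t)) / ax
    return (lam - paid) * np.exp(ax * t)

def value_iteration(n_iter=100):
    # V[x, k] approximates the minimal probability that the total discounted
    # cost from state x exceeds lam_grid[k]; start from V_0 = 1{lambda <= 0}.
    V = np.tile((lam_grid <= 0.0).astype(float), (len(states), 1))
    for _ in range(n_iter):
        V_new = np.empty_like(V)
        for x in states:
            for ki, lam in enumerate(lam_grid):
                action_values = []
                for a in actions:
                    val = 0.0
                    for y in states:
                        for ti, t in enumerate(sojourn):
                            lam_next = next_threshold(lam, x, a, t)
                            # interpolate V on the lambda-grid; outside the grid
                            # clip to 1 (level already spent) or 0 (level huge)
                            v = np.interp(lam_next, lam_grid, V[y],
                                          left=1.0, right=0.0)
                            val += Q[x, a, y, ti] * v
                    action_values.append(val)
                V_new[x, ki] = min(action_values)  # minimize the risk probability
        V = V_new
    return V

if __name__ == "__main__":
    V = value_iteration()
    print(V[:, :5])   # approximate minimal risk probabilities for small thresholds
```

The threshold grid with interpolation is a standard discretization device for risk probability criteria; a finer grid and more iterations improve the approximation at the cost of longer running time.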

Keywords:

optimal policy, semi-Markov decision processes, risk probability criterion, value function, value iteration algorithm

Classification:

90C40, 60E20

References:

  1. D. Bertsekas and S. E. Shreve: Stochastic Optimal Control: The Discrete-Time Case. Academic Press, New York 1996.
  2. E. A. Feinberg: Continuous time discounted jump Markov decision processes: a discrete-event approach. Math. Oper. Res. 29 (2004), 492-524.   DOI:10.1287/moor.1040.0089
  3. N. Bäuerle and U. Rieder: Markov Decision Processes with Applications to Finance. Springer, Heidelberg 2011.
  4. X. P. Guo and A. Piunovskiy: Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math. Oper. Res. 36 (2011), 105-132.   DOI:10.1287/moor.1100.0477
  5. X. P. Guo and O. Hernández-Lerma: Continuous-Time Markov Decision Processes: Theory and Applications. Springer-Verlag, Berlin 2009.
  6. X. P. Guo, X. Y. Song and Y. Zhang: First passage optimality for continuous-time Markov decision processes with varying discount factors and history-dependent policies. IEEE Trans. Automat. Control 59 (2013), 163-174.   DOI:10.1109/TAC.2013.2281475
  7. O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996.
  8. Y. H. Huang and X. P. Guo: Optimal risk probability for first passage models in semi-Markov decision processes. J. Math. Anal. Appl. 359 (2009), 404-420.   DOI:10.1016/j.jmaa.2009.05.058
  9. Y. H. Huang and X. P. Guo: Finite horizon semi-Markov decision processes with application to maintenance systems. European J. Oper. Res. 212 (2011), 131-140.   DOI:10.1016/j.ejor.2011.01.027
  10. Y. H. Huang and X. P. Guo: First passage models for denumerable semi-Markov decision processes with nonnegative discounted costs. Acta Math. Appl. Sinica 27 (2011), 177-190.   DOI:10.1007/s10255-011-0061-2
  11. Y. H. Huang, X. P. Guo and Z. F. Li: Minimum risk probability for finite horizon semi-Markov decision processes. J. Math. Anal. Appl. 402 (2013), 378-391.   DOI:10.1016/j.jmaa.2013.01.021
  12. X. X. Huang, X. L. Zuo and X. P. Guo: A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates. Sci. China Math. 58 (2015), 1923-1938.   DOI:10.1007/s11425-015-5029-x
  13. H. F. Huo, X. L. Zuo and X. P. Guo: The risk probability criterion for discounted continuous-time Markov decision processes. Discrete Event Dyn. Syst. 27 (2017), 675-699.   DOI:10.1007/s10626-017-0257-6
  14. H. F. Huo and X. P. Guo: Risk probability minimization problems for continuous-time Markov decision processes on finite horizon. IEEE Trans. Automat. Control 65 (2019), 3199-3206.   DOI:10.1109/TAC.2019.2947654
  15. J. Janssen and R. Manca: Semi-Markov Risk Models for Finance, Insurance and Reliability. Springer, New York 2006.
  16. Y. L. Lin, R. J. Tomkins and C. L. Wang: Optimal models for the first arrival time distribution function in continuous time with a special case. Acta Math. Appl. Sinica 10 (1994), 194-212.
  17. J. W. Mamer: Successive approximations for finite horizon, semi-Markov decision processes with application to asset liquidation. Oper. Res. 34 (1986), 638-644.   DOI:10.1287/opre.34.4.638
  18. M. Sakaguchi and Y. Ohtsubo: Optimal threshold probability and expectation in semi-Markov decision processes. Appl. Math. Comput. 216 (2010), 2947-2958.   DOI:10.1016/j.amc.2010.04.007
  19. M. J. Sobel: The variance of discounted Markov decision processes. J. Appl. Probab. 19 (1982), 794-802.   DOI:10.1017/s0021900200023123
  20. V. Nollau: Solution of a discounted semi-Markovian decision problem by successive overrelaxation. Optimization 39 (1997), 85-97.   DOI:10.1080/02331939708844273
  21. Y. Ohtsubo: Optimal threshold probability in undiscounted Markov decision processes with a target set. Appl. Math. Comput. 149 (2004), 519-532.   DOI:10.1016/S0096-3003(03)00158-9
  22. A. Piunovskiy, Y. Zhang and A. N. Shiryaev: Continuous-Time Markov Decision Processes: Borel Space Models and General Control Strategies. Springer, Berlin 2020.   DOI:10.1007/978-3-030-54987-9
  23. D. J. White: Minimizing a threshold probability in discounted Markov decision processes. J. Math. Anal. Appl. 173 (1993), 634-646.   DOI:10.1006/jmaa.1993.1093
  24. X. Wen, H. F. Huo and X. P. Guo: First passage risk probability minimization for piecewise deterministic Markov decision processes. Acta Math. Appl. Sin. Engl. Ser. 38 (2022), 549-567.   DOI:10.1007/s10255-022-1098-0
  25. C. Wu and Y. Lin: Minimizing risk models in Markov decision processes with policies depending on target values. J. Math. Anal. Appl. 231 (1999), 47-67.   DOI:10.1006/jmaa.1998.6203
  26. X. Wu and X. P. Guo: First passage optimality and variance minimisation of Markov decision processes with varying discount factors. J. Appl. Probab. 52 (2015), 441-456.   DOI:10.1239/jap/1437658608