Kybernetika 60 no. 3, 357-378, 2024

Minimizing risk probability for infinite discounted piecewise deterministic Markov decision processes

Haifeng Huo, Jinhua Cui and Xian Wen

DOI: 10.14736/kyb-2024-3-0357

Abstract:

The purpose of this paper is to study the risk probability problem for infinite horizon piecewise deterministic Markov decision processes (PDMDPs) with varying discount factors and unbounded transition rates. Unlike the usual expected total reward criterion, we aim to minimize the risk probability that the total rewards do not exceed a given target value. Under a non-explosion condition on the controlled state process that is slightly weaker than the corresponding conditions in the previous literature, we prove the existence and uniqueness of a solution to the optimality equation, as well as the existence of a risk probability optimal policy, by using the value iteration algorithm. Finally, we provide two examples to illustrate our results: one explains and verifies our conditions, and the other shows the computational results for the value function and the risk probability optimal policy.
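The risk probability criterion described above can be illustrated, very roughly, on a discrete-time, finite-state analogue (a toy model with hypothetical names, not the paper's PDMDP setting with unbounded transition rates): the target value is carried as an augmented state component, and value iteration is run on the recursion V(x, λ) = min_a Σ_y P(y|x,a) V(y, (λ − r(x,a))/β), with terminal condition V(x, λ) = 1{λ ≥ 0}.

```python
import numpy as np

def value_iteration(P, r, beta, lam_grid, n_iter=200):
    """Sketch of value iteration for a risk probability criterion.

    P: (A, S, S) transition probabilities, r: (S, A) one-step rewards,
    beta: discount factor in (0, 1), lam_grid: increasing grid of targets.
    Returns V[x, k] ~ minimal probability that the total discounted reward
    does not exceed lam_grid[k] from state x, and a greedy policy.
    """
    A, S, _ = P.shape
    L = len(lam_grid)
    # Terminal condition: with no reward left, P(reward <= lam) = 1{lam >= 0}.
    V = (lam_grid >= 0).astype(float)[None, :].repeat(S, axis=0)
    for _ in range(n_iter):
        Q = np.empty((S, A, L))
        for a in range(A):
            for x in range(S):
                # Shift and rescale the target after earning r(x, a) now.
                lam_next = (lam_grid - r[x, a]) / beta
                # Project the shifted targets back onto the grid (nearest-below).
                idx = np.searchsorted(lam_grid, lam_next).clip(0, L - 1)
                Q[x, a] = P[a, x] @ V[:, idx]
        V = Q.min(axis=1)  # minimize the risk probability over actions
    policy = Q.argmin(axis=1)
    return V, policy
```

Note that V is monotone nondecreasing in the target λ, which the grid projection preserves; a finer `lam_grid` reduces the discretization error of this sketch.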

Keywords:

optimal policy, risk probability criterion, piecewise deterministic Markov decision processes, the value iteration algorithm

Classification:

90C40, 60E20

References:

  1. A. Almudevar: A dynamic programming algorithm for the optimal control of piecewise deterministic Markov processes. SIAM J. Control Optim. 40 (2001), 525-539.   DOI:10.1137/S0363012999364474
  2. D. Bertsekas and S. Shreve: Stochastic Optimal Control: The Discrete-Time Case. Academic Press Inc, New York 1978.   CrossRef
  3. O. L. V. Costa and F. Dufour: The vanishing discount approach for the average continuous control of piecewise deterministic Markov processes. J. Appl. Probab. 46 (2009), 1157-1183.   DOI:10.1017/S0021900200006203
  4. O. L. V. Costa and F. Dufour: Continuous Average Control of Piecewise Deterministic Markov Processes. Springer-Verlag, New York 2013.   CrossRef
  5. N. Bäuerle and U. Rieder: Markov Decision Processes with Applications to Finance. Springer, Heidelberg 2011.   CrossRef
  6. K. Boda, J. A. Filar and Y. L. Lin: Stochastic target hitting time and the problem of early retirement. IEEE Trans. Automat. Control 49 (2004), 409-419.   DOI:10.1109/TAC.2004.824469
  7. M. H. A. Davis: Piecewise deterministic Markov processes: a general class of nondiffusion stochastic models. J. Roy. Statist. Soc. 46 (1984), 353-388.   DOI:10.1111/j.2517-6161.1984.tb01308.x
  8. M. H. A. Davis: Markov Models and Optimization. Chapman and Hall 1993.   DOI:10.1007/978-1-4899-4483-2
  9. F. Dufour, M. Horiguchi and A. Piunovskiy: Optimal impulsive control of piecewise deterministic Markov processes. Stochastics 88 (2016), 1073-1098.   DOI:10.1080/17442508.2016.1197925
  10. X. P. Guo and O. Hernández-Lerma: Continuous-Time Markov Decision Processes: Theory and Applications. Springer-Verlag, Berlin 2009.   CrossRef
  11. X. P. Guo and A. Piunovskiy: Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math. Oper. Res. 36 (2011), 105-132.   DOI:10.1287/moor.1100.0477
  12. X. P. Guo, X. Y. Song and Y. Zhang: First passage optimality for continuous time Markov decision processes with varying discount factors and history-dependent policies. IEEE Trans. Automat. Control 59 (2014), 163-174.   DOI:10.1109/TAC.2013.2281475
  13. O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996.   CrossRef
  14. J. P. Hespanha: A model for stochastic hybrid systems with applications to communication networks. Nonlinear Anal. 62 (2005), 1353-1383.   DOI:10.1016/j.na.2005.01.112
  15. Y. H. Huang and X. P. Guo: Finite-horizon piecewise deterministic Markov decision processes with unbounded transition rates. Stochastics 91 (2019), 67-95.   DOI:10.1080/17442508.2018.1518450
  16. Y. H. Huang, X. P. Guo and Z. F. Li: Minimum risk probability for finite horizon semi-Markov decision process. J. Math. Anal. Appl. 402 (2013), 378-391.   DOI:10.1016/j.jmaa.2013.01.021
  17. X. X. Huang, X. L. Zou and X. P. Guo: A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates. Sci. China Math. 58 (2015), 1923-1938.   DOI:10.1007/s11425-015-5029-x
  18. H. F. Huo and X. Wen: First passage risk probability optimality for continuous time Markov decision processes. Kybernetika 55 (2019), 114-133.   DOI:10.14736/kyb-2019-1-0114
  19. H. F. Huo, X. L. Zou and X. P. Guo: The risk probability criterion for discounted continuous-time Markov decision processes. Discrete Event Dynamic Systems: Theory Appl. 27 (2017), 675-699.   DOI:10.1007/s10626-017-0257-6
  20. J. Janssen and R. Manca: Semi-Markov Risk Models for Finance, Insurance, and Reliability. Springer-Verlag, New York 2006.   CrossRef
  21. Y. L. Lin, R. J. Tomkins and C. L. Wang: Optimal models for the first arrival time distribution function in continuous time with a special case. Acta Math. Appl. Sinica 10 (1994), 194-212.   DOI:10.1007/BF02006119
  22. Y. Ohtsubo and K. Toyonaga: Optimal policy for minimizing risk models in Markov decision processes. J. Math. Anal. Appl. 271 (2002), 66-81.   DOI:10.1016/s0022-247x(02)00097-5
  23. A. Piunovskiy and Y. Zhang: Continuous-Time Markov Decision Processes: Borel Space Models and General Control Strategies. Springer, 2020.   CrossRef
  24. X. Wen, H. F. Huo and X. P. Guo: First passage risk probability minimization for piecewise deterministic Markov decision processes. Acta Math. Appl. Sinica 38 (2022), 549-567.   DOI:10.1007/s10255-022-1098-0
  25. C. B. Wu and Y. L. Lin: Minimizing risk models in Markov decision processes with policies depending on target values. J. Math. Anal. Appl. 231 (1999), 47-57.   DOI:10.1006/jmaa.1998.6203
  26. X. Wu and X. P. Guo: First passage optimality and variance minimization of Markov decision processes with varying discount factors. J. Appl. Prob. 52 (2015), 441-456.   DOI:10.1017/S0021900200012560