Kybernetika 58 no. 3, 301-319, 2022

The exponential cost optimality for finite horizon semi-Markov decision processes

Haifeng Huo and Xian Wen

DOI: 10.14736/kyb-2022-3-0301


This paper considers an exponential cost optimality problem for finite horizon semi-Markov decision processes (SMDPs). The objective is to find an optimal policy that minimizes the exponential cost over the full set of policies on a finite horizon. First, under the standard regularity and compactness-continuity conditions, we establish the optimality equation and, using the minimum nonnegative solution approach, prove that the value function is its unique solution and that an optimal policy exists. Second, we develop a new value iteration algorithm to compute both the value function and an $\epsilon$-optimal policy. Finally, we present a computable machine maintenance system to illustrate the convergence of the algorithm.
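The paper's value iteration algorithm is developed for SMDPs with random sojourn times; as a rough illustration of the backward-induction idea behind the exponential cost criterion $\min_\pi \mathbb{E}[\exp(\gamma \cdot \text{total cost})]$, the following sketch treats a simplified discrete-time, finite horizon model. All data, names, and the simplification to fixed unit sojourn times are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def risk_sensitive_value_iteration(P, c, gamma, horizon):
    """Backward induction for the exponential (risk-sensitive) cost criterion
    min over policies of E[exp(gamma * sum of stage costs)] on a finite horizon.

    Illustrative simplification: discrete-time MDP (unit sojourn times).
    P: transition probabilities, shape (A, S, S), P[a, s, s'].
    c: one-step costs, shape (S, A).
    gamma: risk-sensitivity parameter (gamma > 0 is risk-averse).
    Returns V of shape (horizon+1, S) and a greedy policy of shape (horizon, S).
    """
    A, S, _ = P.shape
    V = np.ones((horizon + 1, S))           # terminal value exp(gamma * 0) = 1
    policy = np.zeros((horizon, S), dtype=int)
    for n in range(horizon - 1, -1, -1):
        # Multiplicative Bellman update:
        # Q[a, s] = exp(gamma * c(s, a)) * sum_{s'} P(s' | s, a) * V_{n+1}(s')
        Q = np.exp(gamma * c.T) * (P @ V[n + 1])
        policy[n] = Q.argmin(axis=0)        # greedy (candidate optimal) action
        V[n] = Q.min(axis=0)                # value of the exponential cost
    return V, policy
```

Note the multiplicative structure: because $\exp$ turns a sum of costs into a product, the one-step cost enters the update as a factor $\exp(\gamma c(s,a))$ rather than an additive term, which is the key difference from risk-neutral value iteration.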


optimal policy, semi-Markov decision processes, finite horizon, exponential cost, optimality equation


90C40, 60E20


  1. D. P. Bertsekas and S. E. Shreve: Stochastic Optimal Control: The Discrete-Time Case. Academic Press, Inc. 1978.   CrossRef
  2. N. Bäuerle and U. Rieder: Markov Decision Processes with Applications to Finance. Springer, Heidelberg 2011.   CrossRef
  3. N. Bäuerle and U. Rieder: More risk-sensitive Markov decision processes. Math. Oper. Res. 39 (2014), 105-120.   DOI:10.1287/moor.2013.0601
  4. X. R. Cao: Semi-Markov decision problems and performance sensitivity analysis. IEEE Trans. Automat. Control 48 (2003), 758-769.   DOI:10.1109/TAC.2003.811252
  5. R. Cavazos-Cadena and R. Montes-De-Oca: Optimal stationary policies in risk-sensitive dynamic programs with finite state space and nonnegative rewards. Appl. Math. 27 (2000), 167-185.   DOI:10.4064/am-27-2-167-185
  6. R. Cavazos-Cadena and R. Montes-De-Oca: Nearly optimal policies in risk-sensitive positive dynamic programming on discrete spaces. Math. Methods Oper. Res. 52 (2000), 133-167.   CrossRef
  7. S. Chávez-Rodríguez, R. Cavazos-Cadena and H. Cruz-Suárez: Controlled Semi-Markov chains with risk-sensitive average cost criterion. J. Optim. Theory Appl. 170 (2016), 670-686.   DOI:10.1007/s10957-016-0916-z
  8. K. J. Chung and M. J. Sobel: Discounted MDP's: distribution functions and exponential utility maximization. SIAM J. Control Optim. 25 (1987), 49-62.   DOI:10.1137/0325004
  9. M. K. Ghosh and S. Saha: Risk-sensitive control of continuous time Markov chains. Stoch. Int. J. Probab. Stoch. Process. 86 (2014), 655-675.   DOI:10.1080/17442508.2013.872644
  10. X. P. Guo and O. Hernández-Lerma: Continuous-Time Markov Decision Processes: Theory and Applications. Springer-Verlag, Berlin 2009.   CrossRef
  11. O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996.   CrossRef
  12. R. A. Howard and J. E. Matheson: Risk-sensitive Markov decision processes. Management Sci. 18 (1972), 356-369.   DOI:10.1287/mnsc.18.7.356
  13. Y. H. Huang, Z. T. Lian and X. P. Guo: Risk-sensitive semi-Markov decision processes with general utilities and multiple criteria. Adv. Appl. Probab. 50 (2018), 783-804.   DOI:10.1017/apr.2018.36
  14. Y. H. Huang and X. P. Guo: Finite horizon semi-Markov decision processes with application to maintenance systems. Europ. J. Oper. Res. 212 (2011), 131-140.   DOI:10.1016/j.ejor.2011.01.027
  15. X. X. Huang, X. L. Zou and X. P. Guo: A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates. Sci. China Math. 58 (2015), 1923-1938.   DOI:10.1007/s11425-015-5029-x
  16. H. F. Huo and X. Wen: First passage risk probability optimality for continuous time Markov decision processes. Kybernetika 55 (2019), 114-133.   DOI:10.14736/kyb-2019-1-0114
  17. A. Jaśkiewicz: A note on negative dynamic programming for risk-sensitive control. Oper. Res. Lett. 36 (2008), 531-534.   CrossRef
  18. J. Janssen and R. Manca: Semi-Markov Risk Models For Finance, Insurance, and Reliability. Springer, New York 2006.   CrossRef
  19. A. Jaśkiewicz: On the equivalence of two expected average cost criteria for semi-Markov control processes. Math. Oper. Res. 29 (2004), 326-338.   CrossRef
  20. S. C. Jaquette: A utility criterion for Markov decision processes. Management Sci. 23 (1976), 43-49.   CrossRef
  21. F. Luque-Vasquez and J. A. Minjarez-Sosa: Semi-Markov control processes with unknown holding times distribution under a discounted criterion. Math. Methods Oper. Res. 61 (2005), 455-468.   DOI:10.1007/s001860400406
  22. J. W. Mamer: Successive approximations for finite horizon semi-Markov decision processes with application to asset liquidation. Oper. Res. 34 (1986), 638-644.   DOI:10.1287/opre.34.4.638
  23. V. Nollau: Solution of a discounted semi-Markovian decision problem by successive overrelaxation. Optimization 39 (1997), 85-97.   DOI:10.1080/02331939708844273
  24. M. L. Puterman: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York 1994.   CrossRef
  25. Q. Wei: Continuous-time Markov decision processes with risk-sensitive finite-horizon cost criterion. Math. Methods Oper. Res. 84 (2016), 461-487.   DOI:10.1007/s00186-016-0550-4
  26. X. Wu and X. P. Guo: First passage optimality and variance minimization of Markov decision processes with varying discount factors. J. Appl. Prob. 52 (2015), 441-456.   DOI:10.1017/S0021900200012560
  27. A. A. Yushkevich: On semi-Markov controlled models with average reward criterion. Theory Probab. Appl. 26 (1982), 808-815.   DOI:10.1137/1126089
  28. Y. Zhang: Continuous-time Markov decision processes with exponential utility. SIAM J. Control Optim. 55 (2017), 2636-2666.   DOI:10.1137/16M1086261