Kybernetika 53 no. 1, 59-81, 2017

Mean-variance optimality for semi-Markov decision processes under first passage criteria

Xiangxiang Huang and Yonghui Huang

DOI: 10.14736/kyb-2017-1-0059

Abstract:

This paper deals with a first passage mean-variance problem for semi-Markov decision processes in Borel spaces. The goal is to minimize the variance of the total discounted reward accumulated up to the system's first entry into a target set, where the optimization is over the class of policies with a prescribed expected first passage reward. The reward rates may be unbounded, and the discount factor may vary with the state of the system and the control. We first develop suitable conditions for the existence of first passage mean-variance optimal policies and provide a policy improvement algorithm for computing an optimal policy. Two examples then illustrate our results. Finally, we show how the results reduce to the cases of discrete-time Markov decision processes and continuous-time Markov decision processes.
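To make the first passage mean-variance criterion concrete, here is a minimal sketch of the discrete-time special case mentioned above. It assumes finite state and action spaces, a constant discount factor beta in (0, 1) (the paper allows it to depend on states and controls), and deterministic stationary policies. Under a fixed policy, the first passage reward R(x) = r(x) + beta R(X_1), with R = 0 on the target set B, gives linear equations for the mean V and second moment M: V = r + beta Q V and M = r^2 + 2 beta r (Q V) + beta^2 Q M, where Q is the transition matrix restricted to states outside B. The function names, the prescribed-mean vector g, and the "sum of variances" objective are hypothetical choices for illustration. The brute-force search is a swapped-in technique, not the paper's policy improvement algorithm.

```python
import itertools
import numpy as np


def first_passage_moments(P, r, beta, target):
    """Mean and variance of sum_{t < tau} beta^t r(x_t), tau = first hit of B.

    P: (n, n) transition matrix under a fixed stationary policy.
    r: (n,) one-step rewards under that policy.
    beta: constant discount factor in (0, 1) (simplifying assumption).
    target: boolean mask of the target set B (reward is 0 once in B).
    """
    n = len(r)
    free = ~target                       # states where reward still accrues
    Q = P[np.ix_(free, free)]            # transitions that stay outside B
    rf = r[free]
    I = np.eye(int(free.sum()))
    # Mean: V = r + beta * Q V
    V = np.linalg.solve(I - beta * Q, rf)
    # Second moment: M = r^2 + 2 beta r (Q V) + beta^2 Q M
    M = np.linalg.solve(I - beta**2 * Q, rf**2 + 2 * beta * rf * (Q @ V))
    mean = np.zeros(n)
    second = np.zeros(n)
    mean[free], second[free] = V, M
    return mean, second - mean**2


def best_policy(P_a, r_a, beta, target, g, tol=1e-8):
    """Brute-force search (hypothetical helper) over deterministic policies.

    P_a: (A, n, n) transition matrices per action; r_a: (A, n) rewards.
    g: prescribed mean vector (hypothetical); only policies whose mean
    matches g outside B are feasible. Among them, pick the one with the
    smallest summed variance outside B (a simplification of pointwise
    variance minimality).
    """
    A, n, _ = P_a.shape
    idx = np.arange(n)
    best = None
    for pol in itertools.product(range(A), repeat=n):
        acts = np.array(pol)
        P, r = P_a[acts, idx], r_a[acts, idx]   # row x uses action pol[x]
        mean, var = first_passage_moments(P, r, beta, target)
        if not np.allclose(mean[~target], g[~target], atol=tol):
            continue                            # violates the mean constraint
        score = var[~target].sum()
        if best is None or score < best[2]:
            best = (pol, var, score)
    return None if best is None else best[:2]


# Usage: state 1 is the target. Both actions at state 0 give mean 1,
# but action 1 (stay with prob. 0.5) adds variance 1/14.
P_a = np.array([[[0.0, 1.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 1.0]]])
r_a = np.array([[1.0, 0.0], [0.75, 0.0]])
target = np.array([False, True])
pol, var = best_policy(P_a, r_a, 0.5, target, np.array([1.0, 0.0]))
print(pol, var)
```

In this toy instance both policies meet the prescribed mean, so the criterion prefers the one that reaches the target deterministically. For larger problems exhaustive enumeration is exponential in the number of states, which is why an improvement-type algorithm such as the one the paper proposes is needed.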

Keywords:

first passage time, semi-Markov decision processes, unbounded reward rate, minimal variance, mean-variance optimal policy

Classification:

90C40, 60J27
