Kybernetika 48 no. 1, 83-104, 2012

An optimality system for finite average Markov decision chains under risk-aversion

Alfredo Alanís-Durán and Rolando Cavazos-Cadena


This work concerns controlled Markov chains with finite state space and compact action sets. The decision maker is risk-averse with constant risk-sensitivity, and the performance of a control policy is measured by the long-run average cost criterion. Under standard continuity-compactness conditions, it is shown that the (possibly non-constant) optimal value function is characterized by a system of optimality equations, from which an optimal stationary policy can be obtained. It is also shown that the optimal superior and inferior limit average cost functions coincide.
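Constant risk-sensitivity corresponds to an exponential utility: for a risk-sensitivity coefficient λ > 0, the average criterion evaluates (1/n)(1/λ) log E[exp(λ · accumulated cost)] in the long run. For a fixed stationary policy on an irreducible finite chain this quantity reduces to (1/λ) log ρ(M), where ρ(M) is the Perron–Frobenius eigenvalue of the multiplicative kernel M(x, y) = e^{λc(x)} P(x, y) — the object behind the multiplicative Poisson equations studied in references [6, 7, 21]. A minimal numerical sketch, using a hypothetical two-state chain that is not from the paper:

```python
import numpy as np

# Hypothetical two-state example (not from the paper): transition matrix P,
# cost vector c, and risk-sensitivity coefficient lam > 0 (risk-averse).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
c = np.array([1.0, 2.0])
lam = 0.5

# Multiplicative kernel M(x, y) = exp(lam * c(x)) * P(x, y).  For an
# irreducible chain the risk-sensitive average cost equals
# (1 / lam) * log(spectral radius of M), the Perron-Frobenius eigenvalue.
M = np.exp(lam * c)[:, None] * P
rho = max(abs(np.linalg.eigvals(M)))
J = float(np.log(rho) / lam)

# Risk-neutral benchmark: the stationary distribution pi of P gives the
# ordinary average cost pi @ c; risk aversion (lam > 0) can only increase it.
eigval, eigvec = np.linalg.eig(P.T)
pi = np.real(eigvec[:, np.argmax(np.real(eigval))])
pi = pi / pi.sum()
mean_cost = float(pi @ c)

print(f"risk-neutral average cost:           {mean_cost:.4f}")
print(f"risk-averse average cost (lam=0.5):  {J:.4f}")
```

As λ decreases to 0 the risk-averse value approaches the risk-neutral average cost, and for λ > 0 it always lies between the mean cost and the worst single-state cost.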


partition of the state space, nonconstant optimal average cost, equality of superior and inferior limit risk-averse average criteria


93E20, 60J05, 93C55


  1. A. Arapostathis, V. S. Borkar, E. Fernández-Gaucherand, M. K. Ghosh and S. I. Marcus: Discrete-time controlled Markov processes with average cost criteria: a survey. SIAM J. Control Optim. 31 (1993), 282-334.
  2. P. Billingsley: Probability and Measure. Third edition. Wiley, New York 1995.
  3. R. Cavazos-Cadena and E. Fernández-Gaucherand: Controlled Markov chains with risk-sensitive criteria: average cost, optimality equations and optimal solutions. Math. Methods Oper. Res. 43 (1999), 121-139.
  4. R. Cavazos-Cadena and E. Fernández-Gaucherand: Risk-sensitive control in communicating average Markov decision chains. In: Modelling Uncertainty: An Examination of Stochastic Theory, Methods and Applications (M. Dror, P. L'Ecuyer and F. Szidarovszky, eds.), Kluwer, Boston 2002, pp. 525-544.
  5. R. Cavazos-Cadena: Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space. Math. Methods Oper. Res. 57 (2003), 263-285.
  6. R. Cavazos-Cadena and D. Hernández-Hernández: A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains. Ann. Appl. Probab. 15 (2005), 175-212.
  7. R. Cavazos-Cadena and D. Hernández-Hernández: A system of Poisson equations for a non-constant Varadhan functional on a finite state space. Appl. Math. Optim. 53 (2006), 101-119.
  8. R. Cavazos-Cadena and F. Salem-Silva: The discounted method and equivalence of average criteria for risk-sensitive Markov decision processes on Borel spaces. Appl. Math. Optim. 61 (2009), 167-190.
  9. G. B. Di Masi and L. Stettner: Risk-sensitive control of discrete time Markov processes with infinite horizon. SIAM J. Control Optim. 38 (1999), 61-78.
  10. G. B. Di Masi and L. Stettner: Infinite horizon risk sensitive control of discrete time Markov processes with small risk. Syst. Control Lett. 40 (2000), 15-20.
  11. G. B. Di Masi and L. Stettner: Infinite horizon risk sensitive control of discrete time Markov processes under minorization property. SIAM J. Control Optim. 46 (2007), 231-252.
  12. W. H. Fleming and W. M. McEneaney: Risk-sensitive control on an infinite horizon. SIAM J. Control Optim. 33 (1995), 1881-1915.
  13. F. R. Gantmakher: The Theory of Matrices. Chelsea, London 1959.
  14. D. Hernández-Hernández and S. I. Marcus: Risk-sensitive control of Markov processes in countable state space. Syst. Control Lett. 29 (1996), 147-155.
  15. D. Hernández-Hernández and S. I. Marcus: Existence of risk sensitive optimal stationary policies for controlled Markov processes. Appl. Math. Optim. 40 (1999), 273-285.
  16. R. A. Howard and J. E. Matheson: Risk-sensitive Markov decision processes. Management Sci. 18 (1972), 356-369.
  17. D. H. Jacobson: Optimal stochastic linear systems with exponential performance criteria and their relation to stochastic differential games. IEEE Trans. Automat. Control 18 (1973), 124-131.
  18. S. C. Jaquette: Markov decision processes with a new optimality criterion: discrete time. Ann. Statist. 1 (1973), 496-505.
  19. S. C. Jaquette: A utility criterion for Markov decision processes. Management Sci. 23 (1976), 43-49.
  20. A. Jaśkiewicz: Average optimality for risk sensitive control with general state space. Ann. Appl. Probab. 17 (2007), 654-675.
  21. U. G. Rothblum and P. Whittle: Growth optimality for branching Markov decision chains. Math. Oper. Res. 7 (1982), 582-601.
  22. K. Sladký: Successive approximation methods for dynamic programming models. In: Proc. Third Formator Symposium on the Analysis of Large-Scale Systems (J. Beneš and L. Bakule, eds.), Academia, Prague 1979, pp. 171-189.
  23. K. Sladký: Bounds on discrete dynamic programming recursions I. Kybernetika 16 (1980), 526-547.
  24. K. Sladký: Growth rates and average optimality in risk-sensitive Markov decision chains. Kybernetika 44 (2008), 205-226.
  25. K. Sladký and R. Montes-de-Oca: Risk-sensitive average optimality in Markov decision chains. In: Operations Research Proceedings, Vol. 2007, Part III (2008), pp. 69-74.
  26. P. Whittle: Optimization Over Time - Dynamic Programming and Stochastic Control. Wiley, Chichester 1983.
  27. W. H. M. Zijm: Nonnegative Matrices in Dynamic Programming. Mathematical Centre Tract, Amsterdam 1983.