Kybernetika 52 no. 1, 66-75, 2016

Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach

This article was granted Editor's award of the year 2016Editor's award 2016

R. Israel Ortega-Gutiérrez, Raúl Montes-de-Oca and Enrique Lemus-RodríguezDOI: 10.14736/kyb-2016-1-0066

Abstract:

Many examples in optimization, ranging from Linear Programming to Markov Decision Processes (MDPs), present more than one optimal solution. The study of this non-uniqueness is of great mathematical interest. In this paper the authors show that in a specific family of discounted MDPs, non-uniqueness is a "fragile'' property through Ekeland's Principle for each problem with at least two optimal policies; a perturbed model is produced with a unique optimal policy. This result not only supersedes previous papers on the subject, but it also renews the interest in the corresponding questions of well-posedness, genericity and structural stability of MDPs.

Keywords:

discounted Markov decision processes, dynamic programming, unique optimal policy, non-uniqueness of optimal policies, Ekeland's variational principle

Classification:

90C40, 93E20

References:

  1. D. P. Bertsekas: Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, NJ 1987.   CrossRef
  2. E. Bishop and R. R. Phelps: The support functionals of a convex set. In: Proc. Sympos. Pure Math. Vol. VII, 1963 (V. L. Klee, ed.), Amer. Math. Soc., pp. 27-35.   DOI:10.1090/pspum/007/0154092
  3. J. M. Borwein and Q. J. Zhu: Techniques of Variational Analysis. Springer, New York 2005.   CrossRef
  4. D. Cruz-Suárez, R. Montes-de-Oca and F. Salem-Silva: Conditions for the uniqueness of optimal policies of discounted Markov decision processes. Math. Methods Oper. Res. 60 (2004), 415-436.   DOI:10.1007/s001860400372
  5. D. Cruz-Suárez and R. Montes-de-Oca: Uniform convergence of the value iteration policies for discounted Markov decision processes. Bol. Soc. Mat. Mexicana 12 (2006), 133-152.   CrossRef
  6. I. Ekeland: On the variational principle. J. Math. Anal. Appl. 67 (1974), 324-353.   DOI:10.1016/0022-247x(74)90025-0
  7. O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996.   DOI:10.1007/978-1-4612-0729-0
  8. R. Lucchetti: Convexity and Well-Posed Problems. CMS Books in Mathematics, Springer, New York 2006.   DOI:10.1007/0-387-31082-7
  9. R. Montes-de-Oca and E. Lemus-Rodríguez: An unbounded Berge's minimum theorem with applications to discounted Markov decision processes. Kybernetika 48 (2012), 268-286.   CrossRef
  10. R. Montes-de-Oca, E. Lemus-Rodríguez and F. Salem-Silva: Nonuniqueness versus uniqueness of optimal policies in convex discounted Markov decision processes. J. Appl. Math. 2013 (2013), 1-5.   CrossRef
  11. R. T. Rockafellar and R. J. B. Wets: Variational Analysis. Springer, New York 2004.   CrossRef
  12. K. Tanaka, M. Hosino and D. Kuroiwa: On an $\varepsilon $-optimal policy of discrete time stochastic control processes. Bull. Inform. Cybernet. 27 (1995), 107-119.   CrossRef