Kybernetika 49 no. 5, 705-719, 2013

Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation

Rosa María Flores-Hernández

Abstract:

This paper considers Markov decision processes (MDPs) with the total discounted cost as the objective function, whose state and decision spaces are subsets of the real line but are not necessarily finite or denumerable. The cost function is possibly unbounded, the transition law is independent of the current state, and the decision sets are possibly non-compact. In this context, conditions that yield either an increasing or a decreasing optimal stationary policy are provided; these conditions do not require convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed, and an illustrative example is presented. Finally, comments are given on the monotonicity conditions and the monotone versions of the PIA as applied to discounted MDPs with rewards.
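To make the setting concrete, the following is a minimal numerical sketch (in Python with NumPy) of policy iteration for a finite discounted-cost MDP whose transition law depends only on the chosen action, not on the current state. The state and action grids, the cost function, and the transition distribution q below are hypothetical illustrations, and the tie-breaking rule in the improvement step is only a stand-in for the monotone versions of the PIA developed in the paper, not the exact algorithm.

```python
# A minimal sketch (not the paper's exact PIA): policy iteration for a
# finite discounted-cost MDP whose transition law q(y | a) depends only
# on the action a, not on the current state x.  The grids, the cost
# function and q are hypothetical illustrations.
import numpy as np

states = np.linspace(0.0, 1.0, 11)   # S, a subset of the real line
actions = np.linspace(0.0, 1.0, 6)   # A, a subset of the real line
alpha = 0.9                          # discount factor in (0, 1)

def cost(x, a):
    # hypothetical cost; submodular in (x, a), so larger states
    # favor larger actions and the minimizer is increasing in x
    return (x - a) ** 2 + 0.1 * a

# q[j, y] = probability of moving to state y under action j;
# note it is independent of the current state.
q = np.array([np.exp(-(states - a) ** 2) for a in actions])
q /= q.sum(axis=1, keepdims=True)

C = np.array([[cost(x, a) for a in actions] for x in states])

policy = np.zeros(len(states), dtype=int)  # start at the smallest action
for _ in range(100):
    # policy evaluation: solve (I - alpha * P_f) V = c_f, where row x
    # of P_f is q(. | f(x)) because transitions ignore the current state
    P_f = q[policy]
    c_f = C[np.arange(len(states)), policy]
    V = np.linalg.solve(np.eye(len(states)) - alpha * P_f, c_f)

    # policy improvement: Q(x, a) = c(x, a) + alpha * sum_y q(y | a) V(y);
    # argmin breaks ties toward the smallest action, so when the model's
    # monotonicity conditions hold the iterates stay nondecreasing in x
    Q = C + alpha * (q @ V)[np.newaxis, :]
    new_policy = np.argmin(Q, axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("optimal action per state:", actions[policy])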

Keywords:

total discounted cost, Markov decision process, total discounted reward, increasing optimal policy, decreasing optimal policy, policy iteration algorithm

Classification:

90C40, 93E20
