Kybernetika 45 no. 5, 755-767, 2009

A Stopping Rule for Discounted Markov Decision Processes with Finite Action Sets

Raúl Montes-de-Oca, Daniel Cruz-Suárez and Enrique Lemus-Rodríguez

Abstract:

In a Discounted Markov Decision Process (DMDP) with finite action sets the Value Iteration Algorithm, under suitable conditions, leads to an optimal policy in a finite number of steps. Determining an upper bound on the necessary number of steps till gaining convergence is an issue of great theoretical and practical interest as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own. It should be mentioned that other authors find such kind of constants using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition and the corresponding ergodicity index plays a critical role in the upper bound.

Keywords:

discounted cost, Markov decision process, ergodicity condition, value iteration, optimal policy, myopic policies

Classification:

90C40, 93E20