Kybernetika 53 no. 4, 694-716, 2017

Empirical approximation in Markov games under unbounded payoff: discounted and average criteria

Fernando Luque-Vásquez and J. Adolfo Minjárez-Sosa

DOI: 10.14736/kyb-2017-4-0694


This work deals with a class of discrete-time zero-sum Markov games whose state process $\left\{ x_{t}\right\} $ evolves according to the equation $ x_{t+1}=F(x_{t},a_{t},b_{t},\xi _{t}),$ where $a_{t}$ and $b_{t}$ represent the actions of players 1 and 2, respectively, and $\left\{ \xi _{t}\right\} $ is a sequence of independent and identically distributed random variables with unknown distribution $\theta .$ Assuming a possibly unbounded payoff, and using the empirical distribution to estimate $\theta ,$ we introduce approximation schemes for the value of the game as well as for optimal strategies, under both the discounted and the average criteria.
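The estimation scheme described above can be sketched in a few lines of Python. The system function $F$ and the disturbance distribution $\theta$ below are purely illustrative assumptions (the paper allows a general $F$ and an arbitrary unknown $\theta$); the point is only how the empirical distribution of the observed disturbances replaces the unknown $\theta$ when approximating expectations of the next state.

```python
import numpy as np

# Hypothetical system function F; the paper allows a general measurable F.
def F(x, a, b, xi):
    return 0.5 * x + a - b + xi

rng = np.random.default_rng(0)

# The "unknown" distribution theta, taken exponential here purely for illustration.
# In the adaptive setting the players only see the realized disturbances xi_t.
xi_obs = rng.exponential(scale=1.0, size=200)  # observed i.i.d. disturbances

def empirical_expectation(g, x, a, b, samples):
    """Approximate E_theta[g(F(x, a, b, xi))] by averaging over the
    empirical distribution theta_n (uniform on the observed samples)."""
    return np.mean([g(F(x, a, b, xi)) for xi in samples])

# Approximate the expected next state from x = 1.0 under actions a = 0.2, b = 0.1.
approx = empirical_expectation(lambda y: y, 1.0, 0.2, 0.1, xi_obs)
```

As the number of observed disturbances grows, the empirical distribution converges to $\theta$ (Glivenko-Cantelli), which is what drives the approximation results for the value of the game and the optimal strategies.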


Markov games, empirical estimation, discounted and average criteria


91A15, 62G07


  1. H. S. Chang: Perfect information two-person zero-sum Markov games with imprecise transition probabilities. Math. Meth. Oper. Res. 64 (2006), 235-351.   DOI:10.1007/s00186-006-0081-5
  2. R. M. Dudley: The speed of mean Glivenko-Cantelli convergence. Ann. Math. Statist. 40 (1969), 40-50.   DOI:10.1214/aoms/1177697802
  3. E. B. Dynkin and A. A. Yushkevich: Controlled Markov Processes. Springer-Verlag, New York 1979.   DOI:10.1007/978-1-4615-6746-2
  4. E. Fernández-Gaucherand: A note on the Ross-Taylor Theorem. Appl. Math. Comp. 64 (1994), 207-212.   DOI:10.1016/0096-3003(94)90064-7
  5. J. Filar and K. Vrieze: Competitive Markov Decision Processes. Springer-Verlag, New York 1997.   DOI:10.1007/978-1-4612-4054-9
  6. M. K. Ghosh, D. McDonald and S. Sinha: Zero-sum stochastic games with partial information. J. Optim. Theory Appl. 121 (2004), 99-118.   DOI:10.1023/
  7. E. I. Gordienko: Adaptive strategies for certain classes of controlled Markov processes. Theory Probab. Appl. 29 (1985), 504-518.   DOI:10.1137/1129064
  8. E. I. Gordienko and O. Hernández-Lerma: Average cost Markov control processes with weighted norms: existence of canonical policies. Appl. Math. 23 (1995), 199-218.
  9. E. I. Gordienko and O. Hernández-Lerma: Average cost Markov control processes with weighted norms: value iteration. Appl. Math. 23 (1995), 219-237.
  10. O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996.   DOI:10.1007/978-1-4612-0729-0
  11. N. Hilgert and J. A. Minjárez-Sosa: Adaptive control of stochastic systems with unknown disturbance distribution: discounted criterion. Math. Meth. Oper. Res. 63 (2006), 443-460.   DOI:10.1007/s00186-005-0024-6
  12. A. Jaśkiewicz and A. Nowak: Zero-sum ergodic stochastic games with Feller transition probabilities. SIAM J. Control Optim. 45 (2006), 773-789.   DOI:10.1137/s0363012904443257
  13. A. Jaśkiewicz and A. Nowak: Approximation of noncooperative semi-Markov games. J. Optim. Theory Appl. 131 (2006), 115-134.   DOI:10.1007/s10957-006-9128-2
  14. A. Krausz and U. Rieder: Markov games with incomplete information. Math. Meth. Oper. Res. 46 (1997), 263-279.   DOI:10.1007/bf01217695
  15. J. A. Minjárez-Sosa: Nonparametric adaptive control for discrete-time Markov processes with unbounded costs under average criterion. Appl. Math. (Warsaw) 26 (1999), 267-280.
  16. J. A. Minjárez-Sosa and O. Vega-Amaya: Asymptotically optimal strategies for adaptive zero-sum discounted Markov games. SIAM J. Control Optim. 48 (2009), 1405-1421.   DOI:10.1137/060651458
  17. J. A. Minjárez-Sosa and O. Vega-Amaya: Optimal strategies for adaptive zero-sum average Markov games. J. Math. Analysis Appl. 402 (2013), 44-56.   DOI:10.1016/j.jmaa.2012.12.011
  18. J. A. Minjárez-Sosa and F. Luque-Vásquez: Two person zero-sum semi-Markov games with unknown holding times distribution on one side: discounted payoff criterion. Appl. Math. Optim. 57 (2008), 289-305.   DOI:10.1007/s00245-007-9016-7
  19. A. Neyman and S. Sorin: Stochastic Games and Applications. Kluwer, 2003.   DOI:10.1007/978-94-010-0189-2
  20. T. Prieto-Rumeau and J. M. Lorenzo: Approximation of zero-sum continuous-time Markov games under the discounted payoff criterion. TOP 23 (2015), 799-836.   DOI:10.1007/s11750-014-0354-8
  21. N. Shimkin and A. Shwartz: Asymptotically efficient adaptive strategies in repeated games. Part I: Certainty equivalence strategies. Math. Oper. Res. 20 (1995), 743-767.   DOI:10.1287/moor.20.3.743
  22. N. Shimkin and A. Shwartz: Asymptotically efficient adaptive strategies in repeated games. Part II: Asymptotic optimality. Math. Oper. Res. 21 (1996), 487-512.   DOI:10.1287/moor.21.2.487
  23. M. Schäl: Conditions for optimality and for the limit of $n$-stage optimal policies to be optimal. Z. Wahrsch. Verw. Gebiete 32 (1975), 179-196.   DOI:10.1007/bf00532612
  24. R. Ranga Rao: Relations between weak and uniform convergence of measures with applications. Ann. Math. Statist. 33 (1962), 659-680.   DOI:10.1214/aoms/1177704588
  25. J. A. E. E. Van Nunen and J. Wessels: A note on dynamic programming with unbounded rewards. Manag. Sci. 24 (1978), 576-580.   DOI:10.1287/mnsc.24.5.576