Kybernetika 53 no. 4, 694-716, 2017

Empirical approximation in Markov games under unbounded payoff: discounted and average criteria

Fernando Luque-Vásquez and J. Adolfo Minjárez-Sosa

DOI: 10.14736/kyb-2017-4-0694


This work deals with a class of discrete-time zero-sum Markov games whose state process $\left\{ x_{t}\right\} $ evolves according to the equation $ x_{t+1}=F(x_{t},a_{t},b_{t},\xi _{t}),$ where $a_{t}$ and $b_{t}$ represent the actions of players 1 and 2, respectively, and $\left\{ \xi _{t}\right\} $ is a sequence of independent and identically distributed random variables with unknown distribution $\theta .$ Assuming a possibly unbounded payoff, and using the empirical distribution to estimate $\theta ,$ we introduce approximation schemes for the value of the game as well as for optimal strategies, under both the discounted and the average criteria.
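The estimation scheme described above can be sketched in a few lines of Python. The system function $F$ and the disturbance distribution $\theta$ below are purely illustrative assumptions (the paper allows a general $F$ and an arbitrary unknown $\theta$); the point is only how the empirical distribution of the observed disturbances replaces the unknown $\theta$ when approximating expectations of the next state.

```python
import numpy as np

# Hypothetical system function F; the paper allows a general measurable F.
def F(x, a, b, xi):
    return 0.5 * x + a - b + xi

rng = np.random.default_rng(0)

# The "unknown" distribution theta, taken exponential here purely for illustration.
# In the adaptive setting the players only see the realized disturbances xi_t.
xi_obs = rng.exponential(scale=1.0, size=200)  # observed i.i.d. disturbances

def empirical_expectation(g, x, a, b, samples):
    """Approximate E_theta[g(F(x, a, b, xi))] by averaging over the
    empirical distribution theta_n (uniform on the observed samples)."""
    return np.mean([g(F(x, a, b, xi)) for xi in samples])

# Approximate the expected next state from x = 1.0 under actions a = 0.2, b = 0.1.
approx = empirical_expectation(lambda y: y, 1.0, 0.2, 0.1, xi_obs)
```

As the number of observed disturbances grows, the empirical distribution converges to $\theta$ (Glivenko-Cantelli), which is what drives the approximation results for the value of the game and the optimal strategies.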


Markov games, empirical estimation, discounted and average criteria


91A15, 62G07


  1. H. S. Chang: Perfect information two-person zero-sum Markov games with imprecise transition probabilities. Math. Meth. Oper. Res. 64 (2006), 235-351.   DOI:10.1007/s00186-006-0081-5
  2. R. M. Dudley: The speed of mean Glivenko-Cantelli convergence. Ann. Math. Statist. 40 (1969), 40-50.   DOI:10.1214/aoms/1177697802
  3. E. B. Dynkin and A. A. Yushkevich: Controlled Markov Processes. Springer-Verlag, New York 1979.   DOI:10.1007/978-1-4615-6746-2
  4. E. Fernández-Gaucherand: A note on the Ross-Taylor Theorem. Appl. Math. Comp. 64 (1994), 207-212.   DOI:10.1016/0096-3003(94)90064-7
  5. J. Filar and K. Vrieze: Competitive Markov Decision Processes. Springer-Verlag, New York 1997.   DOI:10.1007/978-1-4612-4054-9
  6. M. K. Ghosh, D. McDonald and S. Sinha: Zero-sum stochastic games with partial information. J. Optim. Theory Appl. 121 (2004), 99-118.   DOI:10.1023/
  7. E. I. Gordienko: Adaptive strategies for certain classes of controlled Markov processes. Theory Probab. Appl. 29 (1985), 504-518.   DOI:10.1137/1129064
  8. E. I. Gordienko and O. Hernández-Lerma: Average cost Markov control processes with weighted norms: existence of canonical policies. Appl. Math. 23 (1995), 199-218.
  9. E. I. Gordienko and O. Hernández-Lerma: Average cost Markov control processes with weighted norms: value iteration. Appl. Math. 23 (1995), 219-237.
  10. O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996.   DOI:10.1007/978-1-4612-0729-0
  11. N. Hilgert and J. A. Minjárez-Sosa: Adaptive control of stochastic systems with unknown disturbance distribution: discounted criterion. Math. Meth. Oper. Res. 63 (2006), 443-460.   DOI:10.1007/s00186-005-0024-6
  12. A. Jaśkiewicz and A. Nowak: Zero-sum ergodic stochastic games with Feller transition probabilities. SIAM J. Control Optim. 45 (2006), 773-789.   DOI:10.1137/s0363012904443257
  13. A. Jaśkiewicz and A. Nowak: Approximation of noncooperative semi-Markov games. J. Optim. Theory Appl. 131 (2006), 115-134.   DOI:10.1007/s10957-006-9128-2
  14. A. Krausz and U. Rieder: Markov games with incomplete information. Math. Meth. Oper. Res. 46 (1997), 263-279.   DOI:10.1007/bf01217695
  15. J. A. Minjárez-Sosa: Nonparametric adaptive control for discrete-time Markov processes with unbounded costs under average criterion. Appl. Math. (Warsaw) 26 (1999), 267-280.
  16. J. A. Minjárez-Sosa and O. Vega-Amaya: Asymptotically optimal strategies for adaptive zero-sum discounted Markov games. SIAM J. Control Optim. 48 (2009), 1405-1421.   DOI:10.1137/060651458
  17. J. A. Minjárez-Sosa and O. Vega-Amaya: Optimal strategies for adaptive zero-sum average Markov games. J. Math. Analysis Appl. 402 (2013), 44-56.   DOI:10.1016/j.jmaa.2012.12.011
  18. J. A. Minjárez-Sosa and F. Luque-Vásquez: Two person zero-sum semi-Markov games with unknown holding times distribution on one side: discounted payoff criterion. Appl. Math. Optim. 57 (2008), 289-305.   DOI:10.1007/s00245-007-9016-7
  19. A. Neyman and S. Sorin: Stochastic Games and Applications. Kluwer, 2003.   DOI:10.1007/978-94-010-0189-2
  20. T. Prieto-Rumeau and J. M. Lorenzo: Approximation of zero-sum continuous-time Markov games under the discounted payoff criterion. TOP 23 (2015), 799-836.   DOI:10.1007/s11750-014-0354-8
  21. N. Shimkin and A. Shwartz: Asymptotically efficient adaptive strategies in repeated games. Part I: Certainty equivalence strategies. Math. Oper. Res. 20 (1995), 743-767.   DOI:10.1287/moor.20.3.743
  22. N. Shimkin and A. Shwartz: Asymptotically efficient adaptive strategies in repeated games. Part II: Asymptotic optimality. Math. Oper. Res. 21 (1996), 487-512.   DOI:10.1287/moor.21.2.487
  23. M. Schäl: Conditions for optimality and for the limit of $n$-stage optimal policies to be optimal. Z. Wahrsch. Verw. Gebiete 32 (1975), 179-196.   DOI:10.1007/bf00532612
  24. R. Ranga Rao: Relations between weak and uniform convergence of measures with applications. Ann. Math. Statist. 33 (1962), 659-680.   DOI:10.1214/aoms/1177704588
  25. J. A. E. E. Van Nunen and J. Wessels: A note on dynamic programming with unbounded rewards. Manag. Sci. 24 (1978), 576-580.   DOI:10.1287/mnsc.24.5.576