Kybernetika 53 no. 4, 694-716, 2017

Empirical approximation in Markov games under unbounded payoff: discounted and average criteria

Fernando Luque-Vásquez and J. Adolfo Minjárez-Sosa

DOI: 10.14736/kyb-2017-4-0694


This work deals with a class of discrete-time zero-sum Markov games whose state process $\left\{ x_{t}\right\} $ evolves according to the equation $ x_{t+1}=F(x_{t},a_{t},b_{t},\xi _{t}),$ where $a_{t}$ and $b_{t}$ represent the actions of players 1 and 2, respectively, and $\left\{ \xi _{t}\right\} $ is a sequence of independent and identically distributed random variables with unknown distribution $\theta .$ Assuming a possibly unbounded payoff, and using the empirical distribution to estimate $\theta ,$ we introduce approximation schemes for the value of the game as well as for optimal strategies, under both the discounted and the average criteria.
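The estimation scheme described above can be sketched in a few lines of Python. The system function $F$ and the disturbance distribution $\theta$ below are purely illustrative assumptions (the paper allows a general $F$ and an arbitrary unknown $\theta$); the point is only how the empirical distribution of the observed disturbances replaces the unknown $\theta$ when approximating expectations of the next state.

```python
import numpy as np

# Hypothetical system function F; the paper allows a general measurable F.
def F(x, a, b, xi):
    return 0.5 * x + a - b + xi

rng = np.random.default_rng(0)

# The "unknown" distribution theta, taken exponential here purely for illustration.
# In the adaptive setting the players only see the realized disturbances xi_t.
xi_obs = rng.exponential(scale=1.0, size=200)  # observed i.i.d. disturbances

def empirical_expectation(g, x, a, b, samples):
    """Approximate E_theta[g(F(x, a, b, xi))] by averaging over the
    empirical distribution theta_n (uniform on the observed samples)."""
    return np.mean([g(F(x, a, b, xi)) for xi in samples])

# Approximate the expected next state from x = 1.0 under actions a = 0.2, b = 0.1.
approx = empirical_expectation(lambda y: y, 1.0, 0.2, 0.1, xi_obs)
```

As the number of observed disturbances grows, the empirical distribution converges to $\theta$ (Glivenko-Cantelli), which is what drives the approximation results for the value of the game and the optimal strategies.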


Markov games, empirical estimation, discounted and average criteria


91A15, 62G07


  1. H. S. Chang: Perfect information two-person zero-sum Markov games with imprecise transition probabilities. Math. Meth. Oper. Res. 64 (2006), 235-351.   DOI:10.1007/s00186-006-0081-5
  2. R. M. Dudley: The speed of mean Glivenko-Cantelli convergence. Ann. Math. Statist. 40 (1969), 40-50.   DOI:10.1214/aoms/1177697802
  3. E. B. Dynkin and A. A. Yushkevich: Controlled Markov Processes. Springer-Verlag, New York 1979.   DOI:10.1007/978-1-4615-6746-2
  4. E. Fernández-Gaucherand: A note on the Ross-Taylor Theorem. Appl. Math. Comp. 64 (1994), 207-212.   DOI:10.1016/0096-3003(94)90064-7
  5. J. Filar and K. Vrieze: Competitive Markov Decision Processes. Springer-Verlag, New York 1997.   DOI:10.1007/978-1-4612-4054-9
  6. M. K. Ghosh, D. McDonald and S. Sinha: Zero-sum stochastic games with partial information. J. Optim. Theory Appl. 121 (2004), 99-118.   DOI:10.1023/
  7. E. I. Gordienko: Adaptive strategies for certain classes of controlled Markov processes. Theory Probab. Appl. 29 (1985), 504-518.   DOI:10.1137/1129064
  8. E. I. Gordienko and O. Hernández-Lerma: Average cost Markov control processes with weighted norms: existence of canonical policies. Appl. Math. 23 (1995), 199-218.
  9. E. I. Gordienko and O. Hernández-Lerma: Average cost Markov control processes with weighted norms: value iteration. Appl. Math. 23 (1995), 219-237.
  10. O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996.   DOI:10.1007/978-1-4612-0729-0
  11. N. Hilgert and J. A. Minjárez-Sosa: Adaptive control of stochastic systems with unknown disturbance distribution: discounted criterion. Math. Meth. Oper. Res. 63 (2006), 443-460.   DOI:10.1007/s00186-005-0024-6
  12. A. Jaśkiewicz and A. Nowak: Zero-sum ergodic stochastic games with Feller transition probabilities. SIAM J. Control Optim. 45 (2006), 773-789.   DOI:10.1137/s0363012904443257
  13. A. Jaśkiewicz and A. Nowak: Approximation of noncooperative semi-Markov games. J. Optim. Theory Appl. 131 (2006), 115-134.   DOI:10.1007/s10957-006-9128-2
  14. A. Krausz and U. Rieder: Markov games with incomplete information. Math. Meth. Oper. Res. 46 (1997), 263-279.   DOI:10.1007/bf01217695
  15. J. A. Minjárez-Sosa: Nonparametric adaptive control for discrete-time Markov processes with unbounded costs under average criterion. Appl. Math. (Warsaw) 26 (1999), 267-280.
  16. J. A. Minjárez-Sosa and O. Vega-Amaya: Asymptotically optimal strategies for adaptive zero-sum discounted Markov games. SIAM J. Control Optim. 48 (2009), 1405-1421.   DOI:10.1137/060651458
  17. J. A. Minjárez-Sosa and O. Vega-Amaya: Optimal strategies for adaptive zero-sum average Markov games. J. Math. Analysis Appl. 402 (2013), 44-56.   DOI:10.1016/j.jmaa.2012.12.011
  18. J. A. Minjárez-Sosa and F. Luque-Vásquez: Two person zero-sum semi-Markov games with unknown holding times distribution on one side: discounted payoff criterion. Appl. Math. Optim. 57 (2008), 289-305.   DOI:10.1007/s00245-007-9016-7
  19. A. Neyman and S. Sorin: Stochastic Games and Applications. Kluwer, 2003.   DOI:10.1007/978-94-010-0189-2
  20. T. Prieto-Rumeau and J. M. Lorenzo: Approximation of zero-sum continuous-time Markov games under the discounted payoff criterion. TOP 23 (2015), 799-836.   DOI:10.1007/s11750-014-0354-8
  21. N. Shimkin and A. Shwartz: Asymptotically efficient adaptive strategies in repeated games. Part I: Certainty equivalence strategies. Math. Oper. Res. 20 (1995), 743-767.   DOI:10.1287/moor.20.3.743
  22. N. Shimkin and A. Shwartz: Asymptotically efficient adaptive strategies in repeated games. Part II: Asymptotic optimality. Math. Oper. Res. 21 (1996), 487-512.   DOI:10.1287/moor.21.2.487
  23. M. Schäl: Conditions for optimality and for the limit of $n$-stage optimal policies to be optimal. Z. Wahrsch. Verw. Gebiete 32 (1975), 179-196.   DOI:10.1007/bf00532612
  24. R. Ranga Rao: Relations between weak and uniform convergence of measures with applications. Ann. Math. Statist. 33 (1962), 659-680.   DOI:10.1214/aoms/1177704588
  25. J. A. E. E. Van Nunen and J. Wessels: A note on dynamic programming with unbounded rewards. Manag. Sci. 24 (1978), 576-580.   DOI:10.1287/mnsc.24.5.576