Kybernetika 58 no. 2, 180-199, 2022

Markov decision processes on finite spaces with fuzzy total rewards

Karla Carrero-Vera, Hugo Cruz-Suárez and Raúl Montes-de-Oca

DOI: 10.14736/kyb-2022-2-0180

Abstract:

The paper concerns Markov decision processes (MDPs) with finite state and decision spaces and with the total reward as the objective function. For this class of MDPs, the authors assume that the reward function is fuzzy. Specifically, the fuzzy reward function has a suitable trapezoidal shape that is a function of a standard non-fuzzy reward. The fuzzy control problem consists of determining a control policy that maximizes the fuzzy expected total reward, where the maximization is taken with respect to the partial order on the $\alpha$-cuts of fuzzy numbers. The optimal policy and the optimal value function of the fuzzy optimal control problem are characterized by means of the dynamic programming equation of the standard optimal control problem. The main conclusions are that the optimal policies of the standard problem and the fuzzy one coincide and that the fuzzy optimal value function has a convenient trapezoidal form. As illustrations, fuzzy extensions of an optimal stopping problem and of a red-black gambling model are presented.
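To make the ordering on $\alpha$-cuts concrete, the following is a minimal Python sketch of a generic trapezoidal fuzzy number and the endpoint-wise partial order. It is an illustration under assumptions, not the construction used in the paper: the class name `Trapezoid`, the parameters `a, b, c, d` (support $[a, d]$, core $[b, c]$) and the function `leq` are hypothetical, and the paper's particular trapezoidal reward built from the non-fuzzy reward is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Generic trapezoidal fuzzy number (illustrative assumption):
    membership rises linearly on [a, b], equals 1 on [b, c],
    and falls linearly on [c, d]."""
    a: float
    b: float
    c: float
    d: float

    def __post_init__(self):
        if not (self.a <= self.b <= self.c <= self.d):
            raise ValueError("parameters must satisfy a <= b <= c <= d")

    def alpha_cut(self, alpha):
        """Return the closed interval of points with membership >= alpha."""
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        lower = self.a + alpha * (self.b - self.a)
        upper = self.d - alpha * (self.d - self.c)
        return lower, upper


def leq(x, y):
    """Partial order: x <= y when both endpoints of every alpha-cut of x
    are <= the matching endpoints of y. The endpoints are linear in alpha,
    so checking alpha = 0 and alpha = 1 is enough for trapezoids."""
    for alpha in (0.0, 1.0):
        x_lo, x_hi = x.alpha_cut(alpha)
        y_lo, y_hi = y.alpha_cut(alpha)
        if x_lo > y_lo or x_hi > y_hi:
            return False
    return True
```

Because the order is only partial, two fuzzy numbers may be incomparable: with `Trapezoid(0, 1, 2, 5)` and `Trapezoid(1, 2, 3, 4)`, `leq` returns `False` in both directions.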

Keywords:

Markov decision process, total reward, fuzzy reward, trapezoidal fuzzy number, optimal stopping problem, gambling model

Classification:

90C40, 93C40
