Analysis and Experimental Research of Model-Free Reinforcement Learning Method

Authors

  • V. V. Pivoshenko, Vinnytsia National Technical University
  • M. S. Kulyk, Vinnytsia National Technical University
  • Yu. Yu. Ivanov, Vinnytsia National Technical University
  • A. S. Vasiura, Vinnytsia National Technical University

DOI:

https://doi.org/10.31649/1997-9266-2019-144-3-40-49

Keywords:

artificial intelligence, machine learning, reinforcement learning, Q-learning, learning strategy, intelligent software agent, bot, optimal parameters, learning curves, experimental research

Abstract

This article considers a modern machine learning method known as reinforcement learning. In tasks solved through interaction, it is often impractical to obtain examples of an intelligent software agent's desired behavior that would be both correct and appropriate for all situations, because uncertainty arises from incomplete information about the environment and from the possible actions of other bots or humans. The software agent should therefore be trained on the basis of its own experience. An important advantage of reinforcement learning is that a bot can learn "from scratch" through a balanced combination of the "exploration" and "exploitation" modes (a search for a compromise between them) and through learning strategies that sacrifice some score at the current stage for the sake of greater benefit in the future. Research in reinforcement learning can be seen as part of a broader process that has developed over the last few years: the interaction of artificial intelligence with other engineering disciplines. This is why reinforcement learning develops ideas drawn from optimal control theory, stochastic optimization, and approximation, while pursuing the general and ambitious goals of artificial intelligence.
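
The compromise between the "exploration" and "exploitation" modes mentioned above is commonly implemented with an epsilon-greedy action-selection rule. A minimal sketch in Python, assuming a dictionary-based Q-table; the function and parameter names are illustrative, not taken from the article:

    import random

    def epsilon_greedy(q_table, state, actions, epsilon=0.1):
        """With probability epsilon take a random action (exploration);
        otherwise take the best-rated action so far (exploitation)."""
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: q_table.get((state, a), 0.0))

A common refinement is to decay epsilon over time, so that the bot explores heavily at first and increasingly exploits its learned estimates later.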

This work presents the mathematical apparatus of reinforcement learning based on the model-free Q-learning method, shows practical aspects of its application, and develops an effective strategy for training a bot in an artificial environment (a computer video game). The information used by the agent plays the role of the observed variables, while the hidden variables are the long-term estimates of the benefit the agent gains. Depending on the current state of the environment and the bot's actions, a benefit function is calculated, and its value is received by the agent at the next time step. Using the developed software, experimental research on the considered method has been performed, and the optimal tuning parameters, learning curves, and training time of the bot have been obtained. The results may be useful for computer systems of various functional purposes; they can be applied in modeling and design, in automatic control and decision-making systems, in robotics, in stock markets, etc.
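
For context, the core of model-free Q-learning (in the formulation of Watkins and Dayan, cited below) is the value update

    Q(s, a) ← Q(s, a) + α·(r + γ·max_a' Q(s', a') − Q(s, a)),

where α is the learning rate and γ is the discount factor. Below is a minimal tabular sketch in Python; the environment interface (env.reset(), env.step()) and the default parameter values are generic assumptions for illustration, not the article's video-game setup or its experimentally obtained optimal settings:

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular model-free Q-learning with an epsilon-greedy behavior policy."""
        q = defaultdict(float)  # (state, action) -> long-term benefit estimate
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy choice: balance exploration against exploitation.
                if random.random() < epsilon:
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: q[(state, a)])
                next_state, reward, done = env.step(action)  # assumed 3-tuple interface
                # Model-free update: only the observed reward and the best estimate
                # for the next state are needed, no model of the environment.
                best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
                q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
                state = next_state
        return q

Because the update uses only sampled transitions (s, a, r, s'), the agent needs no model of the environment's dynamics, which is precisely what "model-free" means here.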

Author Biographies

V. V. Pivoshenko, Vinnytsia National Technical University

Student of the Department of Computer Systems and Automation

M. S. Kulyk, Vinnytsia National Technical University

Student of the Department of Computer Systems and Automation

Yu. Yu. Ivanov, Vinnytsia National Technical University

Cand. Sc. (Eng.), Senior Lecturer of the Department of Automation and Intelligent Information Technologies

A. S. Vasiura, Vinnytsia National Technical University

Cand. Sc. (Eng.), Professor of the Department of Automation and Intelligent Information Technologies

References

O. Hernández-Lerma, J. Hennet, and J. Lasserre, “Average Cost Markov Decision Processes: Optimality Conditions,” Journal of Mathematical Analysis and Applications, vol. 158, no. 2, pp. 396-406, 1991.

R. Bellman, “A Markovian Decision Process,” Indiana University Mathematics Journal, vol. 6, no. 4, pp. 679-684, 1957.

L. Busoniu, R. Babuska, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators (Automation and Control Engineering Series). Boca Raton, FL: CRC Press, 2010, pp. 55-88.

A. S. Vasiura, T. B. Martyniuk, and L. M. Kupershtein, Methods and Means of Neural-Like Data Processing for Control Systems (in Ukrainian). Vinnytsia, Ukraine: Universum-Vinnytsia, 2008.

C. J. C. H. Watkins and P. Dayan, Reinforcement Learning, technical note, 1992, pp. 55-68.

F. Chollet, Deep Learning with Python. Shelter Island, NY: Manning Publications Co., 2018, pp. 27-38.

J. Gläscher, N. Daw, P. Dayan, and J. P. O’Doherty, “States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning,” Neuron, vol. 66, no. 4, pp. 585-595, 2010.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 2015, pp. 143-160.

T. M. Borovska, A. S. Vasiura, and V. A. Severilov, Modeling and Optimization of Automatic Control Systems (in Ukrainian). Vinnytsia, Ukraine: VNTU, 2009.

C. Jin, Z. Allen-Zhu, S. Bubeck, and M. Jordan, “Is Q-learning Provably Efficient?,” arXiv.org, 2018. [Electronic resource]. Available: https://arxiv.org/pdf/1807.03765.pdf. Accessed: Jul. 10, 2018.

J. Dornheim, N. Link, and P. Gumbsch, “Model-Free Adaptive Optimal Control of Sequential Manufacturing Processes Using Reinforcement Learning,” arXiv.org, 2019. [Electronic resource]. Available: https://arxiv.org/abs/1809.06646v1. Accessed: Jan. 7, 2019.

W. Haskell and W. Huang, “Stochastic Approximation for Risk-Aware Markov Decision Processes,” arXiv.org, 2018. [Electronic resource]. Available: https://arxiv.org/pdf/1805.04238.pdf. Accessed: May 17, 2018.

R. Bellman, “Dynamic programming and stochastic control processes,” Information and Control, vol. 1, no. 3, pp. 228-239, 1958.

C. J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. dissertation, University of Cambridge, 1989, pp. 55-68.

L. P. Kaelbling, M. L. Littman, and A. W. Moore, “An Introduction to Reinforcement Learning,” The Biology and Technology of Intelligent Autonomous Agents, 1995, pp. 90–127.

M. Rahman and H. Rashid, “Implementation of Q Learning and Deep Q Network for Controlling a Self-Balancing Robot Model,” arXiv.org, 2018. [Electronic resource]. Available: https://arxiv.org/pdf/1807.08272.pdf. Accessed: Jul. 22, 2018.

C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279-292, 1992.

E. Even-Dar and Y. Mansour, “Learning Rates for Q-Learning,” in Computational Learning Theory (Lecture Notes in Computer Science), 2001, pp. 589–604.


Published

2019-06-26

How to Cite

[1] V. V. Pivoshenko, M. S. Kulyk, Y. Y. Ivanov, and A. S. Vasiura, “Analysis and Experimental Research of Model-Free Reinforcement Learning Method”, Вісник ВПІ, no. 3, pp. 40–49, Jun. 2019.

Issue

No. 3 (2019)

Section

Information technologies and computer sciences
