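Looking for the history of reinforcement learning and Q-learning?
I would like to learn about the development history of AI, reinforcement learning, and Q-learning. Are there any recommended books, websites, or articles?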
"Reinforcement Learning: An Introduction": the introductory classic
"Algorithms for Reinforcement Learning": covers a range of algorithms
"Reinforcement Learning: State-of-the-Art": recent advances (flat, hierarchical, ...)
"Recent Advances in Reinforcement Learning"
RL resource collection: https://github.com/stone8oy/deepRL/tree/resource
You could also read this one, which is very thorough: Gradient Temporal-Difference Learning Algorithms http://webdocs.cs.ualberta.ca/~sutton/papers/maei-thesis-2011.pdf
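- Artificial Intelligence: Foundations of Computational Agents http://artint.info/html/ArtInt_265.html Q-learning. This is the part of the book that covers Q-learning; if you go back to its home page and read from the beginning, it also works as a general AI reference book.
- http://mnemstudio.org/path-finding-q-learning-tutorial.htm A painless Q-learning tutorial. This simple tutorial lets you get a grasp of Q-learning quickly, and the site as a whole is also about AI.
- Hope this is useful to you.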
1.1. Reinforcement learning
1.1.1. Introduction
The main aim of reinforcement learning is to learn what to do, that is, how to map situations to actions so as to maximize a numerical reward signal. [6 p3] Some textbooks also describe reinforcement learning as "the learning of mapping from situations to actions so as to maximize a scalar reward of reinforcement signal." [5 p362]
The development of reinforcement learning
In its early days, reinforcement learning had two main threads:
1. Minsky carried out extensive research on trial-and-error learning. In his Ph.D. thesis (1954), Minsky discussed computational models of reinforcement learning and described his construction of an analog machine built from components he called SNARCs (Stochastic Neural-Analog Reinforcement Calculators).
2. Meanwhile, Farley and Clark described another neural-network learning machine designed to learn by trial and error. In the 1960s, this kind of learning came to be called "reinforcement", and the term "reinforcement learning" began to be widely used in the engineering and computer science literature for the first time.
[5 p363]
After that, reinforcement learning was continuously improved:
· In 1954 and 1955, Farley and Clark shifted from reinforcement learning to supervised learning, which began a pattern of confusion about the relationship between these types of learning.
· Bellman introduced the discrete stochastic version of the optimal control problem known as Markovian decision processes (MDPs). All of these are essential elements underlying the theory and algorithms of modern reinforcement learning.
· In the late 1950s, the term "optimal control" came into use to describe the problem of designing a controller to minimize a measure of a dynamical system's behavior over time. In the mid-1950s, an approach to this problem was developed by Richard Bellman and colleagues by extending a 19th-century theory of Hamilton and Jacobi.
· In 1989, the temporal-difference and optimal control threads were fully brought together with Chris Watkins's development of Q-learning (Watkins et al., 1989). This work extended and integrated prior work in all three threads of reinforcement learning research.
[5 p364]
References
5. Zhongzhi Shi, Advanced Artificial Intelligence, World Scientific, Mar 2011
6. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998
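In the end I put this together from several books... I hope it is of some help to anyone who needs to understand this history in the future, or who likewise needs to write a paper.
The Q-learning algorithm itself looks easy to follow at first glance, but if you really want to understand it, study model-free reinforcement learning carefully.
Q-learning is one method within RL.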
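To make the "looks simple at first glance" point concrete, here is a minimal sketch of tabular Q-learning. It uses the standard update Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)]. The corridor environment, the function name `q_learning`, and all hyperparameter values are assumptions made up for illustration; none of them come from the books above.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1, actions are 0 (left) and 1 (right).
    Entering the rightmost state gives reward 1 and ends the episode;
    every other step gives reward 0. Moving left from state 0 stays put.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # Q-table: one [left, right] pair of action values per state.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action choice; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # Deterministic transition and reward (assumed environment).
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Model-free update: only the sampled (s, a, r, s_next) is used.
            # The terminal state has value 0, so the target there is just r.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning()):
        print(state, round(left, 3), round(right, 3))
```

In this toy corridor the learned values should end up favoring "right" in every non-terminal state. Note that the agent never uses a transition model, which is what "model-free" refers to: it learns only from sampled steps.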