Looking for the history of reinforcement learning and Q-learning?

01-06

I would like to learn about the history of AI, reinforcement learning, and Q-learning. Are there any books you would recommend? Or websites and articles?

《Reinforcement Learning: An Introduction》----- the classic introduction

《Algorithms for Reinforcement Learning》---- a range of algorithms

《Reinforcement Learning: State-of-the-Art》--- latest advances (flat, hierarchical...)

《Recent Advances in Reinforcement Learning》

A collection of RL resources:

https://github.com/stone8oy/deepRL/tree/resource

Have a look at this one, it is very thorough: Gradient Temporal-Difference Learning Algorithms http://webdocs.cs.ualberta.ca/~sutton/papers/maei-thesis-2011.pdf

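Artificial Intelligence: Foundations of Computational Agents
http://artint.info/html/ArtInt_265.html Q-learning. This is the part of the book that covers Q-learning; if you go back to its home page and read from the beginning, it is a full AI reference book.
http://mnemstudio.org/path-finding-q-learning-tutorial.htm A painless Q-learning tutorial. This simple tutorial lets you understand Q-learning quickly, and the site as a whole is also about AI.
Hope this helps.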

1.1. Reinforcement learning

1.1.1. Introduction

The main aim of reinforcement learning is to learn what to do, that is,
how to map situations to actions so as to maximize a numerical reward
signal. [6 p3] Some textbooks also describe reinforcement learning as
"the learning of a mapping from situations to actions so as to maximize a
scalar reward or reinforcement signal." [5 p362]

1.1.2. The development of reinforcement learning

Early reinforcement learning research started from two main lines of
work:

1. Minsky started extensive research into trial-and-error learning in
1954. In his Ph.D. thesis, Minsky discussed computational models of
reinforcement learning and described his construction of an analog
machine built from components he called SNARCs (Stochastic Neural-Analog
Reinforcement Calculators).

2. Separately, Farley and Clark described another neural-network
learning machine designed to learn by trial and error. In the 1960s, the
terms "reinforcement" and "reinforcement learning" came into use in the
engineering and computer science literature for the first time.

[5 p363]

After that, reinforcement learning continued to develop:

· In 1954 and 1955, Farley and Clark shifted from reinforcement learning
to supervised learning, which began a pattern of confusion about the
relationship between these types of learning.

· Bellman introduced the discrete stochastic version of the optimal
control problem, known as Markov decision processes (MDPs). MDPs are an
essential element underlying the theory and algorithms of modern
reinforcement learning.

· In the late 1950s, the term "optimal control" came into use to describe
the problem of designing a controller to minimize a measure of a
dynamical system's behavior over time. In the mid-1950s, approaches to
this problem were developed by Richard Bellman and colleagues by
extending a nineteenth-century theory of Hamilton and Jacobi.

· In 1989, the temporal-difference and optimal control threads were fully
brought together with Chris Watkins's development of Q-learning
(Watkins, 1989). This work extended and integrated prior work in all
three threads of reinforcement learning research: trial-and-error
learning, optimal control, and temporal-difference learning.

[5 p364]
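
For readers who want to see how the temporal-difference and optimal control threads meet, the one-step Q-learning update is usually written as below. The notation (step size \alpha, discount factor \gamma, reward r_{t+1}) follows the common textbook convention and is an assumption here, not a quotation from [5]. The bracketed term is a temporal-difference error, and the max over next actions comes from Bellman's optimality equation.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```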

References

5. Zhongzhi Shi, Advanced Artificial Intelligence, World Scientific, Mar 2011

6. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998

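In the end I compiled this from several books... I hope it is of some help to anyone who needs to understand this history in the future, or who also needs to write a paper.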

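The Q-learning algorithm itself looks easy to understand at first glance, but if you really want to understand it, take a careful look at model-free reinforcement learning.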

Q-learning is one method within RL.
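
To make the "model-free" point concrete, here is a minimal sketch of tabular Q-learning in Python. The environment is a made-up five-state corridor (the names `step`, `q_learning`, `greedy_policy` and all parameter values are illustrative assumptions, not taken from any of the books above). The agent never looks at the transition rules directly; it only learns from the (state, action, reward, next state) samples it observes.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# a corridor of 5 states, 0..4. Action 0 moves left, action 1 moves right.
# Reaching state 4 gives reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily
            # (ties between equal Q-values are broken at random).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target uses the max over next actions; no bootstrapping
            # from a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

After training, the greedy action in every non-terminal state should be "move right", since that is the only way to reach the reward. The entry for the terminal state is meaningless because no action is ever taken there.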