Paper Sharing Three - Automated Curriculum Learning by Rewarding Temporally Rare Events
Name
Automated Curriculum Learning by Rewarding Temporally Rare Events
Attachment
- Original: arXiv
- Code: GitHub (to be published shortly)
- Video: YouTube
Detail
Problem or Challenge:
Shaped reward functions are difficult to design manually, especially for complex RL tasks.
Assumptions or hypotheses:
- Domain knowledge: the specification of a set of positive pre-defined events.
- Sparse and/or delayed rewards.
Methods or Solutions:
- Rarity of Events (RoE): reward a reinforcement learning agent according to the rarity of the events it experiences, so that rare events have a higher value than frequent ones.
- The goal of this approach is to learn through a process of curiosity rather than optimizing towards a difficult pre-defined goal.
- Reward Function:
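The reward rule described above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes the reward for an event is the inverse of its mean per-episode occurrence over a recent window of episodes, with a lower bound so that events not yet seen do not cause a division by zero. The class name `RarityOfEvents`, the window of 100 episodes, and the floor of 0.01 are assumptions chosen for the example.

```python
from collections import deque


class RarityOfEvents:
    """Illustrative rarity-based reward (hypothetical names and defaults)."""

    def __init__(self, n_events, window=100, floor=0.01):
        self.n_events = n_events
        self.floor = floor  # assumed lower bound on the mean occurrence
        # Per-episode event counts for the most recent `window` episodes.
        self.history = deque(maxlen=window)

    def mean_occurrence(self):
        """Mean number of times each event occurred per recorded episode."""
        if not self.history:
            return [0.0] * self.n_events
        n = len(self.history)
        return [sum(ep[i] for ep in self.history) / n
                for i in range(self.n_events)]

    def reward(self, step_events):
        """Reward for one step; step_events[i] counts event i in this step."""
        means = self.mean_occurrence()
        return sum(x / max(m, self.floor)
                   for x, m in zip(step_events, means))

    def end_episode(self, episode_counts):
        """Record the per-event totals of a finished episode."""
        self.history.append(list(episode_counts))
```

Under these assumptions, an event that occurs 10 times per episode on average earns 0.1 per occurrence, while one that occurs once every two episodes earns 2.0, so the agent is pushed toward whatever it currently does least often.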
Experiment or Result:
1. A2C vs. A2C+RoE: the results show that the selected scenarios are indeed difficult when only extrinsic rewards are used, as A2C learned a weak policy in three of the five scenarios.
2. The episodic mean occurrence of events:
3. Evaluation results averaged over 100 runs show that A2C+RoE learned a more versatile policy that can use all the weapons on the map, which is why it adapts easily.
4. Heat maps show that the A2C+RoE policy visits locations on the map more evenly in the Deathmatch scenarios.
Limitation or Weakness:
The approach is less well suited to domains in which reward shaping is unnecessary, i.e. where the extrinsic reward already leads smoothly to optimal behavior.
Summary
- Rarity of Events (RoE) determines the reward based on the temporal rarity of pre-defined events.
- The approach is designed to work well in challenging environments that have a plethora of known events and sparse and/or delayed rewards.
- Both novelty search and Rarity of Events can learn interesting behaviors without the need for a goal.
- The approach is designed to learn a policy that balances the occurrences of the pre-defined events, which results in more versatile behavior.
Reference
[1] N. Justesen and S. Risi, "Automated Curriculum Learning by Rewarding Temporally Rare Events," arXiv preprint arXiv:1803.07131, 2018.
[3] Title image source: Visual Doom AI Competition 2017