
An introduction to Policy Gradient methods - Deep Reinforcement Learning

In this episode I introduce Policy Gradient methods for Deep Reinforcement Learning.
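
The core policy gradient idea can be made concrete with a minimal REINFORCE sketch. This is not code from the video. It assumes a toy single-state problem in which a softmax policy is parameterized directly by per-action logits instead of by a deep network, and the function names are hypothetical. The gradient estimate sums the return-to-go G_t times the gradient of log pi(a_t), which pushes probability toward actions that were followed by high return.

```python
import math


def softmax(logits):
    """Numerically stable softmax: turns per-action preferences into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_gradient(logits, actions, rewards, gamma=0.99):
    """Monte Carlo policy gradient estimate, sum_t G_t * grad log pi(a_t),
    for a softmax policy whose only parameters are the per-action logits
    (hypothetical single-state setting, no neural network)."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, g in zip(actions, discounted_returns(rewards, gamma)):
        for k in range(len(logits)):
            # d log pi(a) / d logit_k = 1[k == a] - pi(k) for a softmax policy
            indicator = 1.0 if k == a else 0.0
            grad[k] += g * (indicator - probs[k])
    return grad


# One gradient-ascent step on a toy two-action episode:
# action 0 earned reward 1, action 1 earned reward 0.
logits = [0.0, 0.0]
grad = reinforce_gradient(logits, actions=[0, 1], rewards=[1.0, 0.0], gamma=1.0)
learning_rate = 0.1
logits = [w + learning_rate * d for w, d in zip(logits, grad)]
print(softmax(logits))  # probability of action 0 rises above 0.5
```

In deep RL the logits would come from a neural network and the same estimator would be backpropagated through it. This plain estimator has high variance, which is part of what motivates the refinements discussed in the video.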

After a general overview, I dive into Proximal Policy Optimization (PPO): an algorithm designed at OpenAI that aims to strike a balance between sample efficiency and code complexity. PPO is the algorithm used to train the OpenAI Five system, and it is also used in a wide range of other challenges, such as Atari games and robotic control tasks.
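
The heart of PPO is the clipped surrogate objective from the PPO paper linked below. For each sample it takes min(r_t * A_t, clip(r_t, 1 - epsilon, 1 + epsilon) * A_t), where r_t is the probability ratio between the new and old policy and A_t is an advantage estimate. The following is a minimal plain-Python sketch of only that policy term. It omits the value-function loss and entropy bonus used in full PPO. The function name is hypothetical, and epsilon = 0.2 is assumed here as a commonly used default.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Batch mean of PPO's clipped surrogate objective (to be maximized):
    min(r_t * A_t, clip(r_t, 1 - epsilon, 1 + epsilon) * A_t),
    where r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio computed from log-probabilities.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum yields a pessimistic bound: once the ratio leaves
        # [1 - epsilon, 1 + epsilon] in the direction that would raise the
        # objective, the term stops growing, so there is no incentive to go further.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# The new policy doubled the probability of a good action (A = +1)
# and of a bad action (A = -1).
print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))   # gain capped at 1 + epsilon, i.e. 1.2
print(ppo_clip_objective([math.log(2.0)], [0.0], [-1.0]))  # penalty not capped: about -2.0
```

This asymmetry is the design choice that lets PPO take several gradient steps on the same batch without the policy drifting too far from the one that collected the data. Deep learning implementations typically evaluate this on tensors and minimize its negative as a loss.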

If you want to support this channel, here is my Patreon link:
/ arxivinsights --- You are amazing!! ;)

If you have questions you would like to discuss with me personally, you can book a 1-on-1 video call through Pensight: https://pensight.com/x/xander-steenbr...

Links mentioned in the video:
⦁ PPO paper: https://arxiv.org/abs/1707.06347
⦁ TRPO paper: https://arxiv.org/abs/1502.05477
⦁ OpenAI PPO blog post: https://blog.openai.com/openai-baseli...
⦁ Aurelien Geron: KL divergence and entropy in ML: A Short Introduction to Entropy, Cross-Ent...
⦁ Deep RL Bootcamp - Lecture 5: Deep RL Bootcamp Lecture 5: Natural Polic...
⦁ RL-adventure PyTorch implementation: https://github.com/higgsfield/RL-Adve...
⦁ OpenAI Baselines TensorFlow implementation: https://github.com/openai/baselines
