Reinforcement Learning, Michele Sebag and Heri Rakotoarison

Books


Videos


Nov. 25th


Dec. 2nd


Dec. 9th


Dec. 16th

  • Multi-Armed Bandits, Monte-Carlo Tree Search and applications,
    • slides: see Dec. 2nd.
  • Practical session MCTS on Minesweeper

Jan. 6th



Jan. 13th

Oral seminar - Building 660, Amphi Shannon (Monday, January 20th, 2:00 pm - 5:00 pm)


2:00 Dhiaeddoine Youssfi & Wafa Bouzouita: Deep Reinforcement Learning with Double Q-learning

2:20 Nicolas DEVATINE & Alban PETIT: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

2:40 Ziheng LI & Xinneng XU: The Predictron: End-To-End Learning and Planning

3:00 Clément Veyssière & Eric Wang: Recent Advances in Imitation Learning from Observation

3:20 Break

3:40 Geremy Hutin: Skew-Fit: State-Covering Self-Supervised Reinforcement Learning

4:00 Matthieu Charreire, Adrien Lefebvre & Sarah Aamiri (note: 22-minute talk, since you are a group of three): Monte-Carlo Tree Search for Policy Optimization

4:30 Baptiste Merliot & Matthieu Nogatchewsky: Policy Improvement: Between Black-Box Optimization and Episodic Reinforcement Learning

4:50 Simon Monteiro: Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search

5:10 Florian Bertelli & Ramine Hamidi: Neural Program Synthesis By Self-Learning




Deep RL: from Atari to Go and beyond

  1. Human-level control through deep reinforcement learning Corentin Leloup and Maxime CHOR
  2. Deep Reinforcement Learning with Double Q-learning
  3. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm Nicolas DEVATINE & Alban PETIT
  4. The Predictron: End-To-End Learning and Planning Ziheng LI & Xinneng XU

Transfer RL

  1. Off-Policy Actor-Critic with Shared Experience Replay
  2. MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics

Imitation

  1. State Alignment-based Imitation Learning
  2. Recent Advances in Imitation Learning from Observation Clément Veyssière and Eric Wang
  3. Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

Rewards

  1. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning Geremy Hutin
  2. Learning to solve the credit assignment problem
  3. Ranking Policy Gradient

Optimization for RL

  1. Monte-Carlo Tree Search for Policy Optimization Matthieu Charreire, Adrien Lefebvre and Sarah Aamiri
  2. Policy Improvement: Between Black-Box Optimization and Episodic Reinforcement Learning Baptiste Merliot & Matthieu Nogatchewsky
  3. Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search
  4. Samples Are Useful? Not Always: denoising policy gradient updates using variance explained

Other

  1. Neural Program Synthesis By Self-Learning Florian Bertelli and Ramine Hamidi


Contributors to this page: sebag and Heri.
Page last modified on Monday 20 of January, 2020 15:48:13 CET by sebag.