Pointers

Video of Richard Sutton's tutorial: https://www.microsoft.com/en-us/research/video/tutorial-introduction-to-reinforcement-learning-with-function-approximation/
Some videos from the Boston Dynamics group

Lectures (M. Sebag)

24/11 RL_2016_Cours1.pdf
9/12 RL_2016_Cours2.pdf
18/1 RL_2016_Cours3.pdf
23/1, 26/1 revised slides: RL_2016_Cours4.pdf
23/1 Lecture by Mehdi Khamassi: Mehdi_Khamassi_2017_RL.pdf
1/2 RL_2016_Cours5.pdf

Presentations 1/2

Deep Reinforcement Learning for Simulated Autonomous Vehicle Control
- Karim Kouki, Alwine Lambert, Guillaume Lorre
Mastering the game of Go with Deep Neural Networks & Tree Search
- Aris Tritas, Divya Grover, Ahmed Mazari, Hafed Rhouma

TD1: Markov decision processes [ Exercises here ]

Skeleton code here

Mail: diviyan (at) lri (dot) fr
Here are a few notes to clarify the definitions: ComplementsTD1.pdf
The solutions are here:
Note: some of the code may fail because of a string-formatting error in mdp_grid.valuesString(); if so, remove the calls to this function.

TD2: Monte Carlo methods [ Exercises here ]

Solutions:
1: State-values agent
2: State-action values agent
3: Epsilon-greedy policy
4: TD-learning Sarsa

For TD learning: FromMC_to_TD.pdf
For next time: experiment with the agents, implement an epsilon that evolves over time, and optionally handle stochasticity (4.). As an exercise, you can also implement a state-action agent with traces (cf. 1., where it is implemented for state values).
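The two exercises above can be sketched as follows. This is only a minimal illustration, not the course skeleton code: the function names, the exponential decay schedule, the default hyperparameters and the dictionary-based tables are all assumptions made for the example. It shows an exploration rate that decreases with the episode index, an epsilon-greedy choice, and a tabular Sarsa update with accumulating traces on state-action pairs.

```python
import random
from collections import defaultdict


def epsilon_schedule(episode, eps_start=1.0, eps_min=0.05, decay=0.99):
    """Exploration rate decayed exponentially per episode, floored at eps_min.

    The schedule and its default values are illustrative assumptions.
    """
    return max(eps_min, eps_start * decay ** episode)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action with probability epsilon, else a greedy one.

    q_values: list of action values for the current state.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties at random among the greedy actions.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def sarsa_lambda_update(Q, E, s, a, r, s2, a2, done,
                        alpha=0.1, gamma=0.99, lam=0.9):
    """One Sarsa update with accumulating traces on (state, action) pairs.

    Q and E are defaultdict(float) tables keyed by (state, action).
    E should be cleared at the start of each episode.
    """
    target = r if done else r + gamma * Q[(s2, a2)]
    delta = target - Q[(s, a)]
    E[(s, a)] += 1.0  # accumulating trace for the visited pair
    for key in list(E):
        Q[key] += alpha * delta * E[key]  # credit all recently visited pairs
        E[key] *= gamma * lam             # decay every trace


# Hypothetical usage on a single transition.
Q = defaultdict(float)
E = defaultdict(float)
eps = epsilon_schedule(episode=10)
action = epsilon_greedy([Q[("s0", 0)], Q[("s0", 1)]], eps)
sarsa_lambda_update(Q, E, "s0", action, 1.0, "s1", 0, done=False)
```

Setting lam=0 reduces the update to one-step Sarsa, which makes it easy to compare agents with and without traces.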

TD3: Function approximation [ Exercises here ]

Part 1:

PyBrain library documentation: http://pybrain.org/docs/
+ http://simontechblog.blogspot.fr/2010/08/pybrain-reinforcement-learning-tutorial_15.html
Use this version: pybrain.zip

Installation: https://github.com/pybrain/pybrain/wiki/installation

Part 2: FA.zip

Solution: Q-Learning with Value Function Approximation Solution.ipynb

Projects

(A 2-page report is required, along with the code. Copying existing programs will receive a very poor grade.)

Projects are done in groups of at most 3 students, except for the last two subjects (Halite & Alesia), for which the limit is 4.
The projects are due on 24 February, 23:59 GMT+1.

Each group must produce:

A brief report of about 2 pages (3 pages maximum, excluding references), submitted as TeX and .pdf files. It should describe the approach and the results, and compare them with other algorithms or the state of the art when possible. Use the ICML 2017 format (fun fact: the ICML deadline is also on the 24th). Those unable to write TeX can submit a .doc(x) document along with its .pdf. ( Description | ICML2017 TeX package )
The code of your implemented approach. This code should work "out of the box"; add a notice/README listing the required packages/libraries, plus any special notes if needed. Submitting code taken from the internet with no or minimal modifications could lead to unwanted consequences.

You can discuss your project's problems or ideas, and ask for more information, at: diviyan (at) lri (dot) fr

The subjects are as follows (in increasing order of difficulty):

Mountain car problem (compare two approaches)
1. Belkham Fella, Medjkoune Nawel and Sorostinean Mihaela
2. Mohamed Abdelkhalek
Inverted pendulum (compare two representations of the problem)
The acrobot
1. Jonathan Crouzet
Octopus
1. Laurent Cetinsoy and Clément Thierry
TD-Gammon
1. Xiaoxiao CHEN/Yuxiang WANG/Honglin LI/Dong FEI
2. Ahmed MAZARI & Divya GROVER
3. Aris TRITAS & Hafed RHOUMA
4. Gabriel Quéré, Florence Carton, Alvaro Correia
5. FATHALLAH Mohamed Ali, Amal TARGHI, Katia SANA
Bicycle: balancing + moving forward
1. Abdelhadi Temmar, Stephen Batifol, Nicolas Bougie
Anti-Imitation Policy Learning: reproduce an experiment from mainDIVA.pdf
Halite (halite.io)
1. Force Fidele KIEN, XIyu ZHANG, Yaohui WANG, Herilalaina RAKOTOARISON.
2. Guillaume Lorre, Gabriel Bellard, Karim Kouki and Lambert Alwine
Alesia game (see Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games, ICML 15): Alesia_game.zip

ID	Name	Comment	Uploaded by / on	Size	Downloads
1692	RL_2016_Cours5.pdf	Lecture 1/2	sebag, Wed. 01 Feb 2017, 02:16	1.63 MB	989
1691	Q-Learning with Value Function Approximation Solution.ipynb		Diviyan, Mon. 30 Jan 2017, 14:59	178.33 KB	639
1690	FA.zip	TP3.5	Diviyan, Mon. 30 Jan 2017, 11:04	115.24 KB	698
1687	pybrain.zip		Diviyan, Sun. 29 Jan 2017, 18:51	2.77 MB	581
1686	Mehdi_Khamassi_2017_RL.pdf	Lecture by Mehdi Khamassi	sebag, Fri. 27 Jan 2017, 13:42	16.78 MB	662
1684	Alesia_game.zip		Diviyan, Thu. 26 Jan 2017, 03:05	3.10 KB	581
1683	FromMC_to_TD.pdf		Diviyan, Thu. 26 Jan 2017, 01:52	758.56 KB	867
1682	RL_2016_Cours4.pdf	Lecture 4, revised	sebag, Thu. 26 Jan 2017, 00:27	1.91 MB	1313
1676	RL_2016_Cours3.pdf	Lecture 18/1/17	sebag, Tue. 17 Jan 2017, 20:28	899.85 KB	768
1674	mainDIVA.pdf	AiPOL.pdf	sebag, Fri. 13 Jan 2017, 18:11	1.12 MB	567
1666	RL_2016_Cours2.pdf		sebag, Fri. 09 Dec 2016, 00:55	818.58 KB	722
1657	ComplementsTD1.pdf		Diviyan, Fri. 25 Nov 2016, 14:51	106.28 KB	770
1653	RL_2016_Cours1.pdf	RL lecture 1	sebag, Thu. 24 Nov 2016, 13:19	1.41 MB	1466

2016 Module Reinforcement Learning, Michele Sebag, Diviyan Kalainathan
