Chargement...
 

Historique: Seminar19012016

Aperçu de cette version: 1

January 19th

14:30 , R2014 Digiteo Shannon (660) (see location):


Sébastien Gadat (IMT Toulouse):




Title : Regret bounds for Narendra-Shapiro bandit algorithms.


Abstract :

Narendra-Shapiro (NS) algorithms are bandit-type algorithms introduced in the sixties (with a view to applications in Psychology or learning automata), whose convergence has been intensively studied in the stochastic algorithm literature. In this talk, we study the efficiency of these bandit algorithms from a regret point of view. We show that some competitive bounds can be obtained for such algorithms in a modified penalized version. Up to an over-penalization modification, the pseudo-regret Rn related to the penalized two-armed bandit is uniformly bounded by C sqrt(n) (for a known C). We also generalize existing convergence and rates of convergence results to the multi-armed case of the over-penalized bandit algorithm, including the convergence toward the invariant measure of a Piecewise Deterministic Markov Process (PDMP) after a suitable renormalization. Finally, ergodic properties of this PDMP are given in the multi-armed case.

Contact: cyril.furtlehner à inria.fr


Historique

Avancé
Information Version
mar. 26 de Jan, 2016 14h09 maillard@lri.fr from 129.175.15.11 3
Afficher
mar. 26 de Jan, 2016 14h08 maillard@lri.fr from 129.175.15.11 Added the slides. 2
Afficher
mer. 09 de Dec, 2015 10h52 furtlehn from 129.175.15.11 1
Afficher