
Tuesday, June 27th, 2017

14:30 (room 2014, building 660) (see location):

Réda Alami and Raphaël Féraud (Orange Labs)


Title: Memory Bandits: A Bayesian Approach for the Switching Bandit Problem


Abstract:

The Thompson Sampling algorithm exhibits excellent results in practice, and it has been shown to be asymptotically optimal.
The extension of Thompson Sampling to the Switching Multi-Armed Bandit Problem, proposed in 2013 by Mellor and Shapiro, is a Thompson Sampling equipped with a Bayesian online change point detector.
We propose an extension of this approach based on a Bayesian aggregation framework, together with a first sketch of the pseudo-regret analysis. This analysis suggests that the pseudo-regret of our proposal is close to that of a Thompson Sampling algorithm that already knows the change points.
Experiments provide some evidence that, in practice, the proposed algorithm compares favorably with the previous version of Thompson Sampling for the Switching Multi-Armed Bandit Problem, while clearly outperforming other state-of-the-art algorithms.
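
For readers unfamiliar with the baseline that the abstract builds on, here is a minimal sketch of standard Thompson Sampling on a stationary Bernoulli bandit. It is not the speakers' algorithm and it contains no change point detector. The function name `thompson_sampling`, the Beta(1, 1) prior, the arm means and the horizon are all assumptions chosen for illustration.

```python
import random


def thompson_sampling(arm_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling; returns the number of pulls per arm.

    Illustrative sketch only: stationary rewards, uniform Beta(1, 1) priors.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    alpha = [1] * n_arms  # 1 + number of observed successes per arm
    beta = [1] * n_arms   # 1 + number of observed failures per arm
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one plausible mean per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the (hidden) true mean.
        reward = 1 if rng.random() < arm_means[arm] else 0
        # Conjugate posterior update for the played arm.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


pulls = thompson_sampling([0.2, 0.8], horizon=2000)
print(pulls)
```

In this sketch the posteriors accumulate every past observation, so they adapt slowly if the arm means change over time. That is the difficulty the switching setting raises, and the reason the approaches described in the abstract add a change point detector.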




Contact: guillaume.charpiat at inria.fr
