Chargement...
 
Tao

Seminar27062017

June, Tuesday 27th

14:30 (room 2014, building 660) (see location):

Réda Alami et Raphaël Féraud (Orange Labs)


Title: Memory Bandits : A bayesian Approach for the Switching Bandit Problem


Abstract:

The Thompson Sampling algorithm exhibits excellent results in practice, and it has been shown to be asymptotically optimal.
The extension of Thompson Sampling algorithm to the Switching Multi-Armed Bandit Problem, proposed in 2013 by Mellor and Shapiro is a Thompson Sampling equiped with a Bayesian online change point detector.
We propose an extension of this approach based on Bayesian aggregation framework and a first sketch of the pseudo-regret analysis. It seems that the pseudo-regret of our proposal is not far from the one of a Thompson Sampling which already knows the change points.
Experiments provide some evidences that in practice, the proposed algorithm compares favorably with the previous version of Thompson Sampling for the Switching Multi-Armed Bandit Problem, while it outperforms clearly other algorithms of the state-of-the-art.




Contact: guillaume.charpiat at inria.fr

Collaborateur(s) de cette page: guillaume .
Page dernièrement modifiée le Lundi 26 juin 2017 19:05:07 CEST par guillaume.