June, Tuesday 27th

14:30 (room 2014, building 660) (see location ):

Réda Alami et Raphaël Féraud (Orange Labs)

Title: Memory Bandits : A bayesian Approach for the Switching Bandit Problem


The Thompson Sampling algorithm exhibits excellent results in practice, and it has been shown to be asymptotically optimal.
The extension of Thompson Sampling algorithm to the Switching Multi-Armed Bandit Problem, proposed in 2013 by Mellor and Shapiro is a Thompson Sampling equiped with a Bayesian online change point detector.
We propose an extension of this approach based on Bayesian aggregation framework and a first sketch of the pseudo-regret analysis. It seems that the pseudo-regret of our proposal is not far from the one of a Thompson Sampling which already knows the change points.
Experiments provide some evidences that in practice, the proposed algorithm compares favorably with the previous version of Thompson Sampling for the Switching Multi-Armed Bandit Problem, while it outperforms clearly other algorithms of the state-of-the-art.

Contact: guillaume.charpiat at inria.fr

Contributors to this page: guillaume .
Page last modified on Monday 26 of June, 2017 19:05:07 CEST by guillaume.