February 24th

14:30, R2014 Digiteo Shannon (660) (see location):


Madalina Drugan (Vrije Universiteit Brussel, Belgium)




Title: Multi-objective multi-armed bandits


Abstract:


The multi-objective multi-armed bandit (MOMAB) paradigm extends the
multi-armed bandit (MAB) setting from scalar rewards to reward vectors.
MOMAB differs from standard MAB in important ways, since several arms
can be optimal according to their reward tuples. Techniques from
multi-objective optimisation are used to design MOMAB algorithms with an
efficient exploration/exploitation trade-off for complex and large
multi-objective stochastic environments. Theoretical analysis is an
important aspect of MAB, which is a simplified theoretical framework of
reinforcement learning with a single state. We give an overview of MOMAB
algorithms, their analysis, and the corresponding experimental
methodology.
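
To illustrate why several arms can be optimal at once, here is a minimal
sketch that assumes Pareto dominance as the ordering on reward vectors.
The abstract does not name a specific ordering, so this choice, the
function names, and the mean reward values are all assumptions made for
illustration, not the speaker's algorithms.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u is at least as good as v in every objective
    and strictly better in at least one (Pareto dominance)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_optimal_arms(mean_rewards: List[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean reward vector is not dominated by any other arm."""
    return [
        i
        for i, u in enumerate(mean_rewards)
        if not any(dominates(v, u) for j, v in enumerate(mean_rewards) if j != i)
    ]


# Hypothetical mean reward vectors for four arms and two objectives.
arms = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4)]
print(pareto_optimal_arms(arms))  # [0, 1, 2]
```

Arm 3 is dominated by arm 1, while arms 0, 1 and 2 are mutually
incomparable, so all three count as optimal under this ordering.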


Contact: cyril.furtlehner@inria.fr