February 24th, 14:30, R2014 Digiteo Shannon (660) (see location):
Madalina Drugan (Vrije Universiteit Brussel, Belgium)
Title: Multi-objective multi-armed bandits
The multi-objective multi-armed bandits (MOMAB) paradigm extends
multi-armed bandits (MAB) to reward vectors instead of scalar rewards.
MOMAB differs from standard MAB in important ways, since several arms can
be optimal according to their reward vectors. Techniques from
multi-objective optimisation are used to create MOMAB algorithms with an
efficient exploration/exploitation trade-off for complex and large
multi-objective stochastic environments. Theoretical analysis is an
important aspect of MAB, which is a simplified theoretical framework for
reinforcement learning with a single state. We give an overview of the
MOMAB algorithms, their analysis and the corresponding experimental
methodology.
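
To illustrate why several arms can be optimal at once, here is a minimal sketch that is not taken from the talk. It assumes that optimality is judged by Pareto dominance over known mean reward vectors, with larger values better in every objective. The names `dominates`, `pareto_optimal_arms` and the values in `example_means` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u is at least as good as v in every objective
    and strictly better in at least one (assumes larger is better)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_optimal_arms(means: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of arms whose mean reward vector is not
    dominated by the mean reward vector of any other arm."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


# Hypothetical mean reward vectors for four arms and two objectives.
example_means = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4)]

front = pareto_optimal_arms(example_means)
print(front)  # [0, 1, 2]
```

In this example the first three arms are mutually incomparable, so all of them are optimal, while the fourth is dominated by (0.5, 0.5). A bandit algorithm would have to estimate these means from samples; the sketch only shows the notion of optimality.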
Contact: cyril.furtlehner at inria.fr