**Seminar31032015**

# March 31th

14:30 , R2014 Digiteo Shannon (660) (see location):## Bruno Scherrer (Inria Nancy)

## Title : Non-Stationary Modified Policy Iteration

(joint work with Boris Lesner)## Abstract :

I will consider the infinite-horizon $\gamma$-discounted optimal controlproblem formalized by Markov Decision Processes. Running any instance

of Modified Policy Iteration---a family of algorithms that can

interpolate between Value and Policy Iteration---with an error

$\epsilon$ at each iteration is known to lead to stationary policies

that are at least $\frac{2\gamma\epsilon}{(1-\gamma)

2}$-optimal.

Variations of Value and Policy Iteration, that build $\ell$-periodic

non-stationary policies, have recently been shown to display a better

$\frac{2\gamma\epsilon}{(1-\gamma)(1-\gamma

\ell)}$-optimalityVariations of Value and Policy Iteration, that build $\ell$-periodic

non-stationary policies, have recently been shown to display a better

$\frac{2\gamma\epsilon}{(1-\gamma)(1-\gamma

guarantee. I will describe a new algorithmic

scheme, Non-Stationary Modified Policy Iteration, a family of

algorithms parameterized by two integers $m \ge 0$ and $\ell \ge 1$

that generalizes all the above mentionned algorithms. While $m$ allows

to interpolate between Value-Iteration-style and Policy-Iteration-style updates,

$\ell$ specifies the period of the non-stationary policy that is

output.

I will present results that show that this new family of algorithms also enjoys the

improved $\frac{2\gamma\epsilon}{(1-\gamma)(1-\gamma^\ell)}$-optimality

guarantee. Perhaps more importantly, I will show, by exhibiting an

original problem instance, that this guarantee is tight for all $m$

and $\ell$; this tightness was to our knowledge only known in two

specific cases, Value Iteration $(m=0,\ell=1)$ and Policy Iteration

$(m=\infty,\ell=1)$. This constitutes a unification of

many dynamic programming algorithms and their analysis

of sensitivity to noise.

Contact: cyril.furtlehner at inria.fr