that maximizes a given score function $f(x)$ which has many local optima. While algorithms like simulated annealing keep track of the current estimate of $x$, the CE method keeps track of a distribution over $x$ parameterized by a vector $\theta$, written $p(x; \theta)$. In practice, we initialize $\theta$ so that the distribution has high variance, and the algorithm gradually decreases this variance. Usually, our distribution is something like a Gaussian with diagonal covariance.
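For concreteness, a diagonal Gaussian is described by a per-dimension mean and standard deviation. The snippet below is only an illustration of this parameterization (the variable names are assumptions, not taken from the post):

```matlab
% Illustrative only: theta is taken to be the per-dimension mean and standard
% deviation of a diagonal Gaussian over x.
d     = 5;                          % dimension of x
mu    = zeros(1, d);                % initial best guess of x
sigma = 10 * ones(1, d);            % deliberately large initial spread
x     = mu + sigma .* randn(1, d);  % one sample x ~ N(mu, diag(sigma.^2))
```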
The algorithm requires an initial vector of parameters $\theta_0$ describing the initial estimate of the distribution of $x$. Usually, we will set this distribution to be centered at our initial best guess of $x$ and to have high variance. Here is pseudocode for iteration $t$ of the optimization:
1. Generate sample parameters: $x_1, \dots, x_N \sim p(x; \theta_{t-1})$.
2. Score the samples: $s_i = f(x_i)$ for $i = 1, \dots, N$.
3. Choose the $x_i$ with the $M$ highest scores; call these the elite set $E_t$.
4. Choose the maximum likelihood parameter $\theta_t$ for generating this set: $\theta_t = \arg\max_\theta \sum_{x \in E_t} \log p(x; \theta)$.
5. Store in $\gamma_t$ the score of the worst elite sample: $\gamma_t = \min_{x \in E_t} f(x)$.
6. If $\gamma_t$ is not better than $\gamma_{t-1}$, or the elite set has low variance, we have converged.
Here is a Matlab function that does this: cross_entropy_method.m.
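That file is not reproduced here, but the following is a minimal sketch of the same loop, assuming a diagonal Gaussian sampling distribution (the function and variable names are illustrative, not the actual interface of cross_entropy_method.m):

```matlab
function [mu, best_score] = ce_sketch(f, mu, sigma, N, M, max_iters)
% Minimal sketch of one run of the cross-entropy method for maximizing f(x).
% mu, sigma are 1-by-d row vectors giving the mean and per-dimension standard
% deviation of a diagonal Gaussian; N samples per iteration, M elite samples.
% Example call (hypothetical): ce_sketch(@(x) -sum((x - 3).^2), zeros(1,5), 10*ones(1,5), 100, 10, 50)
  prev_gamma = -Inf;
  for t = 1:max_iters
    % 1. Generate N samples from the current Gaussian N(mu, diag(sigma.^2)).
    X = repmat(mu, N, 1) + repmat(sigma, N, 1) .* randn(N, numel(mu));
    % 2. Score the samples.
    scores = zeros(N, 1);
    for i = 1:N
      scores(i) = f(X(i, :));
    end
    % 3. Keep the M highest-scoring (elite) samples.
    [sorted_scores, order] = sort(scores, 'descend');
    elite = X(order(1:M), :);
    % 4. Maximum-likelihood refit of the diagonal Gaussian to the elite set.
    mu = mean(elite, 1);
    sigma = std(elite, 1, 1) + 1e-8;   % small floor to avoid premature collapse
    % 5. gamma_t is the score of the worst elite sample.
    gamma_t = sorted_scores(M);
    % 6. Stop if gamma_t stopped improving or the elite set has low variance.
    if gamma_t <= prev_gamma || max(sigma) < 1e-6
      break;
    end
    prev_gamma = gamma_t;
  end
  best_score = sorted_scores(1);
end
```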
Suppose that we were not setting $\gamma$ to a different value at each iteration, but instead defined the elite samples as those scoring above a fixed $\gamma$. Then this algorithm can be viewed as finding the parameters $\theta$ that minimize the KL divergence between the distribution $p(x \mid f(x) \ge \gamma; \theta_{t-1})$ and $p(x; \theta)$. The algorithm above can be seen as finding a sequence of both $\gamma_t$ and $\theta_t$ that converges to the optimal value.
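In symbols, one standard way to write this connection (using the notation above) is

$$
\theta_t \;=\; \arg\min_{\theta}\, D_{\mathrm{KL}}\!\left( p\big(x \mid f(x) \ge \gamma;\, \theta_{t-1}\big) \,\big\|\, p(x;\theta) \right)
\;=\; \arg\max_{\theta}\, \mathbb{E}_{x \sim p(\cdot;\,\theta_{t-1})}\!\left[ \mathbf{1}\{f(x) \ge \gamma\}\, \log p(x;\theta) \right],
$$

and the Monte Carlo estimate of the right-hand side is maximized exactly by the maximum-likelihood fit to the elite samples in step 4.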