I'm always forgetting the name of this algorithm, and it suddenly came to me this morning, so I thought I'd write it down here so that I can remember it. Ooh, it's on Wikipedia. There's a more complete tutorial here.
The cross-entropy (CE) method searches for a value $x$ that maximizes a given score function $S(x)$ which has many local optima. While algorithms like simulated annealing keep track of the current estimate of $x$, the CE method keeps track of a distribution over $x$ parameterized by a vector $\theta$, written $p(x; \theta)$. In practice, we initialize $\theta$ so that the distribution has a high variance, and the algorithm decreases this variance. Usually, our distribution $p(x; \theta)$ is something like a Gaussian with diagonal covariance.
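To make that concrete, here is one way to write the parameterization (the symbols $d$, $\mu$, and $\sigma$ are my notation, not anything fixed by the method):

$$\theta = (\mu, \sigma), \qquad p(x; \theta) = \prod_{j=1}^{d} \mathcal{N}\!\left(x_j;\, \mu_j,\, \sigma_j^2\right),$$

so sampling just means drawing each coordinate of $x$ independently, and shrinking the $\sigma_j$ is what concentrates the search.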
The algorithm requires an initial vector of parameters $\theta_0$ describing the initial estimate of the distribution of $x$. Usually, we will set this distribution to be centered at our initial best guess of $x$ and to have high variance. Here is pseudocode for iteration $t$ of the optimization:
1. Generate $N$ sample parameters: $x_1, \ldots, x_N \sim p(x; \theta_{t-1})$.
2. Score the samples: $s_i = S(x_i)$.
3. Choose the $x_i$ with the $M$ highest scores; call this elite set $E_t$.
4. Choose the maximum likelihood parameter $\theta_t$ for generating this set $E_t$ (for a Gaussian, the closed form is given just below).
5. Store in $\gamma_t$ the score of the worst elite sample, $\gamma_t = \min_{x \in E_t} S(x)$.
6. If $\gamma_t$ is not better than $\gamma_{t-1}$, or the set $E_t$ has low variance, we have converged.
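For a diagonal Gaussian, the maximum-likelihood refit in step 4 has a simple closed form: the sample mean and (biased) sample variance of the elite set. In my notation, with $|E_t| = M$:

$$\mu_t = \frac{1}{M} \sum_{x \in E_t} x, \qquad \sigma_{t,j}^2 = \frac{1}{M} \sum_{x \in E_t} \left( x_j - \mu_{t,j} \right)^2 .$$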
Here is a Matlab function that does this: cross_entropy_method.m.
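I won't reproduce that file here, but below is a minimal MATLAB/Octave sketch of the same loop for the diagonal-Gaussian case. The function name, argument list, and stopping tolerances are all my own choices for illustration, not the contents of the linked cross_entropy_method.m:

function [mu, sigma] = ce_sketch(score_fn, mu, sigma, N, M, max_iters)
% Cross-entropy method sketch for maximizing score_fn(x), where x is a
% 1-by-d real vector and p(x; theta) is a Gaussian with diagonal covariance.
% All names and tolerances here are illustrative.
d = numel(mu);
prev_gamma = -Inf;
for t = 1:max_iters
    % 1. Generate N samples from the current distribution p(x; theta_{t-1}).
    X = repmat(mu, N, 1) + randn(N, d) .* repmat(sigma, N, 1);
    % 2. Score the samples.
    scores = zeros(N, 1);
    for i = 1:N
        scores(i) = score_fn(X(i, :));
    end
    % 3. Keep the M highest-scoring (elite) samples.
    [sorted_scores, order] = sort(scores, 'descend');
    elite = X(order(1:M), :);
    % 4. Maximum-likelihood refit: sample mean and std of the elite set.
    mu = mean(elite, 1);
    sigma = std(elite, 1, 1);      % second argument 1 = normalize by M (ML estimate)
    % 5. gamma_t is the score of the worst elite sample.
    gamma_t = sorted_scores(M);
    % 6. Stop if gamma did not improve or the elite set has low variance.
    if gamma_t <= prev_gamma || max(sigma) < 1e-8
        break;
    end
    prev_gamma = gamma_t;
end
end

As a usage example, ce_sketch(@(x) -sum((x - 3).^2), zeros(1, 5), 10 * ones(1, 5), 100, 10, 100) should drive the mean toward the vector of threes.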
Suppose that we were not setting $\gamma$ at each iteration to a different value, but instead were defining elite samples as those with scores above a fixed $\gamma$. Then this algorithm can be viewed as finding the parameters $\theta$ that minimize the KL divergence between the distribution of samples conditioned on $S(x) \ge \gamma$ and the distribution $p(x; \theta)$. The above algorithm can be seen as finding the sequence of both $\gamma$ and $\theta$ that converges to the optimal value.
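For reference, the way I remember this connection (notation mine, following the usual cross-entropy derivations): for a fixed $\gamma$, the KL-minimizing $\theta$ is estimated from the samples by exactly the maximum-likelihood step above,

$$\theta_t = \arg\max_{\theta} \sum_{i=1}^{N} I\{S(x_i) \ge \gamma\} \, \ln p(x_i; \theta) = \arg\max_{\theta} \sum_{x \in E_t} \ln p(x; \theta),$$

which is why refitting the distribution to the elite samples is the natural update.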