Thursday, May 1, 2008

Cross Entropy Method

I'm always forgetting the name of this algorithm, and it suddenly came to me this morning, so I thought I'd write it down here so that I can remember it. Ooh, it's on Wikipedia. There's a more complete tutorial here.
The cross-entropy (CE) method searches for a parameter vector $x^*$ that maximizes a given score function $S(x)$ which has many local optima. While algorithms like simulated annealing keep track of the current estimate of $x^*$, the CE method keeps track of a distribution over $x$ parameterized by a vector $v$, written $f(\cdot\,; v)$. In practice, we initialize $v$ so that the distribution has a high variance, and the algorithm decreases this variance. Usually, our distribution is something like a Gaussian with diagonal covariance.
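To make that concrete, here is the diagonal-Gaussian case written out; the symbols $\mu$ and $\sigma$ are just my shorthand for the two halves of $v$:

$$ v = (\mu, \sigma), \qquad f(x; v) = \prod_{d} \mathcal{N}\!\left(x_d \,;\, \mu_d, \sigma_d^2\right), $$

so decreasing the variance just means shrinking each $\sigma_d$.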

The algorithm requires an initial vector of parameters $v_0$ describing the initial estimate of the distribution of $x$. Usually, we will set this distribution to be centered at our initial best guess of $x^*$ and to have high variance. Here is pseudocode for iteration $t$ of the optimization:

1. Generate $N$ sample parameters from the current distribution: $x_1, \dots, x_N \sim f(\cdot\,; v_{t-1})$.

2. Score the samples: $s_i = S(x_i)$ for $i = 1, \dots, N$.

3. Choose the $x_i$ with the $M$ highest scores; call these elite samples $\hat{x}_1, \dots, \hat{x}_M$.

4. Choose the maximum likelihood parameter $v_t$ for generating this elite set: $v_t = \arg\max_v \sum_{j=1}^{M} \log f(\hat{x}_j; v)$ (for a diagonal Gaussian this is just the mean and variance of the elite samples, as written out below the list).

5. Store in $\gamma_t$ the score of the worst elite sample, $\gamma_t = \min_j S(\hat{x}_j)$.

6. If $\gamma_t$ is not better than $\gamma_{t-1}$, or the elite set has low variance, we have converged.
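In the diagonal-Gaussian case, the maximum-likelihood update in step 4 has a closed form (writing $\hat{x}_j$ for the elite samples as above):

$$ \mu_t = \frac{1}{M}\sum_{j=1}^{M} \hat{x}_j, \qquad \sigma_t^2 = \frac{1}{M}\sum_{j=1}^{M} \left(\hat{x}_j - \mu_t\right)^2, $$

applied componentwise, so each coordinate keeps its own mean and variance.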

Here is a Matlab function that does this: cross_entropy_method.m.
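And here is a rough Python/NumPy sketch of the same loop with a diagonal-Gaussian distribution; the function and variable names are mine, and it is not a translation of the Matlab file above:

```python
import numpy as np

def cross_entropy_method(score, mu0, sigma0, n_samples=100, n_elite=10,
                         max_iters=100, min_sigma=1e-3):
    """Maximize `score` over vectors x using the cross-entropy method
    with a diagonal-Gaussian sampling distribution.

    mu0, sigma0 : initial mean and per-coordinate std (1-D arrays).
    """
    mu, sigma = np.asarray(mu0, float), np.asarray(sigma0, float)
    prev_worst_elite = -np.inf
    for _ in range(max_iters):
        # 1. Sample candidate parameter vectors from the current distribution.
        xs = mu + sigma * np.random.randn(n_samples, mu.size)
        # 2. Score every sample.
        scores = np.array([score(x) for x in xs])
        # 3. Keep the M best ("elite") samples.
        elite = xs[np.argsort(scores)[-n_elite:]]
        worst_elite = np.sort(scores)[-n_elite]
        # 4. Maximum-likelihood refit: elite mean and std, per coordinate.
        mu, sigma = elite.mean(axis=0), elite.std(axis=0)
        # 5./6. Stop when the worst elite score stops improving or the
        # distribution has collapsed.
        if worst_elite <= prev_worst_elite or np.all(sigma < min_sigma):
            break
        prev_worst_elite = worst_elite
    return mu

# Example: maximize a bumpy 2-D function with many local optima.
if __name__ == "__main__":
    f = lambda x: -np.sum(x**2) + np.sum(np.cos(5 * x))
    best = cross_entropy_method(f, mu0=np.zeros(2), sigma0=5 * np.ones(2))
    print(best)  # should end up close to [0, 0]
```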

Suppose that we were not setting $\gamma_t$ at each iteration to a different value, but instead were defining elite samples as those scoring above a fixed threshold $\gamma$. Then this algorithm can be viewed as finding the parameters $v$ that minimize the KL divergence between the distribution

$$ g_\gamma(x) \;\propto\; \mathbf{1}\{S(x) \ge \gamma\}\, f(x; v_{t-1}) $$

and $f(\cdot\,; v)$. The above algorithm can be seen as finding a sequence of both $\gamma_t$ and $v_t$ that converges to the optimal value.
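Written out (this is the standard cross-entropy argument, in the notation above), minimizing that KL divergence over $v$ using samples drawn from $f(\cdot\,; v_{t-1})$ gives

$$ \arg\min_v D_{\mathrm{KL}}\!\left(g_\gamma \,\middle\|\, f(\cdot\,; v)\right) = \arg\max_v \, \mathbb{E}_{x \sim g_\gamma}\!\left[\log f(x; v)\right] \approx \arg\max_v \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\{S(x_i) \ge \gamma\}\, \log f(x_i; v), $$

with $x_1, \dots, x_N \sim f(\cdot\,; v_{t-1})$, which is exactly the maximum-likelihood fit of step 4 restricted to the samples scoring above $\gamma$.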
