that maximizes a given score function $f(x)$ which has many local optima. While algorithms like simulated annealing keep track of the current estimate of $x$, the CE method keeps track of a distribution over $x$ parameterized by a vector $\theta$, written $p(x; \theta)$. In practice, we initialize $\theta$ so that the distribution has high variance, and the algorithm gradually decreases this variance. Usually, our distribution is something like a Gaussian with diagonal covariance.
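For concreteness, a diagonal Gaussian is described by a per-dimension mean and standard deviation. The snippet below is only an illustration of this parameterization (the variable names are assumptions, not taken from the post):

```matlab
% Illustrative only: theta is taken to be the per-dimension mean and standard
% deviation of a diagonal Gaussian over x.
d     = 5;                          % dimension of x
mu    = zeros(1, d);                % initial best guess of x
sigma = 10 * ones(1, d);            % deliberately large initial spread
x     = mu + sigma .* randn(1, d);  % one sample x ~ N(mu, diag(sigma.^2))
```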
The algorithm requires an initial vector of parameters $\theta_0$ describing the initial estimate of the distribution of $x$. Usually, we will set this distribution to be centered at our initial best guess of $x$ and to have high variance. Here is pseudocode for iteration $t$ of the optimization:
1. Generate sample parameters: $x_1, \dots, x_N \sim p(x; \theta_{t-1})$.
2. Score the samples: $s_i = f(x_i)$ for $i = 1, \dots, N$.
3. Choose the $x_i$ with the $M$ highest scores; call these the elite set $E_t$.
4. Choose the maximum likelihood parameter $\theta_t$ for generating this set: $\theta_t = \arg\max_\theta \sum_{x \in E_t} \log p(x; \theta)$.
5. Store in $\gamma_t$ the score of the worst elite sample: $\gamma_t = \min_{x \in E_t} f(x)$.
6. If $\gamma_t$ is not better than $\gamma_{t-1}$, or the elite set has low variance, we have converged.
Here is a Matlab function that does this: cross_entropy_method.m.
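That file is not reproduced here, but the following is a minimal sketch of the same loop, assuming a diagonal Gaussian sampling distribution (the function and variable names are illustrative, not the actual interface of cross_entropy_method.m):

```matlab
function [mu, best_score] = ce_sketch(f, mu, sigma, N, M, max_iters)
% Minimal sketch of one run of the cross-entropy method for maximizing f(x).
% mu, sigma are 1-by-d row vectors giving the mean and per-dimension standard
% deviation of a diagonal Gaussian; N samples per iteration, M elite samples.
% Example call (hypothetical): ce_sketch(@(x) -sum((x - 3).^2), zeros(1,5), 10*ones(1,5), 100, 10, 50)
  prev_gamma = -Inf;
  for t = 1:max_iters
    % 1. Generate N samples from the current Gaussian N(mu, diag(sigma.^2)).
    X = repmat(mu, N, 1) + repmat(sigma, N, 1) .* randn(N, numel(mu));
    % 2. Score the samples.
    scores = zeros(N, 1);
    for i = 1:N
      scores(i) = f(X(i, :));
    end
    % 3. Keep the M highest-scoring (elite) samples.
    [sorted_scores, order] = sort(scores, 'descend');
    elite = X(order(1:M), :);
    % 4. Maximum-likelihood refit of the diagonal Gaussian to the elite set.
    mu = mean(elite, 1);
    sigma = std(elite, 1, 1) + 1e-8;   % small floor to avoid premature collapse
    % 5. gamma_t is the score of the worst elite sample.
    gamma_t = sorted_scores(M);
    % 6. Stop if gamma_t stopped improving or the elite set has low variance.
    if gamma_t <= prev_gamma || max(sigma) < 1e-6
      break;
    end
    prev_gamma = gamma_t;
  end
  best_score = sorted_scores(1);
end
```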
Suppose that we were not setting $\gamma$ to a different value at each iteration, but instead defined the elite samples as those scoring above a fixed $\gamma$. Then this algorithm can be viewed as finding the parameters $\theta$ that minimize the KL divergence between the distribution $p(x \mid f(x) \ge \gamma; \theta_{t-1})$ and $p(x; \theta)$. The algorithm above can be seen as finding a sequence of both $\gamma_t$ and $\theta_t$ that converges to the optimal value.
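In symbols, one standard way to write this connection (using the notation above) is

$$
\theta_t \;=\; \arg\min_{\theta}\, D_{\mathrm{KL}}\!\left( p\big(x \mid f(x) \ge \gamma;\, \theta_{t-1}\big) \,\big\|\, p(x;\theta) \right)
\;=\; \arg\max_{\theta}\, \mathbb{E}_{x \sim p(\cdot;\,\theta_{t-1})}\!\left[ \mathbf{1}\{f(x) \ge \gamma\}\, \log p(x;\theta) \right],
$$

and the Monte Carlo estimate of the right-hand side is maximized exactly by the maximum-likelihood fit to the elite samples in step 4.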