I'm always forgetting the name of this algorithm, and it suddenly came to me this morning, so I thought I'd write it down here so that I can remember it. Ooh, it's on
Wikipedia. There's a more complete tutorial
here.
that maximizes a given score function
data:image/s3,"s3://crabby-images/584f8/584f8a4aa3564470e29ece569335faeb1804c99b" alt=""
which has many local optima. While algorithms like simulated annealing keep track of the current estimate of
data:image/s3,"s3://crabby-images/17a7f/17a7fcee90cee8a57f0987ca93f5784def2a62fd" alt=""
, the CE method keeps track of a distribution over
data:image/s3,"s3://crabby-images/17a7f/17a7fcee90cee8a57f0987ca93f5784def2a62fd" alt=""
parameterized by a vector
data:image/s3,"s3://crabby-images/43eed/43eed21fd8f2ec1bd93b59c8038d10be02c6cac5" alt=""
,
data:image/s3,"s3://crabby-images/a57bb/a57bb4891b3753eaef74e26f2006b81cdc53741c" alt=""
. In practice, we initialize
data:image/s3,"s3://crabby-images/43eed/43eed21fd8f2ec1bd93b59c8038d10be02c6cac5" alt=""
so that the distribution has a high variance, and the algorithm decreases this variance. Usually, our distribution
data:image/s3,"s3://crabby-images/1f134/1f1345dc3d17667f2d586a788157dac85f89bac3" alt=""
is something like a Gaussian with diagonal covariance.
The algorithm requires an initial vector of parameters
data:image/s3,"s3://crabby-images/43eed/43eed21fd8f2ec1bd93b59c8038d10be02c6cac5" alt=""
describing the initial estimate of the distribution of
data:image/s3,"s3://crabby-images/17a7f/17a7fcee90cee8a57f0987ca93f5784def2a62fd" alt=""
. Usually, we will set this distribution to be centered at our initial best guess of
data:image/s3,"s3://crabby-images/17a7f/17a7fcee90cee8a57f0987ca93f5784def2a62fd" alt=""
and to have high variance. Here is pseudocode for iteration
data:image/s3,"s3://crabby-images/d56f6/d56f611e79ff5b4a42db487f0a7484193001e780" alt=""
of the optimization:
1. Generate
sample parameters:
data:image/s3,"s3://crabby-images/9ea1f/9ea1f7564645552afc95233db3be9e583d32f3bd" alt=""
2. Score the samples:
data:image/s3,"s3://crabby-images/6911b/6911b90dfdb32ff0a4164f7bdc0161cfd31372d9" alt=""
3. Choose the
with the $M$ highest scores, call these
.
4. Choose the maximum likelihood parameter
for generating this set
data:image/s3,"s3://crabby-images/5cd32/5cd3271553356a6b0bb241b7daff555fe10f8dd4" alt=""
5. Store in
the score for the worst elite sample,
data:image/s3,"s3://crabby-images/3bfee/3bfeeaea1a3c53ca41dfdf93b946b26de4620dee" alt=""
6. If
is not better than
, or the set
has low variance, we have converged.
Here is a Matlab function that does this:
cross_entropy_method.m.
Suppose that we were not setting
data:image/s3,"s3://crabby-images/fcfd5/fcfd589471f5f68eaa5507d9a88516c8f7035a0b" alt=""
at each iteration to a different value, but instead were defining elite samples as those above a fixed
data:image/s3,"s3://crabby-images/d0d99/d0d990c26fdad653f4881e43585694d0c8692696" alt=""
. Then this algorithm can be viewed as finding the parameters
data:image/s3,"s3://crabby-images/43eed/43eed21fd8f2ec1bd93b59c8038d10be02c6cac5" alt=""
that minimize the KL divergence between the distribution
data:image/s3,"s3://crabby-images/26ff5/26ff5d23f650b9b7ff1689eda1c9c46368c9880c" alt=""
and
data:image/s3,"s3://crabby-images/1f134/1f1345dc3d17667f2d586a788157dac85f89bac3" alt=""
. The above algorithm can be seen as finding the sequence of both
data:image/s3,"s3://crabby-images/43eed/43eed21fd8f2ec1bd93b59c8038d10be02c6cac5" alt=""
and
data:image/s3,"s3://crabby-images/d0d99/d0d990c26fdad653f4881e43585694d0c8692696" alt=""
that converges to the optimal value.
No comments:
Post a Comment