Sunday, March 14, 2010

Heavy-tailed vs long-tailed


Fernanda Viégas and Martin Wattenberg’s “Flickr Flow”

The other day (and I really mean like the other day cuz it was quite some time ago (does the more you emphasize the other day make the other day farther back in time?)) Kyle corrected me on my probability jargon -- I said that ... something ... was a heavy-tailed distribution and he corrected me that it was in fact a long-tailed distribution. Oh, I remember what it was. We watched (most) of Anvil: The Story of Anvil and were discussing how they could make money selling CDs/mp3s on the internet but probably not have big tours, cuzza the heavy/long-tailed distribution of musical tastes. Anyways, yeah, so the "it takes all kinds" idea -- is that heavy-tailed or long-tailed?

According to Wikipedia, that definitive source, a long-tailed distribution is a special case of a heavy-tailed distribution. A heavy-tailed distribution is any distribution whose tail(s) are not exponentially bounded, or

$$\lim_{x \to \infty} \frac{\Pr[X > x]}{e^{-\lambda x}} = \infty \quad \text{for all } \lambda > 0.$$

That is, we look really far to the side of a distribution (x → ∞), and consider the probability that we will see a value even more extreme than this value for this distribution and for the exponential distribution. The distribution is heavy-tailed if the probability of an extreme event is much larger for this distribution than for the exponential distribution (the ratio → ∞ as x → ∞).
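To get a feel for the heavy-tailed condition, here is a minimal numerical sketch. It assumes a power law with exponent alpha = 2.5 and x_min = 1 as the candidate distribution, and an exponential with rate lam = 0.1 as the yardstick; those numbers and the function names are mine, picked only for illustration.

```python
import math

def power_law_tail(x, alpha=2.5, x_min=1.0):
    """Pr[X > x] for a power law whose density is proportional to x**(-alpha).

    alpha and x_min are illustrative assumptions, not values from the post.
    """
    if x < x_min:
        return 1.0
    return (x / x_min) ** (-(alpha - 1.0))

def exponential_tail(x, lam=0.1):
    """Pr[X > x] for an exponential distribution with rate lam (assumed)."""
    return math.exp(-lam * x)

def tail_ratio(x, alpha=2.5, lam=0.1):
    """Ratio of the power-law tail to the exponential tail at x."""
    return power_law_tail(x, alpha) / exponential_tail(x, lam)

# The ratio can start out below 1, but it grows without bound as x increases.
for x in (10, 100, 1000):
    print(x, tail_ratio(x))
```

Close to the origin the exponential can actually have the bigger tail; the definition only cares about what happens far out, where the ratio takes off.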

For long-tailed distributions, the requirement is more extreme:

$$\lim_{x \to \infty} \Pr[X > x + t \mid X > x] = 1$$

for all t > 0. That is, for large x, if you're going to see something bigger than x, then you're going to see something much bigger than x.

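So the commonly discussed "long-tail" distribution is the power law distribution,

$$p(x) \propto x^{-\alpha},$$

with tail distribution

$$\Pr[X > x] \propto x^{-(\alpha - 1)}$$

for α > 1.
So does the long-tailed condition hold? Well,

$$\Pr[X > x + t \mid X > x] = \frac{\Pr[X > x + t]}{\Pr[X > x]} = \left(\frac{x + t}{x}\right)^{-(\alpha - 1)} = \left(1 + \frac{t}{x}\right)^{-(\alpha - 1)},$$

which indeed has a limit of 1 as x → ∞. So, indeed, the power law distribution is long-tailed.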

Now, there is the question of whether Kyle was right/more exact than me. If all we're talking about is power-law distributions, then I suppose he is. Let me just emphasize that we are both technically correct, but he is more exact. According to Wikipedia (All hail Wikipedia!) there are distributions which are heavy-tailed and not long-tailed. I suppose the question is whether people in pop science are referring to these, and I suppose the answer is probably no.

“Fury said to
        a mouse, That
          he met in the
            house, Let
              us both go
                to law: I
                  will prose—
                    cute you.—
                  Come I’ll
                take no
              denial: We
            must have
          the trial;
        For really
      this morning
    I’ve
  nothing
  to do.
   Said the
    mouse to
     the cur,
      ’Such a
        trial, dear
          sir. With
            no jury
             or judge,
              would
             be wasting
            our
          breath.’
       ’I’ll be
      judge,
     I’ll be
    jury,’
   said
  cunning
   old
    Fury:
     ’I’ll
        try
          the
            whole
              cause,
                and
               condemn
             you to
          death.’”
Alice in Wonderland, Lewis Carroll

Thursday, March 4, 2010