Shannon entropy is based on the average probability that a given string of bits will occur in a particular type of digital file. In a general-purpose communications system, that’s the right type of entropy to use, because the characteristics of the data traffic will quickly converge to the statistical averages. Although Shannon’s seminal 1948 paper dealt with cryptography, it was primarily concerned with communication, and it used the same measure of entropy in both discussions.
But in cryptography, the real concern isn’t with the average case but with the worst case. A codebreaker needs only one reliable correlation between the encrypted and unencrypted versions of a file in order to begin to deduce further correlations. In the years since Shannon’s paper, information theorists have developed other notions of entropy, some of which give greater weight to improbable outcomes. Those, it turns out, offer a more accurate picture of the problem of codebreaking.
When Médard, Duffy and their students used these alternate measures of entropy, they found that slight deviations from perfect uniformity in source files, which seemed trivial in the light of Shannon entropy, suddenly loomed much larger. The upshot is that a computer turned loose to simply guess correlations between the encrypted and unencrypted versions of a file would make headway much faster than previously expected.