
Information Theory: Common Misconceptions

7/1/05

By Scott Uminsky

The field of “Information Theory” was originated, or as some say rediscovered, by Claude Shannon in 1948 and was quickly adopted by the scientific community. Since then, the terms “Information Theory”, “Shannon Information” and “Shannon Entropy” have become synonymous. Many books and papers have been published on the subject, most notably those by Leon Brillouin. This article compares statements made by Leon Brillouin with those made by some contemporary authors who continue to promulgate misconceptions about the subject, and attempts to clarify those misconceptions.

Leon Brillouin and Information Theory
Leon Brillouin was one of the foremost experts in Information Theory next to Claude Shannon and has been called one of the giants of twentieth-century physics. In the 2nd edition of his book “Science and Information Theory”, Brillouin writes the following:
"When letters are used freely (equal a priori probabilities), the information is 4.76 bits per letter. If we impose constraints corresponding to Table 1.1 and take into account the a priori probabilities of the different letters, the information per letter drops to 4.03 bits. Additional constraints will reduce it still further." [1] This quote illustrates two basic scenarios in Information Theory. The first scenario is one in which all of the letters have the same amount of “information” whether or not they’re random or they happen to spell out a meaningful word or sentence. The second scenario is one in which “additional constraints” reduce the amount of “information”. In this second case, as the amount of “information” (in Information Theory) increases, the amount of uncertainty or unpredictability also increases. This uncertainty \ unpredictability is synonymous with increasing disorder. Thus, more “information” equates to more unpredictability. Note also that when calculations are made in Information Theory, the result is given in ‘bits of uncertainty’ or in of ‘bits of entropy’. “We completely ignore the human value of the information. A selection of 100 letters is given a certain information value, and we do not investigate whether it makes sense in English, and, if so, whether the meaning of the sentence is of any practical importance. According to our definition, a set of 100 letters selected at random (according to the rules of Table 1.1), a sentence of 100 letters from a newspaper, a piece of Shakespeare or a theorem of Einstein are given exactly the same informational value.” [2]

“Information is an absolute quantity which has the same numerical value for any observer. The human value on the other hand would necessarily be a relative quantity, and would have different values for different observers…..” [3]

“Whether this information is valuable or worthless does not concern us. The idea of “value” refers to the possible use by a living observer. This is beyond the reach of our theory…” [4]
The first quote expands upon the first scenario described above, in which a sequence of letters, whether random or meaningful, has the same amount of “information”. Here and in the following quotes, Brillouin also says that the “information” in Information Theory is an absolute quantity, while the human value of the “information”, i.e. information in the day-to-day sense of the word, is a relative quantity. For example, a sentence in Russian usually means something to a Russian and nothing to someone from England. This is why Information Theory does not consider semantic content at all.
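The 4.76 and 4.03 figures in the first quote follow from the standard entropy formula H = -Σ p·log2(p). The short Python sketch below is not from Brillouin; since his Table 1.1 is not reproduced here, the constrained probabilities are simply estimated from an illustrative sample string. It reproduces the unconstrained figure and shows how unequal letter probabilities pull the number down:

import math
from collections import Counter

def entropy_bits(probabilities):
    # Shannon entropy H = -sum(p * log2(p)), in bits per letter
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Letters used freely: 27 equally likely symbols (26 letters plus a space)
print(entropy_bits([1 / 27] * 27))   # ~4.75 bits per letter (Brillouin's 4.76)

# Impose a constraint: unequal a priori letter probabilities, here estimated
# from an illustrative sample string rather than Brillouin's Table 1.1
sample = ("information theory assigns the same value to a meaningful sentence "
          "and to a random string of the same length under the same model")
counts = Counter(sample)
print(entropy_bits([n / len(sample) for n in counts.values()]))   # below 4.75

In either case the number says nothing about whether the letters mean anything; it only measures how unpredictable the next letter is.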

Summary
1) There are two basic scenarios in Information Theory.
a) The first scenario is one in which all of the letters (or words) have the same amount of “information” whether they are random or happen to spell out a meaningful word or sentence. In this case, for example, a sequence of 100 random letters has the same amount of “information” as a sequence of 100 letters in a meaningful sentence (see the sketch after this summary).
b) The second scenario is one in which “additional constraints” reduce the amount of “information”. That is, the more we constrain a system, the fewer bits of Shannon information can be calculated. The less we constrain a system, the more bits of Shannon information can be calculated. In this case, a random arrangement of letters would have much more Shannon information than a highly constrained arrangement of letters would. Thus, letters in a meaningful sentence have much less Shannon information than a random arrangement of letters.

Note that under either of these scenarios Information Theory does NOT measure complexity.

2) The word “information” in “Information Theory” does NOT mean the same thing as the ordinary word “information”. The ordinary, day-to-day word “information” refers to the generation and transmission of meaningful content. Shannon Information does NOT measure meaningful content; it only provides a way to calculate a number of bits of ‘uncertainty’ or ‘entropy’. That is, in Information Theory, “information” equals “uncertainty”, which also equals “entropy”. Therefore, the idea of “information” in Information Theory is completely different from the ordinary, day-to-day word “information”.
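To make point 1(a) concrete, here is a minimal sketch. It assumes a 27-symbol alphabet (lowercase letters plus a space) scored under the equal-probability model of the first scenario; the two example strings are illustrative choices, not Brillouin's own examples:

import math
import random
import string

ALPHABET = string.ascii_lowercase + " "       # 27 symbols, letters plus space
BITS_PER_LETTER = math.log2(len(ALPHABET))    # ~4.755 bits when letters are used freely

def shannon_bits(text):
    # Equal-probability model: every letter contributes log2(27) bits
    return len(text) * BITS_PER_LETTER

random_text = "".join(random.choice(ALPHABET) for _ in range(100))
meaningful = ("information theory measures uncertainty not meaning "
              "and ignores the human value of a message entirely")[:100]

print(shannon_bits(random_text))   # ~475.5 bits
print(shannon_bits(meaningful))    # ~475.5 bits -- exactly the same value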

Misconceptions Promulgated
Other authors, unfortunately, continue to promulgate misinformation or confusion regarding this subject. For example, in the following quote the author’s representation of Information Theory is completely backwards:
“Thus, to constrain a set of possible material states is to generate information in Shannon's sense.”

“The more improbable (or complex) the arrangement, the more Shannon information, or information-carrying capacity, a string or system possesses.”
Note again that the more we constrain a system, the fewer bits of Shannon information are generated. Moreover, in Information Theory, the more improbable the arrangement, the more uncertainty or, if you will, disorder the system possesses, NOT the more complexity, as this author says.
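The direction of the relationship is easy to check numerically. In the sketch below, the 0.9/0.1 split is simply an assumed, illustrative constraint; concentrating the probability on one symbol, i.e. constraining the system, drives the Shannon figure down, not up:

import math

def entropy_bits(probabilities):
    # Shannon entropy H = -sum(p * log2(p)), in bits per symbol
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Unconstrained: 27 symbols, all equally likely
print(entropy_bits([1 / 27] * 27))            # ~4.75 bits per symbol

# Constrained: one symbol takes probability 0.9, the rest share the remaining 0.1
constrained = [0.9] + [0.1 / 26] * 26
print(entropy_bits(constrained))              # ~0.94 bits per symbol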

This next author also makes the same mistake:
“Information theory identifies the amount of information associated with, or generated by, the occurrence of an event (or the realization of a state of affairs) with the reduction in uncertainty.” In fairness, if this author is saying that there is a reduction in uncertainty as fewer bits of Shannon Information are calculated, then he is correct. However, Shannon Information is normally described as an increase in uncertainty as more bits are calculated. As a result, when the amount of “information” in Information Theory increases, the uncertainty or unpredictability also increases.

The next quote comes from a chapter on Information Theory in which the author, without any notice, starts talking about meaningful information.
"To learn something, to acquire information, is to rule out possibilities. To understand the information conveyed in a communication is to know what possibilities would be excluded by its truth." This can cause a great deal of confusion if it is not made very clear that “information” in Information Theory is completely different from this.

In the next quote this same author (not counting the way it is written) continues to confuse the reader in a number of ways.
“The more possibilities get ruled out and, correspondingly, the more improbable the possibility that actually obtains, the greater the information generated. To rule out no possibilities is to assert a tautology and provide no information. ‘It’s raining it’s not raining’ is true but totally uninformative.” In this case, the meaning of the word “information” in Information Theory and the ordinary, day-to-day, meaningful use of the word “information” are blurred together. The first sentence talks about “improbability” and “information”, which points to Information Theory, especially since this is part of a chapter on Information Theory. The second sentence goes on to discuss meaningful information, i.e. whether or not a statement is “informative”. In any case, this kind of obfuscation would in all likelihood be very confusing or misleading to the reader.

In the next quote the author explicitly confuses Shannon Information with meaningful information.
“Shannon called his measure not only the entropy but also the "uncertainty". I prefer this term because it does not have physical units associated with it. If you correlate information with uncertainty, then you get into deep trouble.” Yes, you will get into trouble if you fail to distinguish between meaningful information and Shannon uncertainty. Note again that as the amount of “information” (in Information Theory) increases, the uncertainty or unpredictability also increases, and that this uncertainty/unpredictability is synonymous with disorder. The author continues:

“Suppose that:
information ~ uncertainty

but since they have almost identical formulae:
uncertainty ~ physical entropy
so
information ~ physical entropy

BUT as a system gets more random, its entropy goes up:
randomness ~ physical entropy
so
information ~ physical randomness

How could that be? Information is the very opposite of randomness!”


Once again we have this problem of equating ordinary meaningful “information” with the “information” in Information Theory. It is true that meaningful information is the opposite of randomness but the “information” in Information Theory is NOT the opposite of randomness. Again, from Brillouin: “Whether this information is valuable or worthless does not concern us. The idea of “value” refers to the possible use by a living observer. This is beyond the reach of our theory…” [4]

“When letters are used freely (equal a priori probabilities), the information is 4.76 bits per letter. If we impose constraints corresponding to Table 1.1 and take into account the a priori probabilities of the different letters, the information per letter drops to 4.03 bits. Additional constraints will reduce it still further.” [5]

Thus, the more unconstrained/random the system is, the more Information Theory “information” is calculated. Again, this has nothing to do with meaningful “information”.
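The same point can be seen empirically. In this sketch (the strings and the 27-symbol alphabet are illustrative choices, not taken from any of the quoted authors), a freely generated random string yields more bits per symbol than a rigidly constrained, repetitive one:

import math
import random
import string
from collections import Counter

def empirical_entropy_bits(text):
    # Bits per symbol computed from the observed symbol frequencies of the text
    counts = Counter(text)
    return -sum((n / len(text)) * math.log2(n / len(text)) for n in counts.values())

alphabet = string.ascii_lowercase + " "
random_text = "".join(random.choice(alphabet) for _ in range(10000))
repetitive = "abab" * 2500                    # a highly constrained arrangement

print(empirical_entropy_bits(random_text))    # close to log2(27), about 4.75 bits
print(empirical_entropy_bits(repetitive))     # exactly 1.0 bit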

On another page this same author says: “Likewise, Shannon defined information in a precise technical sense. Beware of writers who slip from the technical definition into the popular one.” Here we have the unfortunate situation in which the author contradicts himself with a correct statement.

Conclusion
Hopefully this helps to clarify the issue and provides a valuable resource for those interested in reconsidering their material or in learning more about this subject.


S.U. 7/1/05




Bibliography
[1] Leon Brillouin, Science and Information Theory, 2nd edition, p. 9.
[2] Ibid., p. 9.
[3] Ibid., p. 10.
[4] Ibid., p. 10.
[5] Ibid., p. 9.