\(\newcommand {\mat}[4] {\left(\begin{smallmatrix}{#1}&{#2}\\{#3}&{#4}\end{smallmatrix}\right)}\) \(\newcommand {\A} {{\mathfrak {A}}}\) \(\newcommand {\C} {{\mathbb C}}\) \(\newcommand {\E} {{\mathbb E}}\) \(\newcommand {\F} {{\mathbb F}}\) \(\newcommand {\Pro} {{\mathbb P}}\) \(\newcommand {\Q} {{\mathbb Q}}\) \(\newcommand {\R} {{\mathbb R}}\) \(\newcommand {\Z} {{\mathbb Z}}\) \(\newcommand {\P}[1] {{\mathbb P}^{#1}}\) \(\newcommand {\sym}[1] {{\operatorname{#1}}}\) \(\newcommand {\SL}[1] {{\sym{SL}(2,#1)}}\) \(\newcommand {\hil}[3] {\left({#2},{#3}\right)_{#1}}\) \(\newcommand {\leg}[2] {\left(\tfrac{#1}{#2}\right)}\)
§1. What is coding theory

In coding theory we meet the following scenario. A source emits information and a receiver tries to log this information. Typically, the information is broken up into atomic parts like letters from an alphabet and information consists of words, i.e. sequences of letters. The problem is that the information might be disturbed by not optimal transport media resulting in incidental changes of letters

Real life examples are the transmission of bits via radio signals for transmitting pictures from deep space to earth, e.g. pictures taken by a Mars robot . Or as a more every day life example the transmission of bits via radio signals for digital TV. The source could also be the sequence of bits engraved in an audio or video disk, and the transmission is now the reading by the laser of the CD reader: little vibrations of the device or scratches on the disk cause errors of transmission.

A source emits \(0\)s and \(1\)s, say, at equal probability. Let \(p\) be the probability that an error occurs, i.e. that a \(0\) or \(1\) arrives as a \(1\) or \(0\) at the receiver. If \(p\) is very small we might decide to accept these errors, and if \(p\) is almost \(1\) we might also decide to not care since we simply interpret \(1\) as \(0\) and vice versa, which reduces again the error probability to a negligible quantity. If the error probability is exactly \(\frac 12\) we cannot do anything but asking the engineers to study the problem of improving the transmission. However, if \(p\) is, say only a bit smaller than \(\frac 12\) and we need a more reliable transmission, coding comes into play.

The natural idea is to fix a natural number \(n\) and if we want to transmit the bit \(b\) we send the sequence \(bb\dots b\) of length \(n\). In other words, we encode \(b\) into a sequence of \(n\)-many \(b\)s. The receiver must, of course, be informed of this convention. He will decode then according to the principle of Maximum Likelihood Decoding. If he receives a sequence \(s\) of length \(n\), he interprets it as a \(0\) if the word \(s\) contains more \(0\)s than \(1\)s and vice versa. In other words, he he interprets \(s\) as a \(0\) if \(s\) resembles more a sequence of \(n\)-many \(0\)s and otherwise as \(1\). Here we assume for simplicity that \(n\) is odd, so that a word of length \(n\) can never contain an equal number of \(0\)s and \(1\)s.

What is now the probability of missing the right message? If we send a sequence of \(n\)-many \(0\)s then receiving instead any word with \(r\ge \frac {n+1}2\) many \(1\)s would result in an error. The probability of receiving a given word of this kind is \(p^r(1-p)^{n-r}\), and there are \(\binom nr\) such words. The error probability is therefore now \[ P_n = \sum_{r = \frac {n+1}2}^n \binom nr p^r(1-p)^{n-r} . \] It is not hard to show (see below) that \(\lim_{n\to\infty} P_n=0\). Therefore, our repetition code can improve a bad transmission to a one as good as we want, provided the transmission error \(p\) for bits is strictly less than \(\frac 12\).

What makes the repetition code so efficient is the fact that its two code words are very different. In fact they differ at all \(n\) places. However, there is a price to pay. Assume that you want to transmit a video of size \(1\) GB through a channel which has an error probability \(p=0.1\) when transmitting bits. This is certainly not acceptable since that would mean that \(10\) percent of the received video consists of flickering garbage. We might like to transmit the video via the repetition code of length \(n\). The first values for the sequence \(P_n\) are

\(P_1 = 1.000000e-01\), \(P_3 = 2.800000e-02\), \(P_5 = 8.560000e-03\), \(P_7 = 2.728000e-03\), \(P_9 = 8.909200e-04\), \(P_{11} = 2.957061e-04\), \(P_{13} = 9.928549e-05\), \(P_{15} = 3.362489e-05\), \(P_{17} = 1.146444e-05\), \(P_{19} = 3.929882e-06\).
For having transmission errors less than \(0.1\) percent we would have to choose \(n=9\), which would mean that we would have to transmit \(9\) GB for a video not bigger than \(1\) GB. In this sense the repetition code seems to us very inefficient. What makes it so inefficient is that there are only two possible informations, i.e. two code words to transmit, but they have length \(n\). In other words there is only one bit of information for every \(n\) transmitted bits.

We would like to insist on our idea but search for better codes. For example, for our case of transmitting a video we might try to find, for some (possibly big) number \(n\), a subset \(C\) of the set \(\{0,1\}^n\) of all sequences of length \(n\) of digits \(0\) or \(1\) which satisfies the following two properties:

  1. Every two distinct sequences in \(C\) should differ in as much as possible places. In other words, the quantity \[ d(C) = \min \{h(v,w): v,w\in C, v\not=w\} \] should be very large, where \(h(v,w)\) denotes the number of places where \(v\) and \(w\) differ.
  2. The quotient \[ R(C) = \frac {\log_2(|C|)}n \] should be large as well.

The number \(\log_2(|C|)\) is the quantity of information (measured in bits) which is contained in every transmission of a sequence in \(C\), i.e. in every transmission of \(n\) bits. The ratio \(R(C)\) has therefore to be interpreted a the ratio of information per bit of transmission. We would then cut our video in sequences of length \(k\), where \(k=\lfloor \log_2(|C|)\rfloor\), and map these pieces via a function (preferably designed by an engineer) to the sequences in \(C\), send the encoded words and decode them at the other end of the line using Maximum Likelihood Decoding. The Maximum Likelihood Decoding will yield good results if \(d\) is very large, i.e. if the code words differ as much as possible. We shall see later (Shannon's Theorem) that there are codes \(C\) which have \(R(C)\) as close as desired to a quantity called channel capacity (which depends on \(p\)), and the probability of a transmission error in a code word as low as desired. Of course, the length \(n\) might be very long, which might cause engineering problems like an increased time needed for encoding or decoding.

We stress an important property of the repetition code which we discussed above. Namely, it can correct \(\frac {n-1}2\) errors. This means the following: if the sent code word and the received one do not differ at more than \(\frac {n-1}2\) places, the Maximum Likelihood Decoding will return the right code word, i.e. it will correct the errrors. In general we shall mostly interested in such error correction codes.

However, in some situations one might be only interested in detecting errors, not necessarily correcting them. Examples for such a code are the International Standard Book Numbers ISBN10 and ISBN13. Here to every published book is associated a unique identifier. In the case of ISBN10 this is a word \(d_1d_2\cdots d_{10}\) of length 10 with letters from the alphabet \(0,1,\dots,9,X\). The procedure of this association is not important to us (but see here for details). What is important for us is that it is guaranteed that the sum \[ N:=d_1+2d_2+3d_3+\cdots+10d_{10} \] is always divisible by \(11\) (where the symbol \(X\) is interpreted as the number \(10\)). By elementary number theory the following happens: if exactly one letter is wrongly transmitted then \(N\) is no longer divisible by \(11\). In other words, we can detect one error. However, there is no means to correct this error (except, that we would be told at which place the error occurs). We shall come back to this later, when we recall some elementary number theory.

A property of the binomial distribution

We prove the statement that the sequence of the \(P_n\) in the above example tend to \(0\). In fact, this can be obtained from Chebyshev's inequality applied to a sequence of random variables \(X_n\), where \(P(X_n=k)=\binom nr p^r(1-p)^{n-r}\), i.e. where \(X_n\) follows the binomial distribution with parameters \(n\) and \(p\).This distribution measures the probability of successes in a sequence of \(n\) independent trials where the probability of success in a single trial is \(p\). However, it is also possible to give a short direct proof avoiding the indicated concepts.


For every \(0\le p \le 1\) and every \(\lambda \gt p\), one has \[ \lim_{n\to\infty} \sum_{r \ge \lambda n} \binom nr p^r(1-p)^{n-r} = 0 \]

For \(p\lt \frac 12\) we can choose \(\lambda = \frac 12\) in the proposition, and we obtain the claimed statement \(P_n\to 0\).

It is clear that \[ \sum_{r \ge \lambda n} \binom nr p^r(1-p)^{n-r} \le \sum_{r=0}^n \binom nr p^r(1-p)^{n-r} \left(\frac {r-np}{(\lambda-p)n}\right)^2 , \] since, for \(r\ge \lambda n\), we have \(1 \le \frac {r-np}{(\lambda-p)n}\). But the right hand side equals \[ \frac 1{(\lambda-p)^2n^2} \big(\frac {d^2}{dt^2}-2np \frac {d}{dt}+n^2p^2\big) (pe^x+1-p)^n\big|_{t=0} = \frac {p(1-p)}{(\lambda-p)^2n} , \] which tends to \(0\).

Find all subsets \(C\) in \(\{0,1\}^5\) up to isomorphism, and compute \(d(C)\) and \(R(C)\) for each . (Two subsets are called isomorphic if one can be obtained from the other by a fixed permutation of the places of the other's sequences.)

Which book possesses the ISBN-10 \("3540641*35"\)? (First of all you have to find the \(8\)th digit.)