Linear network coding

Network coding is a field of information theory and coding theory and is a method of attaining maximum information flow in a network.

General principle

A network is a directed graph, where the edges represent pathways for information. Using the max-flow min-cut theorem, one can calculate the maximum amount of information that can be pushed through this network between two nodes. It was shown (2000, Ahlswede et al.) that this max-flow value is achievable, but that routing is not capable of achieving it.

The core notion of network coding is to allow mixing of data at intermediate network nodes. A receiver sees these data packets and deduces from them the messages that were originally intended for that data sink.

In the butterfly example below, it is easy to see that no routing scheme can achieve the maximum flow, but that by using a simple coding scheme, the full flow can be attained.

Linear network coding

Although the max-flow was shown to be achievable, it was shown using a random code. In 2003, it was shown (by Li, Yeung, and Cai) that linear codes will achieve the max-flow in single-source multicast (every destination node receives all the source information) networks, provided the alphabet size is large enough.

In 2004, it was shown (by Dougherty, Freiling, and Zeger) that linear network codes are not sufficient in general. An example in the paper is not solvable using any linear code over any field size or in any dimension; further, it is not asymptotically linear nor is it linearly solvable using convolutional codes or time-sharing methods. They also showed that the insufficiency of linear codes applies to multiple unicast networks.

Theory

In a Linear Network coding problem a group of nodes $P$ are involved in moving the data from $S$ source nodes to $K$ sink nodes. Each nodes generates a new packet, which is a linear combination of the earlier received packets on the link, by coefficients in the finite field.

A message generated so $X_{k}$ is related to the received messages $M_{i}$ by the relation
$X_{k}=\sum _{i=1}^{S}g_{k}^{i}.M_{i}$ . Each node forwards the computed value $X_{k}$ along with the all the coefficients used in the $k^{th}$ level, $g_{k}^{i}$ .

The values $g_{k}^{i}$ are the coefficients from the Galois field $GF(2^{s})$ . Since the operations are computed in the finite field, the result of the operation is also of the same length, as the size of each vector $M$ .

Each node produces a similar output, as computed above. This yields a linear problem of the type $X=GM$ ,
where with the knowledge of the $X,G$ we need to compute $M$ . Each of the receivers in $K$ , try to solve this linear equation, and for which at least $N\geq S$ packets must be received. The received packets are continually used in the Gaussian elimination method to reduce $G$ matrix into the row-echelon form. Finally the resulting values of $X=G_{echelon}M$
are solved to obtain $M$ .

Example

Butterfly Network.

In the butterfly network, there are two sources (at the top of the picture), each having knowledge of some value A and B. There are two destination nodes (at the bottom), which each want to know both A and B. Each edge can carry only a single value (we can think of an edge transmitting a bit in each time slot).

If we only used routing, then the central line would be able to carry A or B, but not both. Suppose we send A through the center; then the left destination would receive A twice and not know B at all. Sending B poses a similar problem for the right destination. We say that routing is insufficient because no routing scheme can transmit both A and B simultaneously to both destinations.

Using a simple code, as shown, we do get both A and B to both destinations simultaneously by sending the sum of the symbols through the center (in other words, we encode A and B using the formula "A+B"). The left destination receives A and A+B, and can find B by subtracting the two values. This is a linear code because the encoding and decoding schemes are linear operations.

History

The fundamental concept of network coding was introduced for satellite communication networks in a paper by R. W. Yeung and Z. Zhang, "Distributed Source Coding for Satellite Communications" (IEEE Transactions on Information Theory, IT-45, pp. 1111-1120. 1999). The concept was fully developed in a subsequent paper by R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, "Network Information Flow" (IEEE Transactions on Information Theory, IT-46, pp. 1204-1216, 2000), where the term `network coding' was coined. In this paper, the advantage of network coding over routing, the traditional way of operating a network, was pointed out for the first time by means of a very simple example known as the butterfly network.

Applications

Network coding is seen to be useful in the following areas:

Alternative to Forward error correction and ARQ in traditional networks.
Robust and resilient to network attacks like snooping, eavesdropping or replay attacks.
Digital file distribution and P2P file sharing. eg: Avalanche from Microsoft
Bidirectional low energy transmission in wireless sensor networks.
Many-to-Many broadcast network capacity augmentation.

External links

Network Coding Homepage
Raymond W. Yeung et al, Network Coding Theory, now Publishers, 2005, http://iest2.ie.cuhk.edu.hk/~whyeung/netcode/monograph.html
Christina Fragouli et al, Network Coding: An Instant Primer, ACM SIGCOMM 2006, http://infoscience.epfl.ch/getfile.py?mode=best&recid=58339.
Avalanche Filesystem, http://research.microsoft.com/~pablo/avalanche.aspx
Random Network Coding, http://www.mit.edu/~medard/coding1.htm
Digital Fountain Codes, http://www.icsi.berkeley.edu/~luby/