Coding theory: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 18:21, 8 October 2021 edit Nick Levine (talk \| contribs) Extended confirmed users, Pending changes reviewers, Rollbackers 15,436 edits m Reverted 1 edit by 2620:9:6000:5800:4166:4FD3:5E02:FE82 (talk) to last revision by Pachu Kannan Tags: Twinkle Undo ← Previous edit		Latest revision as of 08:44, 19 June 2025 edit undo 4094uo (talk \| contribs) 26 edits m Slightly changed wording Tags: Visual edit Mobile edit Mobile web edit Advanced mobile edit
(42 intermediate revisions by 34 users not shown)
Line 1: {{Short description\|Study of the properties of codes and their fitness}} [[File:Hamming.jpg\|thumb\|A two-dimensional visualisation of the [[Hamming distance]], a critical measure in coding theory.]] '''Coding theory''' is the study of the properties of [[code]]s and their respective fitness for specific applications. Codes are used for [[data compression]], [[cryptography]], [[error detection and correction]], [[data transmission]] and [[computer data storage\|data storage]]. Codes are studied by various scientific disciplines—such as [[information theory]], [[electrical engineering]], [[mathematics]], [[linguistics]], and [[computer science]]—for the purpose of designing efficient and reliable [[data transmission]] methods. This typically involves the removal of redundancy and the correction or detection of errors in the transmitted data. There are four types of coding:<ref>{{cite book Line 9: \|date=2002 \|page=18 \|publisher=John Wiley & Sons \|section=2.4.4 Types of Coding \|quote=There are four types of coding\|isbn=9780471808725 Line 19 ⟶ 20: # [[Line code\|Line coding]] Data compression attempts to remove unwanted redundancy from the data from a source in order to transmit it more efficiently. For example, [[~~Zip (file format)\|ZIP~~DEFLATE]] data compression]] makes ~~data~~ files smaller, for purposes such as to reduce Internet traffic. Data compression and error correction may be [[Joint source and channel coding\|studied in combination]]. [[Error ~~detection~~correction ~~and~~adds ~~correction\|Error~~useful ~~correction~~[[Redundancy (information theory)\|redundancy]] ~~adds~~to ~~extra~~the data ~~bits~~from a source to make the transmission ~~of data~~ more robust to disturbances present on the transmission channel. The ordinary user may not be aware of many applications using error correction. A typical [[Compact Disc Digital Audio\|music compact disc]] (CD) uses the [[Reed–Solomon code]] to correct for scratches and dust. In this application the transmission channel is the CD itself. Cell phones also use coding techniques to correct for the [[fading]] and noise of high frequency radio transmission. Data modems, telephone transmissions, and the [[NASA Deep Space Network]] all employ channel coding techniques to get the bits through, for example the [[turbo code]] and [[LDPC code]]s.<!--Kvng RTH--> ==History of coding theory== {{excerpt\|History of information theory}} In 1948, [[Claude Shannon]] published "[[A Mathematical Theory of Communication]]", an article in two parts in the July and October issues of the ''Bell System Technical Journal''. This work focuses on the problem of how best to encode the [[information]] a sender wants to transmit. In this fundamental work he used tools in probability theory, developed by [[Norbert Wiener]], which were in their nascent stages of being applied to communication theory at that time. Shannon developed [[information entropy]] as a measure for the uncertainty in a message while essentially inventing the field of [[information theory]].▼ ▲InShannon’s ~~1948, [[Claude Shannon]] published "[[A Mathematical Theory of Communication]]", an article in two parts in the July and October issues of the ''Bell System Technical Journal''. This work~~paper focuses on the problem of how to best to encode the [[information]] a sender wants to transmit. In this fundamental work he used tools in probability theory, developed by [[Norbert Wiener]], which were in their nascent stages of being applied to communication theory at that time. Shannon developed [[information entropy]] as a measure for the uncertainty in a message while essentially inventing the field of [[information theory]]. The [[binary Golay code]] was developed in 1949. It is an error-correcting code capable of correcting up to three errors in each 24-bit word, and detecting a fourth. Line 65 ⟶ 68: # <math>C:\mathcal{X}\to\Sigma^</math> is [[Variable-length code#Non-singular codes\|non-singular]] if [[Injective function\|injective]]. # <math>C:\mathcal{X}^\to\Sigma^</math> is [[Uniquely decodable code#Uniquely decodable codes\|uniquely decodable]] if injective. # <math>C:\mathcal{X}\to\Sigma^</math> is [[Variable-length code#Prefix codes\|instantaneous]] if <math>C(x_1)</math> is not a proper prefix of <math>C(x_2)</math> (and vice versa). ===Principle=== Line 80 ⟶ 83: {{main\|Error detection and correction}} The purpose of channel coding theory is to find codes which transmit quickly, contain many valid [[Code word (communication)\|code word]]s and can correct or at least [[error detection\|detect]] many errors. While not mutually exclusive, performance in these areas is a trade -off. So, different codes are optimal for different applications. The needed properties of this code mainly depend on the probability of errors happening during transmission. In a typical CD, the impairment is mainly dust or scratches. CDs use [[cross-interleaved Reed–Solomon coding]] to spread the data out over the disk.<ref> Line 87 ⟶ 90: </ref> Although not a very good code, a simple repeat code can serve as an understandable example. Suppose we take a block of data bits (representing sound) and send it three times. At the receiver we will examine the three repetitions bit by bit and take a majority vote. The twist on this is that we ~~don't~~do not merely send the bits in order. We interleave them. The block of data bits is first divided into 4 smaller blocks. Then we cycle through the block and send one bit from the first, then the second, etc. This is done three times to spread the data out over the surface of the disk. In the context of the simple repeat code, this may not appear effective. However, there are more powerful codes known which are very effective at correcting the "burst" error of a scratch or a dust spot when this interleaving technique is used. Other codes are more appropriate for different applications. Deep space communications are limited by the [[thermal noise]] of the receiver which is more of a continuous nature than a bursty nature. Likewise, narrowband modems are limited by the noise, present in the telephone network and also modeled better as a continuous disturbance.{{Citation needed\|date=July 2008}} Cell phones are subject to rapid [[fading]]. The high frequencies used can cause rapid fading of the signal even if the receiver is moved a few inches. Again there are a class of channel codes that are designed to combat fading.{{Citation needed\|date=July 2008}} Line 127 ⟶ 130: # [[Reed–Muller code]]s # [[Hamming bound\|Perfect codes]] # [[Locally recoverable code]] Block codes are tied to the [[sphere packing]] problem, which has received some attention over the years. In two dimensions, it is easy to visualize. Take a bunch of pennies flat on the table and push them together. The result is a hexagon pattern like a bee's nest. But block codes rely on more dimensions which cannot easily be visualized. The powerful (24,12) [[Binary Golay code\|Golay code]] used in deep space communications uses 24 dimensions. If used as a binary code (which it usually is) the dimensions refer to the length of the codeword as defined above. The theory of coding uses the ''N''-dimensional sphere model. For example, how many pennies can be packed into a circle on a tabletop, or in 3 dimensions, how many marbles can be packed into a globe. Other considerations enter the choice of a code. For example, hexagon packing into the constraint of a rectangular box will leave empty space at the corners. As the dimensions get larger, the percentage of empty space grows smaller. But at certain dimensions, the packing uses all the space and these codes are the so-called "perfect" codes. The only nontrivial and useful perfect codes are the distance-3 Hamming codes with parameters satisfying (2<sup>''r''</sup> – 1, 2<sup>''r''</sup> – 1 – ''r'', 3), and the [23,12,7] binary and [11,6,5] ternary Golay codes.<ref name=terras> {{cite book \| title = Fourier Analysis on Finite Groups and Applications \|first=Audrey \|last=Terras \|author-link=Audrey Terras\| publisher = [[Cambridge University Press]] \| year = 1999 \| isbn = 978-0-521-45718-7 \| url = https://archive.org/details/fourieranalysiso0000terr \| url-access = registration \| page = [https://archive.org/details/fourieranalysiso0000terr/page/195 195] }}</ref><ref name=blahut>{{cite book \|title = Algebraic Codes for Data Transmission \|first=Richard E. \|last=Blahut \|author-link=Richard E. Blahut \| publisher = Cambridge University Press \| year = 2003 \| isbn = 978-0-521-55374-2 \| url = https://books.google.com/books?id=n0XHMY58tL8C&pg=PA60}} </ref> Line 145 ⟶ 149: Again, consider pennies as an example. First we pack the pennies in a rectangular grid. Each penny will have 4 near neighbors (and 4 at the corners which are farther away). In a hexagon, each penny will have 6 near neighbors. When we increase the dimensions, the number of near neighbors increases very rapidly. The result is the number of ways for noise to make the receiver choose a neighbor (hence an error) grows as well. This is a fundamental limitation of block codes, and indeed all codes. It may be harder to cause an error to a single neighbor, but the number of neighbors can be large enough so the total error probability actually suffers.<ref name=schlegel/> Properties of linear block codes are used in many applications. For example, the syndrome-coset uniqueness property of linear block codes is used in trellis shaping,<ref>{{cite journal \|first=G.D., Jr. \|last=Forney \|author-link=Dave Forney \|title=Trellis shaping \|journal=IEEE Transactions on Information Theory \|volume=38 \|issue=2 Pt 2 \|pages=281–300 \|date=March 1992 \|doi=10.1109/18.119687 \|s2cid=37984132 }}</ref> one of the best-known [[shaping codes]]. ====Convolutional codes==== Line 162 ⟶ 166: ==Cryptographic coding== {{main\|Cryptography}} [[Cryptography]] or cryptographic coding is the practice and study of techniques for [[secure communication]] in the presence of third parties (called [[adversary (cryptography)\|adversaries]]).<ref name="rivest90">{{cite book\|first=Ronald L.\|last=Rivest\|author-link=Ron Rivest\|editor=J. Van Leeuwen\|title=Handbook of Theoretical Computer Science\|chapter=Cryptology\|volume=1\|publisher=Elsevier\|year=1990}}</ref> More generally, it is about constructing and analyzing [[communications protocol\|protocol]]s that block adversaries;<ref name="modern-crypto">{{Cite book\|first1=Mihir\|last1=Bellare\|first2=Phillip\|last2=Rogaway\|title=Introduction to Modern Cryptography\|chapter=Introduction\|page=10\|date=21 September 2005}}</ref> various aspects in [[information security]] such as data [[confidentiality]], [[data integrity]], [[authentication]], and [[non-repudiation]]<ref name="hac">{{cite book \|first1=A. J. \|last1=Menezes \|first2=P. C. \|last2=van Oorschot \|first3=S. A. \|last3=Vanstone \|url=https://archive.org/details/handbookofapplie0000mene \|title=Handbook of Applied Cryptography \|isbn=978-0-8493-8523-0 \|year=1997 \|publisher=Taylor & Francis \|url-access=registration }}</ref> are central to modern cryptography. Modern cryptography exists at the intersection of the disciplines of [[mathematics]], [[computer science]], and [[electrical engineering]]. Applications of cryptography include [[automated teller machine\|ATM cards]], [[password\|computer passwords]], and [[electronic commerce]]. Cryptography prior to the modern age was effectively synonymous with ''[[encryption]]'', the conversion of information from a readable state to apparent [[nonsense]]. The originator of an encrypted message shared the decoding technique needed to recover the original information only with intended recipients, thereby precluding unwanted persons from doing the same. Since [[World War I]] and the advent of the [[computer]], the methods used to carry out cryptology have become increasingly complex and its application more widespread. Line 171 ⟶ 175: {{main\|Line code}} A [[line code]] (also called digital baseband modulation or digital [[baseband]] transmission method) is a [[code]] chosen for use within a [[communications system]] for [[baseband]] [[transmission (telecommunications)\|transmission]] purposes~~. Line coding is often used for digital data transport~~. Line coding is often used for digital data transport. It consists of representing the [[Digital signal (electronics)\|digital signal]] to be transported by an amplitude- and time-discrete signal that is optimally tuned for the specific properties of the physical channel (and of the receiving equipment). The [[waveform]] pattern of voltage or current used to represent the 1s and 0s of a digital data on a transmission link is called ''line encoding''. The common types of line encoding are [[Unipolar encoding\|unipolar]], [[Polar encoding\|polar]], [[Bipolar encoding\|bipolar]], and [[Manchester encoding]]. ==Other applications of coding theory== Line 182 ⟶ 186: Another application of codes, used in some mobile phone systems, is [[code-division multiple access]] (CDMA). Each phone is assigned a code sequence that is approximately uncorrelated with the codes of other phones.{{Citation needed\|date=July 2008}} When transmitting, the code word is used to modulate the data bits representing the voice message. At the receiver, a demodulation process is performed to recover the data. The properties of this class of codes allow many users (with different codes) to use the same radio channel at the same time. To the receiver, the signals of other users will appear to the demodulator only as a low-level noise.{{Citation needed\|date=July 2008}} Another general class of codes are the [[automatic repeat-request]] (ARQ) codes. In these codes the sender adds redundancy to each message for error checking, usually by adding check bits. If the check bits are not consistent with the rest of the message when it arrives, the receiver will ask the sender to retransmit the message. All but the simplest [[wide area network]] protocols use ARQ. Common protocols include [[Synchronous Data Link Control\|SDLC]] (IBM), [[Transmission Control Protocol\|TCP]] (Internet), [[X.25]] (International) and many others. There is an extensive field of research on this topic because of the problem of matching a rejected packet against a new packet. Is it a new one or is it a retransmission? Typically numbering schemes are used, as in TCP.{{cite ~~web~~journal \|url= http://tools.ietf.org/html/rfc793 \|title= RFC793 \|~~work~~journal= ~~RFCs~~RFCS\|publisher= [[Internet Engineering Task Force]] (IETF) \|date= September 1981}} ===Group testing=== Line 192 ⟶ 196: {{cite conference \| title = On Analog Signature Analysis \| citeseerx=10.1.1.142.5853 \| first1 = Franc \| last1 = Novak \| first2 = Bojan \| last2 = Hvala \| first3 = Sandi \| last3 = Klavžar \| book-title = Proceedings of the conference on Design, automation and test in Europe \| year = 1999 \| isbn = 1-58113-121-6 }} </ref> and analog encryption.<ref> {{cite journal \|author1=Shujun Li \|author2=Chengqing Li \|author3=Kwok-Tung Lo \|author4=Guanrong Chen \|title=Cryptanalyzing an Encryption Scheme Based on Blind Source Separation \|journal=IEEE Transactions on Circuits and Systems I \|volume=55 \|issue=4 \|pages=1055–63 \|date=April 2008 \|doi=10.1109/TCSI.2008.916540 \|arxiv=cs/0608024 \|s2cid=2224947 \|url=http://epubs.surrey.ac.uk/532452/1/IEEETCASI2008.pdf }} </ref> ==Neural coding== [[Neural coding]] is a [[neuroscience]]-related field concerned with how sensory and other information is represented in the [[brain]] by [[neural network\|networks]] of [[neurons]]. The main goal of studying neural coding is to characterize the relationship between the [[Stimulus (physiology)\|stimulus]] and the individual or ensemble neuronal responses and the relationship among electrical activity of the neurons in the ensemble.<ref name="Brown">{{cite journal \|vauthors=Brown EN, Kass RE, Mitra PP \|title=Multiple neural spike train data analysis: state-of-the-art and future challenges \|journal=Nature Neuroscience \|volume=7 \|issue=5 \|pages=456–461 \|date=May 2004 \| url = http://www.stat.columbia.edu/~liam//teaching/neurostat-fall13/papers/brown-et-al/brown-kass-mitra.pdf \|pmid=15114358 \|doi=10.1038/nn1228 \|s2cid=562815 }}</ref> It is thought that neurons can encode both [[Digital data\|digital]] and [[analog signal\|analog]] information,<ref>{{cite book \|first=S.J. \|last=Thorpe \|chapter=Spike arrival times: A highly efficient coding scheme for neural networks \|chapter-url=http://pop.cerco.ups-tlse.fr/fr_vers/documents/thorpe_sj_90_91.pdf \|format=PDF \|pages=91–94 \|editor1-first=R. \|editor1-last=Eckmiller \|editor2-first=G. \|editor2-last=Hartmann \|editor3-first=G. \|editor3-last=Hauske \| editor3-link= Gert Hauske \|title=Parallel processing in neural systems and computers \|url=https://books.google.com/books?id=b9gmAAAAMAAJ \|access-date=30 June 2013 \|year=1990 \|publisher=North-Holland \|isbn=978-0-444-88390-2}}</ref> and that neurons follow the principles of information theory and compress information,<ref>{{cite journal \|first1=T. \|last1=Gedeon \|first2=A.E. \|last2=Parker \|first3=A.G. \|last3=Dimitrov \|title=Information Distortion and Neural Coding \|journal=Canadian Applied Mathematics Quarterly \|volume=10 \|issue=1 \|pages=10 \|date=Spring 2002 \|url=http://www.math.ualberta.ca/ami/CAMQ/table_of_content/vol_10/10_1c.htm \|citeseerx=10.1.1.5.6365 \|access-date=2013-06-30 \|archive-date=2016-11-17 \|archive-url=https://web.archive.org/web/20161117220131/http://www.math.ualberta.ca/ami/CAMQ/table_of_content/vol_10/10_1c.htm \|url-status=dead }}</ref> and detect and correct<ref> {{cite journal \|first=M. \|last=Stiber \|title=Spike timing precision and neural error correction: local behavior \|journal=Neural Computation \|volume=17 \|issue=7 \|pages=1577–1601 \|date=July 2005 \|doi=10.1162/0899766053723069 \|pmid=15901408 \|arxiv=q-bio/0501021\|s2cid=2064645 }} </ref> errors in the signals that are sent throughout the brain and wider nervous system. Line 223 ⟶ 227: ==References== * [[Elwyn R. Berlekamp]] (2014), ''Algebraic Coding Theory'', World Scientific Publishing (revised edition), {{isbn\|978-9-81463-589-9}}. * [[David J. C. MacKay\|MacKay, David J. C.]]. ''[https://web.archive.org/web/20160217105359/http://www.inference.phy.cam.ac.uk/mackay/itila/book.html Information Theory, Inference, and Learning Algorithms]'' Cambridge: Cambridge University Press, 2003. {{isbn\|0-521-64298-1}} * [[Vera Pless]] (1982), ''[[Introduction to the Theory of Error-Correcting Codes]]'', John Wiley & Sons, Inc., {{isbn\|0-471-08684-3}}. * Randy Yates, ''[https://web.archive.org/web/20110710143034/http://www.digitalsignallabs.com/tutorial.pdf A Coding Theory Tutorial]''.