{{Short description|Mathematical model of memory}}
'''Sparse distributed memory''' ('''SDM''') is a mathematical model of human [[long-term memory]] introduced by [[Pentti Kanerva]] in 1988 while he was at [[Ames Research Center|NASA Ames Research Center]].<ref name="book" />
This memory exhibits behaviors, both in theory and in experiment, that resemble those previously unapproached by machines – e.g., rapid recognition of faces or odors, discovery of new connections between seemingly unrelated ideas, etc. Sparse distributed memory is used for storing and retrieving large amounts (<math>2^{1000}</math> [[Bit|bits]]) of information, focusing not on exact accuracy but on similarity between items of information.<ref name="book2">{{cite book |last=Kanerva |first=Pentti |title=Sparse Distributed Memory |publisher=The MIT Press |year=1988 |isbn=978-0-262-11132-4}}</ref> There are some recent applications in robot navigation<ref>{{cite book |last1=Mendes |first1=Mateus |title=2008 IEEE International Conference on Robotics and Automation |last2=Crisostomo |first2=Manuel |last3=Coimbra |first3=A. Paulo |year=2008 |isbn=978-1-4244-1646-2 |pages=53–58 |chapter=Robot navigation using a sparse distributed memory |doi=10.1109/ROBOT.2008.4543186 |s2cid=10977460}}</ref> and experience-based robot manipulation.<ref>{{cite book |last1=Jockel |first1=S. |title=2008 IEEE International Conference on Robotics and Biomimetics |last2=Lindner |first2=F. |last3=Jianwei Zhang |year=2009 |isbn=978-1-4244-2678-2 |pages=1298–1303 |chapter=Sparse distributed memory for experience-based robot manipulation |doi=10.1109/ROBIO.2009.4913187 |s2cid=16650992}}</ref>
== General principle ==
It is a generalized [[random-access memory]] (RAM) for long (e.g., 1,000 bit) binary words. These words serve as both addresses to and data for the memory. The main attribute of the memory is sensitivity to similarity. This means that a word can be read back not only by giving the original write address but also by giving one close to it, as measured by the number of mismatched bits (i.e., the [[Hamming distance]] between [[memory address]]es).<ref name="book">{{cite book|last=Kanerva|first=Pentti|title=Sparse Distributed Memory|year=1988|publisher=The MIT Press|isbn=978-0-262-11132-4}}</ref>
SDM implements a transformation from logical space to physical space using distributed data representation and storage, similarly to [[Encoding (memory)|encoding]] processes in human memory.<ref>{{cite journal | last1 = Rissman | first1 = Jesse | last2 = Wagner | first2 = Anthony D. | year = 2012 | title = Distributed representations in memory: insights from functional brain imaging | journal = Annual Review of Psychology | volume = 63 | pages = 101–28 | doi=10.1146/annurev-psych-120710-100344| pmc = 4533899 | pmid=21943171}}</ref> A value corresponding to a logical address is stored into many physical addresses. This way of storing is robust and not deterministic. A memory cell is not addressed directly. Even if the input data (logical addresses) are partially damaged, correct output data can still be retrieved.<ref name="greb2"/>
The theory of the memory is mathematically complete<ref name="book"/> and has been verified by [[computer simulation]]. It arose from the observation that the distances between points of a [[high-dimensional space]] resemble the proximity relations between [[concept]]s in human memory. The theory is also practical in that memories based on it can be implemented with conventional [[random-access memory]] elements.<ref name="flynn89">Flynn, Michael J., Pentti Kanerva, and Neil Bhadkamkar. "Sparse distributed memory prototype: principles and operation." (1989).</ref>
==Definition==
An important property of such high-dimensional spaces is that two randomly chosen vectors are relatively far away from each other, meaning that they are uncorrelated.<ref name=integerSDM/> SDM can be considered a realization of [[locality-sensitive hashing]].
The underlying idea behind an SDM is the mapping of a huge binary memory onto a smaller set of physical locations, so-called ''hard locations''. As a general guideline, those hard locations should be uniformly distributed in the [[virtual memory|virtual space]], to mimic the existence of the larger virtual space as accurately as possible. Every datum is stored, in distributed fashion, across a set of hard locations, and retrieved by averaging those locations. Therefore, recall may not be perfect; its accuracy depends on the saturation of the memory.
Kanerva's proposal is based on four basic ideas:<ref>Mendes, Mateus Daniel Almeida. "Intelligent robot navigation using a sparse distributed memory." Phd thesis, (2010). URL: https://eg.sib.uc.pt/handle/10316/17781 {{Webarchive|url=https://web.archive.org/web/20160304073500/https://eg.sib.uc.pt/handle/10316/17781 |date=2016-03-04 }}</ref>
# The boolean space <math> \{0,1\}^n</math>, or <math>2^n</math> points in <math>10^0 < n < 10^5</math> dimensions, exhibits properties which are similar to humans' intuitive notions of relationships between concepts. This means that it makes sense to store data as points of this space, with each memory item stored as an n-bit vector.
# Neurons with n inputs can be used as address decoders of a random-access memory.
# Unifying principle: data stored into the memory can be used as addresses to the same memory. Distance between two points is a measure of similarity between two memory items. The closer the points, the more similar the stored vectors.
# Time can be traced in the memory as a function of where the data are stored, if the data are organized as sequences of events.
===The binary space N ===
{{further|Vector space model}}
The SDM works with n-dimensional vectors with binary components. Depending on the context, the vectors are called points, patterns, addresses, words, memory items, data, or events. This section is mostly about the properties of the vector space N = <math>\{0,1\}^n</math>. Let n be the number of dimensions of the space. The number of points, or possible memory items, is then <math>2^n</math>. We will denote this number by N and will use N and <math>2^n</math> to stand also for the space itself.<ref name="greb2">Grebeníček, František. "Sparse Distributed Memory – Pattern Data Analysis." URL: http://www.fit.vutbr.cz/~grebenic/Publikace/mosis2000.pdf</ref>
'''Concepts Related to the space N: <math>\{0,1\}^n</math>'''<ref name=greb2/>
*''Origin'', 0: The point with all coordinates 0 is called the origin, 0 = 000...00.
*''Complement'', 'x: The complement, or opposite, of point x is the n-tuple that has ones where x has zeros and vice versa.
*''Norm'', |x|: The norm of point x is the number of ones in its binary representation.
*''Difference'', x − y: The difference of two points x and y is the n-tuple that has ones where x and y differ and zeros elsewhere. It is the bitwise '[[exclusive or]]': x − y = x ⊕ y. The difference commutes: x − y = y − x.
*''Distance'', d(x, y): The distance between two points x and y is the number of dimensions at which x and y differ. It is called the Hamming distance (its square root is the [[Euclidean distance]]) and is expressed in bits. Distance is the norm of the difference: d(x, y) = |x − y|.
*''Betweenness'', x:y:z: Point y is between points x and z if and only if the distance from x to z is the sum of the distances from x to y and from y to z; that is, x:y:z ⇔ d(x, z) = d(x, y) + d(y, z). It is easily seen that every bit of a point in between is a copy of the corresponding bit of an endpoint.
*''Orthogonality'', x ⊥ y: Point x is orthogonal to point y, or the two are perpendicular or indifferent, if and only if the distance between the two is half the number of dimensions: x ⊥ y ⇔ d(x, y) = n/2. The distance n/2 is called indifference distance of space N. If x is orthogonal to y, it is also orthogonal to its complement 'y (x is halfway between y and 'y).
*''Circle'', O(r,x): A circle with radius r and center x is the set of points that are at most r bits from x: O(r,x) = {y | d(x, y) ≤ r}.
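These operations are straightforward to express with bit vectors. The following sketch in [[Python (programming language)|Python]] (an illustration only; the function names are not standard) represents points of N as [[NumPy]] arrays of 0s and 1s:

<syntaxhighlight lang="python">
import numpy as np

def complement(x):
    """Complement 'x: flip every coordinate."""
    return 1 - x

def norm(x):
    """Norm |x|: number of ones in x."""
    return int(x.sum())

def difference(x, y):
    """Difference x - y: bitwise exclusive or (commutes)."""
    return np.bitwise_xor(x, y)

def distance(x, y):
    """Hamming distance d(x, y): norm of the difference."""
    return norm(difference(x, y))

def is_between(x, y, z):
    """Betweenness x:y:z, i.e. d(x, z) = d(x, y) + d(y, z)."""
    return distance(x, z) == distance(x, y) + distance(y, z)

def is_orthogonal(x, y):
    """Orthogonality: distance equals the indifference distance n/2."""
    return distance(x, y) == len(x) // 2

def circle(r, x, points):
    """Circle O(r, x): all given points at most r bits from x."""
    return [p for p in points if distance(x, p) <= r]
</syntaxhighlight>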
'''Properties of the space N: <math>\{0,1\}^n</math>'''<ref name=book/><ref name=greb2/>
The space N can be represented by the vertices of the unit cube in n-dimensional [[Euclidean space]]. The vertices lie on the surface of an n-dimensional sphere with (Euclidean-metric) radius <math>\sqrt{n}/2 </math>. This gives rise to the [[sphere]] analogy. We will call a space spherical if
# any point x has a unique opposite 'x,
# the entire space is between any point x and its opposite 'x, and
# all points are "equal" (meaning that for any two points x and y there is a distance preserving [[automorphism]] of the space that maps x to y, so that from any of its points the space "looks" the same).
The surface of a sphere (in Euclidean 3d-space) clearly is spherical. According to the definition, N is also spherical, since the mapping z ↦ y ⊕ x ⊕ z is a distance-preserving automorphism that maps x to y.
Because N is spherical, it is helpful to think of it as the surface of a sphere with [[circumference]] 2n. All points of N are equally qualified as points of origin, and a point and its complement are like two poles at distance n from each other, with the entire space in between. The points halfway between the poles and perpendicular to them are like the equator.
;Distribution of the space N
The number of points that are exactly d bits from an arbitrary point x (say, from the point 0) is the number of ways to choose d coordinates from a total of n coordinates, and is therefore given by the [[binomial coefficient]]:
<math> \binom nd = \frac{n!}{d!(n-d)!} </math>
The distribution of N thus is the binomial distribution with parameters n and p, where p = 1/2. The mean of the [[binomial distribution]] is n/2, and the [[variance]] is n/4. This distribution function will be denoted by N(d).
The [[normal distribution]] F with mean n/2 and [[standard deviation]] <math>\sqrt{n}/2 </math> is a good approximation to it:
<math>N(d) = \Pr\{d(x, y) \le d\} \cong F\left(\frac{d - n/2}{\sqrt{n/4}}\right)</math>
;Tendency to orthogonality
An outstanding property of N is that most of it lies at approximately the mean (indifference) distance n/2 from a point (and its complement). In other words, most of the space is nearly orthogonal to any given point, and the larger n is, the more pronounced this effect becomes.
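This concentration of distances around n/2 is easy to observe numerically. The following sketch (with illustrative parameter choices) samples random pairs of points and checks that their distances cluster tightly around the indifference distance, in agreement with the normal approximation above:

<syntaxhighlight lang="python">
import numpy as np

n, samples = 1000, 10000
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(samples, n))
y = rng.integers(0, 2, size=(samples, n))
d = (x != y).sum(axis=1)                 # Hamming distances between random pairs

print(d.mean())                          # close to the mean n/2 = 500
print(d.std())                           # close to sqrt(n)/2, about 15.8
# Nearly all of the space lies within a few standard deviations of n/2,
# i.e. it is almost orthogonal to any given point.
print(np.mean(abs(d - n / 2) <= 3 * np.sqrt(n) / 2))   # roughly 0.997
</syntaxhighlight>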
===As neural network===
The SDM may be regarded either as a [[content-addressable memory|content-addressable]] extension of a classical [[random-access memory]] (RAM) or as a special type of three layer [[feedforward neural network]]. The main SDM alterations to the RAM are:<ref name="Grebenıcek">Grebenıcek, František. Neural Nets as Associative Memories. Diss. Brno University of Technology, 2001. URL: http://www.vutium.vutbr.cz/tituly/pdf/ukazka/80-214-1914-8.pdf {{Webarchive|url=https://web.archive.org/web/20160304132149/http://www.vutium.vutbr.cz/tituly/pdf/ukazka/80-214-1914-8.pdf |date=2016-03-04 }}</ref>
*The SDM calculates [[Hamming distance]]s between the reference address and each ___location address. Every ___location whose distance is less than or equal to the given radius is selected.
*The memory is represented by an n × m array of counters (where n is the number of locations and m is the input data length) instead of single-bit storage elements.
*Writing to the memory, instead of overwriting, is as follows:
**if the ''i''-th bit of the input data is 1, the corresponding counters (counters in the selected locations (rows) and in the ''i''-th column) are incremented,
**if the ''i''-th bit of the input data is 0, the corresponding counters are decremented.
*Reading (or recall) from the memory is similar:
**The contents of the selected locations are summed columnwise.
**Each sum is thresholded. If the sum is greater than or equal to the threshold value the corresponding output bit is set to 1; otherwise it is cleared. Note that the thresholds may be zero if the training input vectors are close to orthogonal.
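A compact software sketch of this read/write scheme follows (in Python with NumPy; the ___location count, word length and access radius are illustrative choices, not prescribed values):

<syntaxhighlight lang="python">
import numpy as np

class SparseDistributedMemory:
    """Minimal SDM sketch: m hard locations, n-bit words, Hamming access radius r."""

    def __init__(self, m=1000, n=256, r=115, seed=0):
        rng = np.random.default_rng(seed)
        self.hard_addresses = rng.integers(0, 2, size=(m, n))  # fixed random hard addresses
        self.counters = np.zeros((m, n), dtype=int)            # m x n counter matrix
        self.r = r

    def _selected(self, address):
        # Locations whose hard address is within Hamming distance r of the reference address.
        return (self.hard_addresses != address).sum(axis=1) <= self.r

    def write(self, address, data):
        # Increment counters where the data bit is 1, decrement where it is 0.
        sel = self._selected(address)
        self.counters[sel] += np.where(data == 1, 1, -1)

    def read(self, address):
        # Sum the selected locations columnwise and threshold at zero.
        sums = self.counters[self._selected(address)].sum(axis=0)
        return (sums >= 0).astype(int)


# Autoassociative use: store a word with itself as the address and
# recall it from a corrupted copy of that address.
rng = np.random.default_rng(1)
word = rng.integers(0, 2, size=256)
sdm = SparseDistributedMemory()
sdm.write(word, word)

noisy = word.copy()
noisy[rng.choice(256, size=30, replace=False)] ^= 1   # flip 30 of 256 bits
print(np.count_nonzero(sdm.read(noisy) != word))      # typically 0: exact recovery
</syntaxhighlight>

Because every write is spread over many counters, storing additional words into the same memory introduces interference, which is why recall degrades as the memory approaches saturation.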
====Neuron model====
An idealized description of [[neuron]] is as follows: a neuron has a cell body with two kinds of branches: ''[[dendrites]]'' and an ''[[axon]]''. It receives input signals from other neurons via dendrites, integrates (sums) them and generates its own (electric) output signal which is sent to outside neurons via axon. The points of electric contact between neurons are called ''[[synapses]]''.
When a neuron generates a signal it is ''firing'', and after firing it must ''recover'' before it can fire again.
The relative importance of a synapse to the firing of neuron is called ''synaptic weight'' (or ''input coefficient''). There are two kinds of synapses: [[excitatory]] that trigger neuron to ''fire'' and [[inhibitory]] that hinder firing. The neuron is either excitatory or inhibitory according to the kinds of synapses its axon makes.<ref>Kandel, Eric R., James H. Schwartz, and Thomas M. Jessell, eds. Principles of neural science. Vol. 4. New York: McGraw-Hill, 2000.</ref>
A neuron fires when the sum of inputs exceeds a specific ''threshold''. The higher the threshold the more important it is that excitatory synapses have input while inhibitory ones do not.<ref>Eccles, John G. [https://link.springer.com/chapter/10.1007/978-1-4684-6817-5_9 "Under the Spell of the Synapse."] The Neurosciences: Paths of Discovery, I. Birkhäuser Boston, 1992. 159-179.</ref> Whether a recovered neuron actually fires depends on whether it received sufficient excitatory input (beyond the threshold) and not too much inhibitory input within a certain period.
The formal model of a neuron makes further simplifying assumptions.<ref>{{cite journal|doi=10.1007/bf02478259|title=A logical calculus of the ideas immanent in nervous activity|journal=Bulletin of Mathematical Biophysics|volume=5|issue=4|pages=115–133|year=1943|last1=McCulloch|first1=Warren S.|last2=Pitts|first2=Walter}}</ref> An ''n''-input neuron is modeled by a ''linear threshold function'' <math>F: \{0,1\}^n \to \{0,1\}</math> as follows:
For <math>i = 0,...,n-1</math> where n is the number of inputs, let <math>F_t</math> be the output at time ''t'': <math>F_t\in\{0,1\}</math>, and let <math>x_{i,t}</math> be the ''i''-th input at time ''t'': <math>x_{i,t}\in\{0,1\}</math>. Let <math>w_i</math> be the weight of the ''i''-th input and let <math>c</math> be the threshold.
The ''weighted sum'' of the inputs at time ''t'' is defined by <math>S_t=\sum_{i=0}^{n-1} w_ix_{i,t}</math>
The ''neuron output'' at time ''t'' is then defined as a [[boolean function]]:
<math>\mathbf F_t =
\begin{cases}
1 &\text{if } S_t \ge c, \\
0 &\text{otherwise }.
\end{cases}
</math>
where F<sub>t</sub> = 1 means that the neuron fires at time ''t'' and F<sub>t</sub> = 0 that it does not, i.e. in order for the neuron to fire the weighted sum must reach or exceed the threshold.
Excitatory inputs increase the sum and inhibitory inputs decrease it.
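As an illustration only (the weights, inputs and threshold below are arbitrary example values), the linear threshold function can be written directly:

<syntaxhighlight lang="python">
import numpy as np

def neuron_output(weights, inputs, threshold):
    """Linear threshold unit: fire (1) iff the weighted sum reaches the threshold."""
    return 1 if np.dot(weights, inputs) >= threshold else 0

# Two excitatory inputs (weight +1), one inhibitory input (weight -1), threshold 2.
w = np.array([1, 1, -1])
print(neuron_output(w, np.array([1, 1, 0]), 2))  # 1: fires
print(neuron_output(w, np.array([1, 1, 1]), 2))  # 0: the inhibitory input prevents firing
</syntaxhighlight>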
====Neuron as address-decoder====
Kanerva's key thesis<ref name="book"/> is that certain neurons could have their input coefficients and thresholds fixed over the entire life of an organism and used as address decoders where ''n''-tuple of input coefficients (the pattern to which neurons respond most readily) determines the ''n''-bit memory address, and the threshold controls the size of the region of similar address patterns to which the neuron responds.
This mechanism is complementary to adjustable synapses or adjustable weights in a neural network ([[perceptron]] convergence learning), as this fixed accessing mechanism would be a permanent ''frame of reference'' which allows ''selecting'' the synapses in which the information is stored, and from which it is retrieved, under a given set of circumstances. Furthermore, an encoding of the present circumstance would serve as an address.
The address ''a'' of a neuron with input coefficients <math>w_0,\ldots,w_{n-1}</math> is defined as the ''n''-bit input pattern that maximizes the weighted sum. The maximum occurs when the inhibitory inputs are zeros and the excitatory inputs are ones.
The ''i''-th bit of address is:
<math>\mathbf a_i =
\begin{cases}
1 &\text{if } w_i > 0, \\
0 &\text{if } w_i < 0.
\end{cases}
</math>
(assuming weights are non-zero)
The ''maximum weighted sum'' <math>S(w)</math> is then the sum of all positive coefficients:
<math>S(w)=\sum_{w_i>0} w_i</math>
And the ''minimum weighted sum'' <math>s(w)</math> would correspond to the point opposite the neuron address, its complement 'a: <math>s(w)=\sum_{w_i<0} w_i</math>
When the threshold c is in the range <math> s(w)<c<S(w)</math> the output of the neuron is 0 for some addresses (input patterns) and 1 for others. If the threshold is above S(w) the output is always 0; if it is below s(w) the output is always 1. So by a proper choice of the threshold a neuron can be made to respond to just one address: when the threshold is S(w) (the maximum weighted sum) the neuron responds only to its own address and acts like an address decoder of a conventional [[random-access memory]].
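The address-decoder behaviour can be verified numerically. The following sketch is an illustration using unit weights of +1 and −1 (not drawn from the literature): the neuron's address is read off from the signs of its weights; with the threshold set to S(w) the neuron responds only to its own address, and lowering the threshold by r makes it respond to every address within Hamming distance r, because with unit weights the weighted sum equals S(w) − d(x, a).

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 256
weights = rng.choice([-1, 1], size=n)      # fixed input coefficients
address = (weights > 0).astype(int)        # the neuron's n-bit address a
S = int(weights[weights > 0].sum())        # maximum weighted sum S(w)

def fires(x, threshold):
    return int(np.dot(weights, x) >= threshold)

# Threshold S(w): the neuron acts as an address decoder for its own address only.
print(fires(address, S))                   # 1
probe = address.copy()
probe[0] ^= 1                              # one mismatched bit
print(fires(probe, S))                     # 0

# Threshold S(w) - r: the neuron responds to every address within r bits of a.
r = 10
noisy = address.copy()
noisy[rng.choice(n, size=r, replace=False)] ^= 1
print(fires(noisy, S - r))                 # 1
</syntaxhighlight>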
===Memory ___location===
SDM is designed to cope with address patterns that span an enormous address space (order of <math>2^{1000}</math>). SDM assumes that the address patterns actually describing physical situations of interest are ''sparsely'' scattered throughout the input space. It is impossible to reserve a separate physical ___location corresponding to each possible input; SDM implements only a limited number of physical or ''hard'' locations. The physical ___location is called a memory (or ''hard'') ___location.<ref name=flynn89/>
Every hard ___location has associated with it two items:
* a fixed hard address, which is the N-bit address of the ___location
* a contents portion that is M-bits wide and that can accumulate multiple M-bit data patterns written into the ___location. The contents' portion is not fixed; it is modified by data patterns written into the memory.
In SDM a word could be stored in memory by writing it in a free storage ___location and at the same time providing the ___location with the appropriate address decoder. A neuron as an address decoder would select a ___location based on similarity of the ___location's address to the retrieval cue. Unlike conventional [[Turing machines]], SDM takes advantage of ''[[parallel computing]] by the address decoders''. Merely ''accessing the memory'' is regarded as computing, the amount of which increases with memory size.<ref name="book"/>
====Address pattern====
An N-bit vector used in writing to and reading from the memory. The address pattern is a coded description of an environmental state. (e.g. N = 256.)
====Data pattern====
An M-bit vector that is the object of the writing and reading operations. Like the address pattern, it is a coded description of an environmental state. (e.g. M = 256.)
====Writing====
Writing is the operation of storing a data pattern into the memory using a particular address pattern. During a write, the input to the memory consists of an address pattern and a data pattern. The address pattern is used to select ''hard'' memory locations whose hard addresses are within a certain cutoff distance from the address pattern. The data pattern is stored into each of the selected locations.
====Reading====
Reading is the operation of retrieving a data pattern from the memory using a particular address pattern. During a read, an address pattern is used to select a certain number of ''hard'' memory locations (just like during a write). The contents of the selected locations are bitwise summed and thresholded to derive an M-bit data pattern. This serves as the output read from the memory.
====Pointer chains====
All of the items are linked in a single list (or array) of pointers to memory locations, and are stored in RAM. Each address in an array points to an individual line in the memory. That line is then returned if it is similar to other lines. Neurons are utilized as address decoders and encoders, similar to the way neurons work in the brain, and return items from the array that match or are similar.
===Critical distance===
Kanerva's model of memory has a concept of a ''critical point'': prior to this point, a previously stored item can be easily retrieved; but beyond this point an item cannot be retrieved. Kanerva has methodically calculated this point for a particular set of (fixed) parameters.
The corresponding ''critical distance'' of a sparse distributed memory can be approximately evaluated by minimizing the following equation under the restrictions <math>d \in N </math> and <math>d \leqslant n</math>. The proof can be found in<ref name="msbrogli">{{Cite thesis |author=Brogliato, Marcelo Salhab |title=Understanding Critical Distance in Sparse Distributed Memory |year=2012 |hdl=10438/13095 }}</ref><ref>{{cite journal|last1=Brogliato|first1=Marcelo Salhab|last2=Chada|first2=Daniel de Magalhães|last3=Linhares|first3=Alexandre|title=Sparse Distributed Memory: understanding the speed and robustness of expert memory|journal=Frontiers in Human Neuroscience|date=2014|volume=8|issue=222|pages=222|doi=10.3389/fnhum.2014.00222|pmid=24808842|pmc=4009432|doi-access=free }}</ref>
<math>
\tilde{f}(d) = \left\{ \frac{1}{2} \cdot \left[ 1 - N \left( z < \frac{w \cdot \operatorname{shared}(d)}{\sqrt{\theta}} \right) + N \left( z < \frac{- w \cdot \operatorname{shared}(d)}{\sqrt{\theta}} \right) \right] - \frac{d}{n} \right\}^2
</math>
Where:
* <math>d</math>: is the distance to the target;
* <math>n</math>: is the number of dimensions;
* <math>N</math>: denotes the cumulative distribution of the standard normal distribution, with mean zero and variance one;
* <math>w</math>: is the number of times the target bitstring was written in memory;
* <math>\theta</math>: is the total number of random bitstrings stored in all <math>h</math> hard locations activated by a read operation; i.e., the size of a cell assembly;
* <math>\operatorname{shared}(d)</math>: is the mean number of shared hard locations activated by two bitstrings <math>d</math> bits away from each other. One can find some values for a 1000-dimensional SDM in Kanerva's book, Table 7.1, p. 63, or the equations to calculate it for any SDM in Appendix B, p. 125 of the same book.
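For numerical work the minimization can be carried out directly over the integer distances 0 ≤ d ≤ n. The sketch below is an illustration only: the values of <math>w</math>, <math>\theta</math> and the function <math>\operatorname{shared}(d)</math> must be supplied for the particular memory (e.g. taken from Kanerva's tables).

<syntaxhighlight lang="python">
import math

def normal_cdf(z):
    """Cumulative distribution of the standard normal, N(z' < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def f_tilde(d, n, w, theta, shared):
    """The squared objective above; shared(d) gives the mean number of
    hard locations activated by both of two addresses d bits apart."""
    z = w * shared(d) / math.sqrt(theta)
    return (0.5 * (1.0 - normal_cdf(z) + normal_cdf(-z)) - d / n) ** 2

def critical_distance(n, w, theta, shared):
    """Approximate critical distance: the integer d in [0, n] minimizing f_tilde."""
    return min(range(n + 1), key=lambda d: f_tilde(d, n, w, theta, shared))
</syntaxhighlight>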
==Probabilistic interpretation==
An [[associative memory (psychology)|associative memory]] system using [[Hierarchical temporal memory#Sparse distributed representations|sparse, distributed representations]] can be reinterpreted as an [[Importance sampling|importance sampler]], a [[Monte Carlo method|Monte Carlo]] method of approximating [[Bayesian inference]].<ref>Abbott, Joshua T., Jessica B. Hamrick, and Thomas L. Griffiths. "[https://web.archive.org/web/20170911115555/https://pdfs.semanticscholar.org/7f50/8bb0bf0010884a4be72f2774635514fc58ec.pdf Approximating Bayesian inference with a sparse distributed memory system]." Proceedings of the 35th annual conference of the cognitive science society. 2013.</ref> The SDM can be considered a Monte Carlo approximation to a multidimensional [[conditional probability]] integral. The SDM will produce acceptable responses from a training set when this approximation is valid, that is, when the training set contains sufficient data to provide good estimates of the underlying [[Joint probability distribution|joint probabilities]] and there are enough Monte Carlo samples to obtain an accurate estimate of the integral.<ref>{{cite book|doi=10.1109/ijcnn.1989.118597|chapter=A conditional probability interpretation of Kanerva's sparse distributed memory|title=International Joint Conference on Neural Networks|pages=415–417|volume=1|year=1989|last1=Anderson|s2cid=13935339}}</ref>
==Biological plausibility==
[[Sparse coding]] may be a general strategy of neural systems to augment memory capacity. To adapt to their environments, animals must learn which stimuli are associated with rewards or punishments and distinguish these reinforced stimuli from similar but irrelevant ones. Such a task requires implementing stimulus-specific [[associative memory (psychology)|associative memories]] in which only a few neurons out of a [[Neural ensemble|population]] respond to any given stimulus and each neuron responds to only a few stimuli out of all possible stimuli.
Theoretical work on SDM by Kanerva has suggested that sparse coding increases the capacity of associative memory by reducing overlap between representations. Experimentally, sparse representations of sensory information have been observed in many systems, including vision,<ref>{{cite journal | last1 = Vinje | first1 = WE | last2 = Gallant | first2 = JL | year = 2000 | title = Sparse coding and decorrelation in primary visual cortex during natural vision | url = https://pdfs.semanticscholar.org/3efc/4ac8f70edde57661b908105f4fd21a43fbab.pdf | archive-url = https://web.archive.org/web/20170911115737/https://pdfs.semanticscholar.org/3efc/4ac8f70edde57661b908105f4fd21a43fbab.pdf | url-status = dead | archive-date = 2017-09-11 | journal = Science | volume = 287 | issue = 5456| pages = 1273–1276 | pmid = 10678835 | doi = 10.1126/science.287.5456.1273 | citeseerx = 10.1.1.456.2467 | bibcode = 2000Sci...287.1273V | s2cid = 13307465 }}</ref> audition,<ref>{{cite journal | last1 = Hromádka | first1 = T | last2 = Deweese | first2 = MR | last3 = Zador | first3 = AM | year = 2008 | title = Sparse representation of sounds in the unanesthetized auditory cortex | journal = PLOS Biol | volume = 6 | issue = 1| page = e16 | pmid = 18232737 | doi=10.1371/journal.pbio.0060016 | pmc=2214813 | doi-access = free }}</ref> touch,<ref>{{cite journal | last1 = Crochet | first1 = S | last2 = Poulet | first2 = JFA | last3 = Kremer | first3 = Y | last4 = Petersen | first4 = CCH | year = 2011 | title = Synaptic mechanisms underlying sparse coding of active touch | journal = Neuron | volume = 69 | issue = 6| pages = 1160–1175 | pmid = 21435560 | doi=10.1016/j.neuron.2011.02.022| s2cid = 18528092 | doi-access = free }}</ref> and olfaction.<ref>{{cite journal | last1 = Ito | first1 = I | last2 = Ong | first2 = RCY | last3 = Raman | first3 = B | last4 = Stopfer | first4 = M | year = 2008 | title = Sparse odor representation and olfactory learning | journal = Nat Neurosci | volume = 11 | issue = 10| pages = 1177–1184 | pmid = 18794840 | pmc=3124899 | doi=10.1038/nn.2192}}</ref> However, despite the accumulating evidence for widespread sparse coding and theoretical arguments for its importance, a demonstration that sparse coding improves the stimulus-specificity of associative memory has been lacking until recently.
Some progress was made in 2014 by [[Gero Miesenböck]]'s lab at the [[University of Oxford]], analyzing the [[Drosophila]] [[olfactory system]].<ref>A sparse memory is a precise memory. Oxford Science blog. 28 Feb 2014. http://www.ox.ac.uk/news/science-blog/sparse-memory-precise-memory</ref>
In Drosophila, sparse odor coding by the [[Kenyon cell]]s of the [[Mushroom bodies|mushroom body]] is thought to generate a large number of precisely addressable locations for the storage of odor-specific memories. Lin et al.<ref>{{cite journal | last1 = Lin | first1 = Andrew C. |display-authors=etal | year = 2014 | title = Sparse, decorrelated odor coding in the mushroom body enhances learned odor discrimination | journal = Nature Neuroscience | volume = 17 | issue = 4| pages = 559–568 | pmc=4000970 | pmid=24561998 | doi=10.1038/nn.3660}}</ref> demonstrated that sparseness is controlled by a negative feedback circuit between Kenyon cells and the [[GABAergic]] anterior paired lateral (APL) neuron. Systematic activation and blockade of each leg of this feedback circuit show that Kenyon cells activate APL and APL inhibits Kenyon cells. Disrupting the Kenyon cell-APL feedback loop decreases the sparseness of Kenyon cell odor responses, increases inter-odor correlations, and prevents flies from learning to discriminate similar, but not dissimilar, odors. These results suggest that feedback inhibition suppresses Kenyon cell activity to maintain sparse, decorrelated odor coding and thus the odor-specificity of memories. A 2017 publication in [[Science_(journal)|Science]]<ref>{{cite journal|doi=10.1126/science.aam9868|pmid=29123069|title=A neural algorithm for a fundamental computing problem|journal=Science|volume=358|issue=6364|pages=793–796|year=2017|last1=Dasgupta|first1=Sanjoy|last2=Stevens|first2=Charles F.|last3=Navlakha|first3=Saket|bibcode=2017Sci...358..793D|doi-access=free}}</ref> showed that the fly olfactory circuit implements an improved version of binary [[locality sensitive hashing]] via sparse, random projections.
==Applications==
In applications of the memory, the words are patterns of features. Some features are produced by a sensory system, others control a motor system. There is a ''current pattern'' (of e.g. 1000 bits), which is the current contents of the system's ''focus''. The sensors feed into the focus, the motors are driven from the focus, and the memory is accessed through the focus.
What goes on in the world{{snd}} the system's "subjective" experience{{snd}} is represented internally by a sequence of patterns in the focus. The memory stores this sequence and can recreate it later in the focus if addressed with a pattern similar to one encountered in the past. Thus, the memory learns to ''predict'' what is about to happen. Wide applications of the memory would be in systems that deal with real-world information in real time.
The applications include [[computer vision|vision]]{{snd}} detecting and identifying objects in a scene and anticipating subsequent scenes{{snd}} [[robotics]], [[signal processing|signal detection and verification]], and adaptive [[machine learning|learning]] and [[Adaptive control|control]]. On the theoretical side, the working of the memory may help us understand [[memory]] and [[learning]] in humans and animals.<ref name="flynn89"/><ref>Denning, Peter J. Sparse distributed memory. Research Institute for Advanced Computer Science [NASA Ames Research Center], 1989.</ref>
===The best match search===
SDM can be applied to the problem of finding the ''best match'' to a test word in a dataset of stored words,<ref name="book"/><ref>Minsky, Marvin, and Papert Seymour. "Perceptrons." (1969). "Time vs. memory for best matching - an open problem" p. 222–225</ref> or, in other words, the [[nearest neighbor search]] problem.
Consider a memory with N locations where <math>N = 2^n</math>. Let each ___location have the capacity for one ''n''-bit word (e.g. N= 2<sup>100</sup> 100-bit words), and let the address decoding be done by N address decoder neurons. Set the threshold of each neuron ''x'' to its maximum weighted sum <math>|x|</math> and use a common parameter ''d'' to adjust all thresholds when accessing the memory. The effective threshold of neuron ''x'' will then be <math>|x| - d</math>, which means that the ___location ''x'' is accessible every time the address ''x'' is within ''d'' bits of the address presented to memory (i.e. the address held by the address register).
With <math>d=0</math> we have a conventional [[random-access memory]]. Assume further that each ___location has a special ''___location-occupied'' bit that can be accessed in the same way as the regular datum bits. Writing a word to a ___location sets this ''___location-occupied'' bit. Assume that only occupied locations can be read.
To file the data in memory, start by setting <math>d=n</math> and issue a command to clear the ''___location-occupied'' bit. This single operation marks all memory as unoccupied regardless of the values of the address register. Then set <math>d=0</math> and write each word ''y'' of the data set with ''y'' itself as the address. Notice that each write operation affects only one ___location: the ___location ''y''. Filing time is thus proportional to the number of words in the dataset.
Finding the best match for a test word ''z'' involves placing ''z'' in the address register and finding the least distance ''d'' for which there is an occupied ___location. We can start the search by setting <math>d=0</math> and incrementing ''d'' successively until an occupied ___location is found. This method gives average search times that are proportional to the number of address bits, or slightly less than <math>n/2</math>,<ref name="book"/> because the nearest occupied ___location can be expected to be just under <math>n/2</math> bits from ''z'' (with [[binary search]] on ''d'' this would be O(log n)).
With 100-bit words 2<sup>100</sup> locations would be needed, i.e. an enormously large memory. However, ''if we construct the memory as we store the words of the dataset'' we need only one ___location (and one address decoder) for each word of the data set. None of the unoccupied locations need to be present. This represents the aspect of ''sparseness'' in SDM.
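In software the sparse construction reduces to keeping one occupied ___location per stored word and widening the access radius until some ___location responds. The following sketch is illustrative only (a plain list stands in for the hardware address decoders) and follows the procedure described above:

<syntaxhighlight lang="python">
import numpy as np

def file_words(dataset):
    """Filing: one occupied ___location per word, addressed by the word itself."""
    return [np.asarray(w) for w in dataset]

def best_match(stored, z):
    """Widen the radius d from 0 until at least one occupied ___location is selected."""
    for d in range(len(z) + 1):
        hits = [w for w in stored if np.count_nonzero(w != z) <= d]
        if hits:
            return hits[0], d

rng = np.random.default_rng(0)
stored = file_words(rng.integers(0, 2, size=(1000, 100)))  # 1000 stored 100-bit words
probe = stored[42].copy()
probe[:7] ^= 1                                             # corrupt 7 bits
match, d = best_match(stored, probe)
print(d, np.array_equal(match, stored[42]))                # typically: 7 True
</syntaxhighlight>

As noted above, a [[binary search]] over ''d'' would reduce the number of radius settings tried to O(log n).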
===Speech recognition===
SDM can be applied in [[Speech recognition|transcribing speech]], with the training consisting of "listening" to a large corpus of spoken [[language]]. Two hard problems with natural speech are how to detect word boundaries and how to adjust to different speakers. The memory should be able to handle both. First, it stores sequences of patterns as pointer chains. In training{{snd}} in listening to speech{{snd}} it will build a probabilistic structure with the highest incidence of branching at word boundaries. In transcribing speech, these branching points are detected and tend to break the stream into segments that correspond to words. Second, the memory's sensitivity to similarity is its mechanism for adjusting to different speakers{{snd}} and to the variations in the voice of the same speaker.<ref name=flynn89/>
==="Realizing forgetting"===
{| border="1" cellpadding="5" cellspacing="0" align="right"
|+ '''Decay Functions'''
|-
| colspan="3" align="center" |
{| border="0"
|-
| [[Image:Exponential decay mechanism.svg|thumb|320px|center|The exponential decay function]]
|-
| [[Image:Negated sigmoid function.png|thumb|320px|center|The negated-translated sigmoid function]]
|}
|}
At the University of Memphis, Uma Ramamurthy, Sidney K. D'Mello, and Stan Franklin created a modified version of the sparse distributed memory system that represents "realizing forgetting." It uses a decay equation to better show interference in data. The sparse distributed memory system distributes each pattern into approximately one hundredth of the locations,{{Clarify|date=March 2013}} so interference can have detrimental results.<ref name=memphis>{{cite web|title=Realizing Forgetting in a Modified Sparse Distributed Memory System|url=http://csjarchive.cogsci.rpi.edu/proceedings/2006/docs/p1992.pdf|work=Computer Science Department and The Institute for Intelligent Systems|publisher=The University of Memphis|access-date=1 November 2011|author1=Uma Ramamurthy|author2=Sidney K. D'Mello|author3=Stan Franklin|archive-url=https://web.archive.org/web/20120405152716/http://csjarchive.cogsci.rpi.edu/proceedings/2006/docs/p1992.pdf|archive-date=5 April 2012|pages=1992–1997|url-status=bot: unknown}}</ref>
Two possible examples of decay from this modified sparse distributed memory are presented below:
Exponential decay mechanism: <math>\!f(x)=1+e^{-ax}</math>
Negated-translated sigmoid decay mechanism: <math>f(x)=1-[\frac{1}{1+e^{-a(x-c)}}]</math>
In the exponential decay function, the value approaches zero more quickly as ''x'' increases; ''a'' is a constant (usually between 3 and 9) and ''c'' is a counter. For the negated-[[Translation (geometry)|translated]] [[sigmoid function]], the decay is similar to the exponential decay function when ''a'' is greater than 4.<ref name=memphis />
As the function value approaches 0, it represents the degree to which the memory has been forgotten under the respective decay mechanism.
===Genetic sparse distributed memory===
Ashraf Anwar, Stan Franklin, and Dipankar Dasgupta at the University of Memphis proposed a model for SDM initialization using genetic algorithms and genetic programming (1999).
[[Genetic memory (computer science)|Genetic memory]] uses a genetic algorithm and sparse distributed memory as a pseudo artificial neural network. It has been considered for use in creating artificial life.<ref name="Rocha">{{cite journal |vauthors=Rocha LM, Hordijk W |title=Material representations: From the genetic code to the evolution of cellular automata |journal=Artificial Life |volume=11 |issue=1–2 |pages=189–214 |year=2005 |pmid=15811227 |doi=10.1162/1064546053278964 |url=http://informatics.indiana.edu/rocha/caalife04.html |citeseerx=10.1.1.115.6605 |s2cid=5742197 |access-date=2013-08-02 |archive-date=2013-09-20 |archive-url=https://web.archive.org/web/20130920205232/http://informatics.indiana.edu/rocha/caalife04.html |url-status=dead }}</ref>
===Statistical prediction===
SDM has been applied to statistical [[prediction]], the task of associating extremely large perceptual state vectors with future events. In conditions of near- or over-capacity, where the associative memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor and each data counter in an SDM can be viewed as an independent estimate of the conditional probability of a binary function f being equal to the activation set defined by the counter's memory ___location.<ref>Rogers, David. [https://proceedings.neurips.cc/paper/1988/file/9b8619251a19057cff70779273e95aa6-Paper.pdf "Statistical prediction with Kanerva's sparse distributed memory."] Advances in neural information processing systems. 1989.</ref>
===Artificial general intelligence===
{{see also|Cognitive architecture|Artificial general intelligence}}
*[[LIDA (cognitive architecture)|LIDA]] uses sparse distributed memory to help model [[cognition]] in biological systems. The sparse distributed memory places space is recalling or recognizing the object that it has in relation to other objects. It was developed by Stan Franklin, the creator of the "realizing forgetting" modified sparse distributed memory system.<ref name=psdm>{{cite journal | last1 = Rao | first1 = R. P. N. | last2 = Fuentes | first2 = O. | year = 1998 | title = Hierarchical Learning of Navigational Behaviors in an Autonomous Robot using a Predictive Sparse Distributed Memory | journal = Machine Learning | volume = 31 | pages = 87–113 | doi = 10.1023/a:1007492624519 | s2cid = 8305178 | doi-access = free }}</ref> Transient episodic and declarative memories have distributed representations in LIDA (based on modified version of SDM<ref>Franklin, Stan, et al. "[http://www.brains-minds-media.org/archive/150/bmm-franklin-050704.pdf/?searchterm=franklin The role of consciousness in memory]." Brains, Minds and Media 1.1 (2005): 38.</ref>), there is evidence that this is also the case in the nervous system.<ref>{{cite journal|doi=10.1016/s1364-6613(02)01868-5|pmid=11912039|title=Episodic memory and cortico–hippocampal interactions|journal=Trends in Cognitive Sciences|volume=6|issue=4|pages=162–168|year=2002|last1=Shastri|first1=Lokendra|s2cid=15022802|url=https://www1.icsi.berkeley.edu/~shastri/psfiles/ShastriTicsEM02.pdf}}</ref>
*[[CMatie]] is a [[Artificial consciousness|'conscious']] software agent developed to manage seminar announcements in the Mathematical Sciences Department at the [[University of Memphis]]. It is based on SDM augmented with the use of [[genetic algorithm]]s as an [[Content-addressable memory|associative memory]].<ref>{{cite journal | last1 = Anwar | first1 = Ashraf | last2 = Franklin | first2 = Stan | year = 2003 | title = Sparse distributed memory for 'conscious' software agents | journal = Cognitive Systems Research | volume = 4 | issue = 4| pages = 339–354 | doi=10.1016/s1389-0417(03)00015-9| s2cid = 13380583 }}</ref>
*[[Hierarchical temporal memory]] utilizes SDM for storing sparse distributed representations of the data.
===Reinforcement learning===
SDMs provide a linear, local [[function approximation]] scheme, designed to work when a very large/high-dimensional input (address) space has to be mapped into a much smaller [[Computer memory|physical memory]]. In general, local architectures, SDMs included, can be subject to the [[curse of dimensionality]], as some target functions may require, in the worst case, an exponential number of local units to be approximated accurately across the entire input space. However, it is widely believed that most [[Decision support system|decision-making systems]] need high accuracy only around low-dimensional [[manifolds]] of the [[state space]], or important state "highways".<ref>Ratitch, Bohdana, Swaminathan Mahadevan, and [[Doina Precup]]. "Sparse distributed memories in reinforcement learning: Case studies." Proc. of the Workshop on Learning and Planning in Markov Processes-Advances and Challenges. 2004.</ref> The work in Ratitch et al.<ref>Ratitch, Bohdana, and Doina Precup. "[http://www.cs.mcgill.ca/~dprecup/temp/ecml2004.pdf Sparse distributed memories for on-line value-based reinforcement learning] {{Webarchive|url=https://web.archive.org/web/20150824061329/http://www.cs.mcgill.ca/~dprecup/temp/ecml2004.pdf |date=2015-08-24 }}." Machine Learning: ECML 2004. Springer Berlin Heidelberg, 2004. 347-358.</ref> combined the SDM memory model with the ideas from [[instance-based learning|memory-based learning]], which provides an approximator that can dynamically adapt its structure and resolution in order to locate regions of the state space that are "more interesting"<ref>Bouchard-Côté, Alexandre. "[https://www.stat.ubc.ca/~bouchard/pub/report-ml.pdf Sparse Memory Structures Detection]." (2004).</ref> and allocate proportionally more memory resources to model them accurately.
===Object indexing in computer vision===
[[Dana H. Ballard]]'s lab<ref>Rao, Rajesh PN, and Dana H. Ballard. "[https://web.archive.org/web/20170911115922/https://pdfs.semanticscholar.org/b918/b2326656a3661689e6bf3b6de9a8245d87ac.pdf Object indexing using an iconic sparse distributed memory]." Computer Vision, 1995. Proceedings., Fifth International Conference on. IEEE, 1995.</ref> demonstrated a general-purpose object indexing technique for [[computer vision]] that combines the virtues of [[principal component analysis]] with the favorable matching properties of high-dimensional spaces to achieve high precision recognition. The indexing algorithm uses an [[active vision]] system in conjunction with a modified form of SDM and provides a platform for learning the association between an object's appearance and its identity.
==Extensions==
Many extensions and improvements to SDM have been proposed, e.g.:
* Ternary memory space: This enables the memory to be used as a Transient Episodic Memory (TEM) in [[Cognitive architecture|cognitive software agents]]. TEM is a memory with high specificity and low retention, used for events having features of a particular time and place.<ref>D'Mello, Sidney K., Ramamurthy, U., & Franklin, S. 2005. [http://escholarship.org/uc/item/2b78w526.pdf Encoding and Retrieval Efficiency of Episodic Data in a Modified Sparse Distributed Memory System]. In Proceedings of the 27th Annual Meeting of the Cognitive Science Society. Stresa, Ital</ref><ref>Ramamaurthy, U., Sidney K. D'Mello, and Stan Franklin. "[https://www.academia.edu/download/43397052/modifed_20sparse_20Distributed_20Memory_20as_20TSM_20for_20CSA.pdf Modified sparse distributed memory as transient episodic memory for cognitive software agents]{{dead link|date=July 2022|bot=medic}}{{cbignore|bot=medic}}." Systems, Man and Cybernetics, 2004 IEEE International Conference on. Vol. 6. IEEE, 2004.</ref>
* Integer SDM that uses modular arithmetic integer vectors rather than binary vectors. This extension improves the representation capabilities of the memory and is more robust over normalization. It can also be extended to support forgetting and reliable sequence storage.<ref name="integerSDM">Snaider, Javier, and Stan Franklin. "[http://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS12/paper/viewFile/4409/4781 Integer sparse distributed memory] {{Webarchive|url=https://web.archive.org/web/20210802030951/https://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS12/paper/viewFile/4409/4781 |date=2021-08-02 }}." Twenty-fifth international flairs conference. 2012.</ref>
* Using word vectors of larger size than address vectors: This extension preserves many of the desirable properties of the original SDM: auto-associability, content addressability, distributed storage and robustness over noisy inputs. In addition, it adds new functionality, enabling an efficient auto-associative storage of sequences of vectors, as well as of other data structures such as trees.<ref>{{cite journal | last1 = Snaider | first1 = Javier | last2 = Franklin | first2 = Stan | year = 2012 | title = Extended sparse distributed memory and sequence storage | journal = Cognitive Computation | volume = 4 | issue = 2| pages = 172–180 | doi=10.1007/s12559-012-9125-8| s2cid = 14319722 }}</ref>
* Constructing SDM from [[Biological neuron model|Spiking Neurons]]: Despite the biological likeness of SDM most of the work undertaken to demonstrate its capabilities to date has used highly artificial neuron models which abstract away the actual behaviour of [[neurons]] in the [[brain]]. Recent work by [[Steve Furber]]'s lab at the [[University of Manchester]]<ref>{{cite journal | last1 = Furber | first1 = Steve B. |display-authors=etal | year = 2004 | title = Sparse distributed memory using N-of-M codes | journal = Neural Networks | volume = 17 | issue = 10| pages = 1437–1451 | doi=10.1016/j.neunet.2004.07.003| pmid = 15541946 }}</ref><ref>Sharp, Thomas: "[https://studentnet.cs.manchester.ac.uk/resources/library/thesis_abstracts/MSc09/FullText/SharpThomas.pdf Application of sparse distributed memory to the Inverted Pendulum Problem]". Diss. University of Manchester, 2009. URL: http://studentnet.cs.manchester.ac.uk/resources/library/thesis_abstracts/MSc09/FullText/SharpThomas.pdf</ref><ref>Bose, Joy. [https://www.academia.edu/download/7385022/bose07_phd.pdf Engineering a Sequence Machine Through Spiking Neurons Employing Rank-order Codes]{{dead link|date=July 2022|bot=medic}}{{cbignore|bot=medic}}. Diss. University of Manchester, 2007.</ref> proposed adaptations to SDM, e.g. by incorporating N-of-M rank codes<ref>Simon Thorpe and Jacques Gautrais. [https://www.researchgate.net/profile/Jacques-Gautrais/publication/285068799_Rank_order_coding_Computational_neuroscience_trends_in_research/links/587ca2e108ae4445c069772a/Rank-order-coding-Computational-neuroscience-trends-in-research.pdf Rank order coding.] In Computational Neuroscience: Trends in research, pages 113–118. Plenum Press, 1998.</ref><ref>{{cite journal | last1 = Furber | first1 = Stephen B. |display-authors=etal | year = 2007 | title = Sparse distributed memory using rank-order neural codes | journal = IEEE Transactions on Neural Networks| volume = 18 | issue = 3| pages = 648–659 | doi=10.1109/tnn.2006.890804| pmid = 17526333 | citeseerx = 10.1.1.686.6196 | s2cid = 14256161 }}</ref> into how [[Neural coding#Population coding|populations of neurons]] may encode information—which may make it possible to build an SDM variant from biologically plausible components. This work has been incorporated into [[SpiNNaker|SpiNNaker (Spiking Neural Network Architecture)]] which is being used as the [[Neuromorphic engineering|Neuromorphic Computing]] Platform for the [[Human Brain Project]].<ref>{{cite journal | last1 = Calimera | first1 = A | last2 = Macii | first2 = E | last3 = Poncino | first3 = M | year = 2013 | title = The Human Brain Project and neuromorphic computing | journal = Functional Neurology | volume = 28 | issue = 3| pages = 191–6 | pmid = 24139655 | pmc=3812737}}</ref>
* Non-random distribution of locations:<ref>{{cite journal | last1 = Hely | first1 = Tim | last2 = Willshaw | first2 = David J. | last3 = Hayes | first3 = Gillian M. | year = 1997 | title = A new approach to Kanerva's sparse distributed memory | journal = IEEE Transactions on Neural Networks| volume = 8 | issue = 3| pages = 791–794 | doi=10.1109/72.572115| pmid = 18255679 | s2cid = 18628649 }}</ref><ref>Caraig, Lou Marvin. "[https://arxiv.org/abs/1207.5774 A New Training Algorithm for Kanerva's Sparse Distributed Memory]." arXiv preprint arXiv:1207.5774 (2012).</ref> Although the storage locations are initially distributed randomly in the binary N address space, the final distribution of locations depends upon the input patterns presented, and may be non-random thus allowing better flexibility and [[Generalization error|generalization]]. The data pattern is first stored at locations which lie closest to the input address. The signal (i.e. data pattern) then spreads throughout the memory, and a small percentage of the signal strength (e.g. 5%) is lost at each subsequent ___location encountered. Distributing the signal in this way removes the need for a select read/write radius, one of the problematic features of the original SDM. All locations selected in a write operation do not now receive a copy of the original binary pattern with equal strength. Instead they receive a copy of the pattern weighted with a real value from 1.0->0.05 to store in real valued counters (rather than binary counters in Kanerva's SDM). This rewards the nearest locations with a greater signal strength, and uses the natural architecture of the SDM to attenuate the signal strength. Similarly in reading from the memory, output from the nearest locations is given a greater weight than from more distant locations. The new signal method allows the total signal strength received by a ___location to be used as a measure of the fitness of a ___location and is flexible to varying input (as the loss factor does not have to be changed for input patterns of different lengths).
* SDMSCue (Sparse Distributed Memory for Small Cues): Ashraf Anwar and Stan Franklin at the University of Memphis introduced a variant of SDM capable of handling small cues, namely SDMSCue, in 2002. The key idea is to use multiple reads/writes and space projections to reach a successively longer cue.<ref>{{Cite book|title = A Sparse Distributed Memory Capable of Handling Small Cues, SDMSCue|publisher = Springer US|date = 2005-01-01|isbn = 978-0-387-24048-0|pages = 23–38|series = IFIP — The International Federation for Information Processing|language = en|first1 = Ashraf|last1 = Anwar|first2 = Stan|last2 = Franklin|editor-first = Michael K.|editor-last = Ng|editor-first2 = Andrei|editor-last2 = Doncescu|editor-first3 = Laurence T.|editor-last3 = Yang|editor-first4 = Tau|editor-last4 = Leng|doi = 10.1007/0-387-24049-7_2| s2cid=10290721 }}</ref>
==Related patents==
* Method and apparatus for a sparse distributed memory system US 5113507 A, [[Universities Space Research Association]], 1992<ref>Method and apparatus for a sparse distributed memory system US 5113507 A, by Louis A. Jaeckel, Universities Space Research Association, 1992, URL: https://patents.google.com/patent/US5113507</ref>
* Method and device for storing and recalling information implementing a kanerva memory system US 5829009 A, [[Texas Instruments]], 1998<ref>Method and device for storing and recalling information implementing a kanerva memory system US 5829009 A, by Gary A. Frazier, Texas Instruments Incorporated, 1998, URL: https://patents.google.com/patent/US5829009</ref>
* Digital memory, Furber, Stephen. US 7512572 B2, 2009<ref>Furber, Stephen B. "Digital memory." U.S. Patent No. 7,512,572. 31 Mar. 2009. URL: https://patents.google.com/patent/US7512572</ref>
==Implementation==
{{external links|section|date=February 2023}}
* C Binary Vector Symbols (CBVS): includes SDM implementation in [[C (programming language)|C]] as a part of [[vector symbolic architecture]]<ref>{{cite journal|doi=10.1016/j.bica.2014.11.015|title=Vector space architecture for emergent interoperability of systems by learning from demonstration|journal=Biologically Inspired Cognitive Architectures|volume=11|pages=53–64|year=2015|last1=Emruli|first1=Blerim|last2=Sandin|first2=Fredrik|last3=Delsing|first3=Jerker|url=http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-4068 }}</ref> developed by EISLAB at [[Luleå University of Technology]]: http://pendicular.net/cbvs.php {{Webarchive|url=https://web.archive.org/web/20150925123906/http://pendicular.net/cbvs.php |date=2015-09-25 }}<ref>{{cite journal | last1 = Emruli | first1 = Blerim | last2 = Sandin | first2 = Fredrik | year = 2014 | title = Analogical mapping with sparse distributed memory: A simple model that learns to generalize from examples | url = http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-14994| journal = Cognitive Computation | volume = 6 | issue = 1| pages = 74–88 | doi=10.1007/s12559-013-9206-3| s2cid = 12139021 }}</ref>
* CommonSense ToolKit (CSTK) for realtime sensor data processing developed at the [[Lancaster University]] includes implementation of SDM in [[C++]]: https://cstk.sourceforge.net/<ref>Berchtold, Martin. "Processing Sensor Data with the Common Sense Toolkit (CSTK)." *(2005).</ref>
*[[Julia (programming language)|Julia]] implementation by [[Brian Hayes (scientist)|Brian Hayes]]: https://github.com/bit-player/sdm-julia <ref>The Mind Wanders by B. Hayes, 2018. url: http://bit-player.org/2018/the-mind-wanders</ref>
* [[LIDA (cognitive architecture)|Learning Intelligent Distribution Agent (LIDA)]] developed by [[Stan Franklin]]'s lab at the [[University of Memphis]] includes implementation of SDM in [[Java (programming language)|Java]]: http://ccrg.cs.memphis.edu/framework.html
* [[Python (programming language)|Python]] implementation: https://github.com/msbrogli/sdm<ref name="PMC4009432">{{cite journal | last1 = Brogliato | first1 = Marcelo S. | last2 = Chada | first2 = Daniel M. | last3 = Linhares | first3 = Alexandre | year = 2014| title = Sparse distributed memory: understanding the speed and robustness of expert memory | journal = Frontiers in Human Neuroscience | volume = 8| page = 222| pmc=4009432 | pmid=24808842 | doi=10.3389/fnhum.2014.00222| doi-access = free }}</ref>
* [[Python (programming language)|Python]] and [[OpenCL]] implementation: https://github.com/msbrogli/sdm-framework<ref name="PMC4009432"/>
* [[APL (programming language)|APL]] implementation<ref>{{cite journal|doi=10.1145/144052.144142|title=WSDM: Weighted sparse distributed memory prototype expressed in APL|journal=ACM SIGAPL APL Quote Quad|volume=23|pages=235–242|year=1992|last1=Surkan|first1=Alvin J.}}</ref>
* [[LISP]] implementation for the [[Connection Machine]]<ref>Turk, Andreas, and Günther Görz. "Kanerva's sparse distributed memory: an object-oriented implementation on the connection machine." IJCAI. 1995.</ref>
* [[FPGA]] implementation<ref>{{cite journal | last1 = Silva | last2 = Tadeu Pinheiro | first2 = Marcus | last3 = Pádua Braga | first3 = Antônio | last4 = Soares Lacerda | first4 = Wilian | year = 2004 | title = Reconfigurable co-processor for kanerva's sparse distributed memory | url = https://www.researchgate.net/publication/220055514 | format = PDF | journal = Microprocessors and Microsystems | volume = 28 | issue = 3| pages = 127–134 | doi = 10.1016/j.micpro.2004.01.003 }}</ref>
* The original hardware implementation developed by [[NASA]]<ref name="flynn89"/>
*An Implementation in C done at the Research Institute for Advanced Computer Science at NASA Ames<ref>{{Cite web|url=https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19870017134.pdf|title=Two Demonstrators and a Simulator for a Sparse Distributed Memory|last=Brown|first=Robert L.|date=June 1987|website=NASA Technical Reports Archive}}</ref>
==See also==
* [[Autoassociative memory]]
* [[Cerebellar model articulation controller]]
* [[Types of artificial neural networks#Dynamic|Dynamic memory networks]]
* [[Holographic associative memory]]
* [[Low-density parity-check code]]
* [[Types of artificial neural networks#Memory networks|Memory networks]]
* [[Memory-prediction framework]]
* [[Neural coding]]
* [[Neural Turing machine]]
* [[Random indexing]]
* [[Self-organizing map]]
* [[Semantic folding]]
* [[Semantic memory]]
* [[Semantic network]]
* Stacked [[autoencoder]]s
* [[Visual indexing theory]]
==References==
{{Reflist}}
[[Category:Memory]]
[[Category:Cognitive architecture]]