Neural modeling fields

Neural modeling field (NMF) theory is a mathematical implementation of mechanisms of the mind, including concepts, emotions, instincts, imagination, thinking, understanding, language, interaction between language and cognition, the knowledge instinct, the conscious and unconscious, and aesthetic emotions, including the beautiful and sublime. NMF provides a foundation for modeling the evolution of languages, consciousness, and cultures.

NMF is a multi-level, hetero-hierarchical system [1]. The mind is not a strict hierarchy; there are multiple feedback connections among adjacent levels, hence the term hetero-hierarchy. At each level in NMF there are concept-models encapsulating the mind’s knowledge; they generate so-called top-down signals, interacting with input, bottom-up signals. These interactions are governed by the knowledge instinct, which drives concept-model learning, adaptation, and formation of new concept-models for better correspondence to the input, bottom-up signals.

Here we describe a basic mechanism of interaction between two adjacent hierarchical levels of bottom-up and top-down signals (fields of neural activation); in this aspect NMF follows [2]. Sometimes it will be more convenient to talk about these two signal levels as the input to and output from a (single) processing level. At each level, output signals are concepts recognized in (or formed from) input, bottom-up signals. Input signals are associated with (or recognized, or grouped into) concepts according to the models and the knowledge instinct at this level. This general structure of NMF corresponds to our knowledge of neural structures in the brain; still, we do not map the mathematical mechanisms in all their details to specific neurons or synaptic connections. The knowledge instinct is described mathematically as maximization of a similarity measure. In the process of learning and understanding input, bottom-up signals, concept-models are adapted for better representation of the input signals, so that the similarity between the concept-models and the signals increases. This increase in similarity satisfies the knowledge instinct and is felt as aesthetic emotions.


The Knowledge Instinct

At a particular hierarchical level, we enumerate neurons by index n = 1,2,...,N. These neurons receive input, bottom-up signals, X(n), from lower levels in the processing hierarchy. X(n) is a field of bottom-up neuronal synaptic activations coming from neurons at a lower level. Each neuron has a number of synapses; for generality, we describe each neuron activation as a set of numbers, X(n) = {X_d(n), d = 1,...,D}. Top-down, or priming, signals to these neurons are sent by concept-models, M_m(S_m,n); we enumerate concept-models by index m = 1,2,...,M. Each model is characterized by its parameters, S_m; in the neural structure of the brain they are encoded by the strengths of synaptic connections, and mathematically we describe them as a set of numbers, S_m = {S_m^a, a = 1,...,A}.
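
For readers who prefer code, these fields can be represented with ordinary arrays. The following is a minimal sketch in Python (NumPy assumed); the array sizes, the random placeholder data, and the variable names are illustrative choices, not part of NMF itself.

```python
import numpy as np

# Bottom-up signal field: N neurons, each described by D synaptic activations,
# X(n) = {X_d(n), d = 1..D}.
N, D = 1000, 2
X = np.random.rand(N, D)      # placeholder input data (illustrative only)

# Concept-models: M models, each with A adaptive parameters S_m = {S_m^a, a = 1..A}.
M, A = 4, 3
S = np.zeros((M, A))          # model parameters, adapted during learning
r = np.full(M, 1.0 / M)       # priors r(m), also learned (see below)
```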

Models represent signals in the following way. Say a signal X(n) is coming from sensory neurons activated by object m, characterized by parameters S_m. These parameters may include the position, orientation, or lighting of object m. Model M_m(S_m,n) predicts the value X(n) of the signal at neuron n. For example, during visual perception, a neuron n in the visual cortex receives a signal X(n) from the retina and a priming signal M_m(S_m,n) from an object-concept-model m. Neuron n is activated if both the bottom-up signal from the lower-level input and the top-down priming signal are strong. Various models compete for evidence in the bottom-up signals while adapting their parameters for a better match, as described below. This is a simplified description of perception; even mundane, everyday visual perception uses many levels from the retina to object perception. The NMF premise is that the same laws describe the basic interaction dynamics at each level, so perception of minute features, of everyday objects, or cognition of complex abstract concepts is due to the same mechanism described below. Perception and cognition involve concept-models and learning. In perception, concept-models correspond to objects; in cognition, models correspond to relationships and situations.


Learning is an essential part of perception and cognition, and it is driven by the knowledge instinct. Learning increases a similarity measure between the sets of models and signals, L({X},{M}). The similarity measure is a function of the model parameters and of the associations between the input, bottom-up signals and the top-down, concept-model signals. For concreteness the following text refers to object perception, using simplified terminology, as if perception of objects in retinal signals occurred at a single level.

In constructing a mathematical description of the similarity measure, it is important to acknowledge two principles (which are almost obvious). First, the content of the visual field is unknown before perception has occurred; second, it may contain any of a number of objects. Important information could be contained in any bottom-up signal; therefore, the similarity measure is constructed so that it accounts for all bottom-up signals, X(n):


L({X},{M}) = ∏_{n=1..N} l(X(n)).


This expression contains a product of partial similarities, l(X(n)), over all bottom-up signals; therefore it forces the mind to account for every signal (if even one term in the product is zero, the product is zero, the similarity is low, and the knowledge instinct is not satisfied); this is a reflection of the first principle. Second, before perception occurs, the mind does not know which object gave rise to a signal from a particular retinal neuron. Therefore a partial similarity measure is constructed so that it treats each model as an alternative (a sum over concept-models) for each input neuron signal. Its constituent elements are conditional partial similarities between signal X(n) and model M_m, l(X(n)|m). This measure is “conditional” on object m being present (Perlovsky 2001); therefore, when combining these quantities into the overall similarity measure, L, they are multiplied by r(m), which represents a probabilistic measure of object m actually being present. Combining these elements with the two principles noted above, the similarity measure is constructed as follows:


L({X},{M}) = ∏_{n=1..N} ∑_{m=1..M} r(m) l(X(n)|m).


The structure of the expression above follows standard principles of probability theory: a summation is taken over alternatives, m, and various pieces of evidence, n, are multiplied. This expression is not necessarily a probability, but it has a probabilistic structure. If learning is successful, it approximates a probabilistic description and leads to near-optimal Bayesian decisions. The name “conditional partial similarity” for l(X(n)|m) (or simply l(n|m)) follows probabilistic terminology. If learning is successful, l(n|m) becomes a conditional probability density function, a probabilistic measure that the signal in neuron n originated from object m. Then L is the total likelihood of observing the signals {X(n)} coming from objects described by the concept-models {M_m}. The coefficients r(m), called priors in probability theory, contain preliminary biases or expectations: expected objects m have relatively high r(m) values. Their true values are usually unknown and should be learned, like the other parameters S_m.
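
As a concrete illustration, the similarity can be evaluated numerically once a form for the conditional partial similarities is chosen. The sketch below uses Gaussian densities centered on the model predictions; this form, and the use of SciPy, are assumptions made here for illustration rather than prescriptions of NMF. The value is computed as log L for numerical stability.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_similarity(X, means, covs, r):
    """log L({X},{M}) = sum_n log sum_m r(m) l(X(n)|m), with Gaussian l(X(n)|m).
    X: (N, D) bottom-up signals; means[m], covs[m]: model m's prediction and
    fuzziness; r: priors r(m). The Gaussian form is an illustrative assumption."""
    M = len(r)
    l = np.stack([r[m] * multivariate_normal.pdf(X, mean=means[m], cov=covs[m])
                  for m in range(M)], axis=1)              # shape (N, M)
    return float(np.sum(np.log(l.sum(axis=1) + 1e-300)))   # guard against underflow
```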

We note that in probability theory, a product of probabilities usually assumes that evidence is independent. The expression for L contains a product over n, but it does not assume independence among the various signals X(n). There is a dependence among signals due to the concept-models: each model M_m(S_m,n) predicts expected signal values in many neurons n.

During the learning process, concept-models are constantly modified. In this review we consider the case when the functional forms of the models, M_m(S_m,n), are all fixed and learning-adaptation involves only the model parameters, S_m. From time to time the system forms a new concept while retaining an old one as well; alternatively, old concepts are sometimes merged or eliminated. This requires a modification of the similarity measure L; the reason is that more models always result in a better fit between the models and the data. This is a well-known problem; it is addressed by reducing the similarity L with a “skeptic penalty function,” p(N,M), that grows with the number of models M, and this growth is steeper for a smaller amount of data N. For example, an asymptotically unbiased maximum likelihood estimation leads to a multiplicative p(N,M) = exp(-N_par/2), where N_par is the total number of adaptive parameters in all models (this penalty function is known as the Akaike information criterion; see (Perlovsky 2001) for further discussion and references).
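
In logarithmic form the multiplicative penalty becomes a simple subtraction. A minimal sketch, assuming the AIC-like penalty quoted above:

```python
def penalized_log_similarity(log_L, n_par):
    # Multiplying L by the skeptic penalty p(N, M) = exp(-N_par/2) is the same as
    # subtracting N_par/2 from log L; adding models raises log L but also raises N_par.
    return log_L - n_par / 2.0
```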


Psychologically, satisfaction of instincts is felt as pleasant emotions. Emotions related to satisfaction of the knowledge instinct (maximization of the similarity measure L) are aesthetic emotions; they are “spiritual” in that they are related to the workings of the mind-brain (whereas bodily emotions are related to bodily instincts).

Dynamic logic

The learning process consists of estimating the model parameters S and associating signals with concepts by maximizing the similarity L. Note that all possible combinations of signals and models are accounted for in the expression for L. This can be seen by expanding the sum and multiplying all the terms; it would result in M^N items, a huge number. This is the number of combinations between all signals (N) and all models (M), and it is the source of the combinatorial complexity of many algorithms used in the past. For example, a popular multiple hypothesis testing algorithm[3] attempts to maximize the similarity L over model parameters and associations between signals and models in two steps. First it takes one of the M^N items, that is, one particular association between signals and models, and maximizes it over the model parameters. Second, the largest item is selected (that is, the best association for the best set of parameters). Such a program inevitably faces a wall of combinatorial complexity: the number of computations is on the order of M^N. NMF solves this problem by using dynamic logic[4][5]. An important aspect of dynamic logic is matching the vagueness or fuzziness of the similarity measures to the uncertainty of the models. Initially, parameter values are not known and the uncertainty of the models is high; so is the fuzziness of the similarity measures. In the process of learning, models become more accurate, the similarity measures become crisper, and the value of the similarity increases. This is the mechanism of dynamic logic. The mathematics of dynamic logic is described in a separate article.
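
To make the mechanism concrete, here is a minimal sketch of one dynamic-logic iteration for Gaussian blob models, written in the spirit of [4] and [5] but not reproducing their equations exactly. The soft association weights f(m|n) replace the combinatorial search over all M^N signal-to-model assignments, and the model covariances play the role of fuzziness, shrinking as the fit improves; the variance floor and numerical guards are illustrative choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def dynamic_logic_step(X, means, covs, r, min_var=0.01):
    """One iteration: compute soft associations f(m|n), then re-estimate
    model parameters as association-weighted statistics of the signals."""
    N, D = X.shape
    M = len(r)
    # f(m|n) = r(m) l(X(n)|m) / sum_m' r(m') l(X(n)|m')
    l = np.stack([r[m] * multivariate_normal.pdf(X, mean=means[m], cov=covs[m])
                  for m in range(M)], axis=1)                    # shape (N, M)
    f = l / (l.sum(axis=1, keepdims=True) + 1e-300)
    # Re-estimate priors, means, and covariances; the covariances (fuzziness)
    # shrink as signals concentrate around the models that best explain them.
    for m in range(M):
        w = f[:, m]
        r[m] = w.mean()
        means[m] = (w[:, None] * X).sum(axis=0) / (w.sum() + 1e-300)
        diff = X - means[m]
        covs[m] = (w[:, None, None] * np.einsum('ni,nj->nij', diff, diff)
                   ).sum(axis=0) / (w.sum() + 1e-300) + min_var * np.eye(D)
    return means, covs, r
```

Iterating such a step until the (penalized) log-similarity stops increasing, while periodically trying to add or remove models, corresponds to the overall flow illustrated in the example below.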

Example of Dynamic Logic Operations

Finding patterns below noise can be an exceedingly complex problem. If the exact pattern shape is not known and depends on unknown parameters, these parameters should be found by fitting the pattern model to the data. However, when the locations and orientations of the patterns are not known, it is not clear which subset of the data points should be selected for fitting. A standard approach for solving this kind of problem, which has already been discussed, is multiple hypothesis testing (Singer et al 1974). Since all combinations of subsets and models are exhaustively searched, it faces the problem of combinatorial complexity. In the current example, we are looking for ‘smile’ and ‘frown’ patterns in noise; they are shown in Fig. 1a without noise and in Fig. 1b with noise, as actually measured. The true number of patterns is 3, which is not known. Therefore, at least 4 patterns should be fit to the data, to decide that 3 patterns fit best. The image size in this example is 100x100 = 10,000 points. If one attempts to fit 4 models to all subsets of 10,000 data points, the computational complexity is M^N ~ 10^6000. An alternative computation, searching through the parameter space, yields lower complexity: each pattern is characterized by a 3-parameter parabolic shape. Fitting the 4x3 = 12 parameters to the 100x100 grid by brute-force testing would take about 10^32 to 10^40 operations, still a prohibitive computational complexity. To apply NMF and dynamic logic to this problem one needs to develop parametric adaptive models of the expected patterns. The models and conditional partial similarities for this case are described in detail in [6]: a uniform model for noise, Gaussian blobs for highly fuzzy, poorly resolved patterns, and parabolic models for ‘smiles’ and ‘frowns’. The number of computer operations in this example was about 10^10. Thus, a problem that was not solvable due to combinatorial complexity becomes solvable using dynamic logic.
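
For orientation, a three-parameter parabolic ‘smile’/‘frown’ shape of the kind mentioned above can be parameterized as in the sketch below. This is a hypothetical form chosen for illustration; it does not reproduce the exact models used in [6].

```python
import numpy as np

def parabola_model(params, x):
    # Three parameters: vertex position (x0, y0) and curvature k.
    # k > 0 gives a 'smile', k < 0 a 'frown'. Illustrative parameterization only.
    x0, y0, k = params
    return y0 + k * (x - x0) ** 2

# Expected vertical position of a pattern across a 100-pixel-wide image
x = np.arange(100)
y_smile = parabola_model((50.0, 30.0, 0.02), x)
y_frown = parabola_model((50.0, 70.0, -0.02), x)
```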

During an adaptation process, initially fuzzy and uncertain models are associated with structures in the input signals, and the fuzzy models become more definite and crisp with successive iterations. The type, shape, and number of models are selected so that the internal representation within the system is similar to the input signals: the NMF concept-models represent the structure-objects in the signals. The figure below illustrates the operation of dynamic logic. In Fig. 1(a) the true ‘smile’ and ‘frown’ patterns are shown without noise; (b) the actual image available for recognition (the signal is below noise; the signal-to-noise ratio is between –2dB and –0.7dB); (c) an initial fuzzy model, whose large fuzziness corresponds to the uncertainty of knowledge; (d) through (m) show improved models at various iteration stages (22 iterations in total). Every five iterations the algorithm tried to increase or decrease the number of models. Between iterations (d) and (e) the algorithm decided that it needed three Gaussian models for the ‘best’ fit.

There are several types of models: one uniform model describing the noise (not shown) and a variable number of blob models and parabolic models; their number, ___location, and curvature are estimated from the data. Until about stage (g) the algorithm used simple blob models; at (g) and beyond, it decided that it needed the more complex parabolic models to describe the data. Iterations stopped at (m), when the similarity stopped increasing.

 
Fig. 1. Finding ‘smile’ and ‘frown’ patterns in noise, an example of dynamic logic operation: (a) the true ‘smile’ and ‘frown’ patterns are shown without noise; (b) the actual image available for recognition (the signal is below noise; the signal-to-noise ratio is between –2dB and –0.7dB); (c) an initial fuzzy blob-model, whose fuzziness corresponds to the uncertainty of knowledge; (d) through (m) show improved models at various iteration stages (22 iterations in total). Between stages (d) and (e) the algorithm tried to fit the data with more than one model and decided that it needed three blob-models to ‘understand’ the content of the data. There are several types of models: one uniform model describing the noise (not shown) and a variable number of blob-models and parabolic models, whose number, ___location, and curvature are estimated from the data. Until about stage (g) the algorithm ‘thought’ in terms of simple blob models; at (g) and beyond, it decided that it needed the more complex parabolic models to describe the data. Iterations stopped at (m), when the similarity L stopped increasing. This example is discussed in more detail in (Linnehan et al 2003).

References

  1. ^ Perlovsky, L.I. (2001). Neural Networks and Intellect: Using Model-Based Concepts. New York: Oxford University Press.
  2. ^ Perlovsky, L.I. (2006). Toward Physics of the Mind: Concepts, Emotions, Consciousness, and Symbols. Phys. Life Rev. 3(1), pp.22-55.
  3. ^ Singer, R.A., Sea, R.G. and Housewright, R.B. (1974). Derivation and Evaluation of Improved Tracking Filters for Use in Dense Multitarget Environments, IEEE Transactions on Information Theory, IT-20, pp. 423-432.
  4. ^ Perlovsky, L.I. (1996). Mathematical Concepts of Intellect. Proc. World Congress on Neural Networks, San Diego, CA; Lawrence Erlbaum Associates, NJ, pp.1013-16
  5. ^ Perlovsky, L.I.(1997). Physical Concepts of Intellect. Proc. Russian Academy of Sciences, 354(3), pp. 320-323.
  6. ^ Linnehan, R., Mutz, C., Perlovsky, L.I., Weijers, B., Schindler, J., Brockett, R. (2003). Detection of Patterns Below Clutter in Images. Int. Conf. on Integration of Knowledge Intensive Multi-Agent Systems, Cambridge, MA, Oct. 1-3, 2003.

Leonid Perlovsky