Visual indexing theory

{{Short description|Visual perception theory}}
'''Visual indexing theory''', also known as '''FINST theory''', is a theory of early [[visual perception]] developed by [[Zenon Pylyshyn]] in the 1980s. It proposes a [[Pre-attentive processing|pre-attentive]] mechanism (a ‘FINST’) whose function is to individuate salient elements of a visual scene, and track their locations across space and time. Developed in response to what Pylyshyn viewed as limitations of prominent theories of visual perception at the time, visual indexing theory is supported by several lines of empirical evidence.
 
== Overview ==
=== Fingers of instantiation ===
 
'FINST' abbreviates ‘FINgers of INSTantiation’. Pylyshyn describes visual indexing theory in terms of this analogy.<ref name="Pylyshyn 1989">Pylyshyn, Z.W. (1989). The role of location indexes in spatial perception: a sketch of the FINST spatial index model. Cognition, 32, 65–97.</ref> Imagine, he proposes, placing your fingers on five separate objects in a scene. As those objects move about, your fingers stay in respective contact with each of them, allowing you to continually track their whereabouts and positions relative to one another. While you may not be able to discern in this way any detailed information about the items themselves, the presence of your fingers provides a reference via which you can access such information at any time, without having to relocate the objects within the scene. Furthermore, the objects' continuity over time is inherently maintained: you know the object referenced by your pinky finger at time ''t'' is the same object as that referenced by your pinky at ''t<sub>−1</sub>'', regardless of any spatial transformations it has undergone, because your finger has remained in continuous contact with it.
 
Visual indexing theory holds that the visual perceptual system works in an analogous way. FINSTs behave like the fingers in the above scenario, pointing to and tracking the location of various objects in visual space. Like fingers, FINSTs are:
* '''Opaque''' to the features of the objects they index. FINSTs reference objects according to their location only. No additional information about their referents is conveyed via the FINST mechanism itself.
 
=== Role in the visual perception process ===
 
====Individuation====
 
FINSTs operate pre-attentively — that is, before attention is drawn or directed to an object in the visual field. Their primary task is to ''individuate'' certain salient features in a scene, conceptually distinguishing these from other stimuli. Under visual indexing theory, FINSTing is a necessary precondition for higher level perceptual processing.
 
Pylyshyn suggests that what FINSTs operate upon in a direct sense is 'feature clusters' on the retina, though a precise set of criteria for FINST allocation has not been defined. "The question of how FINSTs are assigned in the first instance remains open, although it seems reasonable that they are assigned primarily in a stimulus-driven manner, perhaps by the activation of locally distinct properties of the stimulus – particularly by new features entering the visual field."<ref name="Pylyshyn 1989"/>
 
FINSTs are subject to resource constraints. Up to around five FINSTs can be allocated at any given time, and these provide the visual system with information about the locations of FINSTed objects relative to one another.
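As a rough illustration of these properties, the following Python sketch models a small pool of indexes that store only a binding to an object and that object's current location. The names <code>Index</code>, <code>IndexPool</code>, <code>assign</code>, <code>update</code> and <code>relative_position</code>, and the fixed capacity of five, are illustrative assumptions and are not part of Pylyshyn's model.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Location = Tuple[float, float]

CAPACITY = 5  # assumption: "around five" indexes, fixed here for illustration


@dataclass
class Index:
    """A hypothetical FINST-like index: a binding plus a location, no features."""
    object_id: int       # stands in for the binding to one individual object
    location: Location   # the only information the index carries


class IndexPool:
    """A limited pool of indexes (hypothetical sketch, not the author's model)."""

    def __init__(self, capacity: int = CAPACITY) -> None:
        self.capacity = capacity
        self._indexes: Dict[int, Index] = {}

    def assign(self, object_id: int, location: Location) -> Optional[Index]:
        """Bind a free index to an object; return None if no index is free."""
        if object_id in self._indexes:
            return self._indexes[object_id]
        if len(self._indexes) >= self.capacity:
            return None
        index = Index(object_id, location)
        self._indexes[object_id] = index
        return index

    def update(self, object_id: int, location: Location) -> None:
        """Keep an index attached to its object as the object moves."""
        if object_id in self._indexes:
            self._indexes[object_id].location = location

    def relative_position(self, a: int, b: int) -> Location:
        """Offset from indexed object a to indexed object b."""
        ia, ib = self._indexes[a], self._indexes[b]
        return (ib.location[0] - ia.location[0],
                ib.location[1] - ia.location[1])
```

In this sketch, a sixth call to <code>assign</code> on a full pool returns <code>None</code>, mirroring the capacity limit, and nothing stored in an <code>Index</code> describes the colour, shape or category of its referent.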
====Attentional facilitation====
 
Under visual indexing theory, an object cannot be attended to until it has first been indexed. Once it has been allocated a FINST, the index provides the visual system with rapid and preferential access to the object for further processing of features such as colour, texture and shape.
 
While in this sense FINSTs provide the means for higher-level processing to occur, FINSTs themselves are "opaque to the properties of the objects to which they refer."<ref name="Pylyshyn 1989"/> FINSTs do not directly convey any information about an indexed object, beyond its position at a given instant. "Thus, on initial contact, objects are not interpreted as belonging to a certain type or having certain properties; in other words, objects are initially detected without being conceptualised."<ref name="Pylyshyn 2000">Pylyshyn, Z.W. (2000). Situating vision in the world. Trends in Cognitive Sciences, 4(5), 197–207.</ref> Like the fingers described above, FINSTs' role in visual perception is purely an indexical one.
According to the classical view of [[mental representation]], we perceive objects according to the conceptual descriptions they fall under. It is these descriptions, and not the raw content of our visual perceptions, that allow us to construct meaningful representations of the world around us, and determine appropriate courses of action. In Pylyshyn's words, "it is not the bright spot in the sky that determines which way we set out when we are lost, but the fact that we see it (or represent it) as the North Star".<ref name="Pylyshyn 2001">Pylyshyn, Z. W. (2001). Visual indexes, preconceptual objects, and situated vision. Cognition 80, 127-158.</ref> The method by which we come to match a percept to its appropriate description has been the subject of ongoing investigation (for example the way in which parts of objects are combined to represent their whole),<ref>Hoffman, D. D. and Richards, W. A. (1984). Parts of recognition. Cognition 18, Issues 1–3, 65–96.</ref> but there is a general consensus that descriptions are fundamental in this way to visual perception.<ref name="Pylyshyn 2001"/>
 
As with the spotlight model of attention, Pylyshyn takes the descriptive model of visual representation to be incomplete. One issue is that the theory does not account for demonstrative, or indexical, references. "For example, in the presence of a visual stimulus, we can think thoughts such as 'that is red' where the term 'that' refers to something we have picked out in our field of view without reference to what category it falls under or what properties it may have."<ref name="Pylyshyn 2001"/> Relatedly, the theory has problems accounting for how we are able to pick out a single token among several objects of the same type. For example, I may refer to a particular can of soup on a supermarket shelf sitting among a number of identical cans that answer to the same description. In both cases, a spatiotemporal reference is required in order to pick out the object within the scene, independently of any description that object may fall under. FINSTs, Pylyshyn suggests, provide just such a reference.
 
A deeper problem for this view, according to Pylyshyn, is that it cannot account for objects' continuity over time. "An individual remains the same individual when it moves about or when it changes any (or even all) of its visible properties."<ref name="Pylyshyn 2001"/> If we refer to objects solely in terms of their conceptual descriptions, it is not clear how the visual system maintains an object's identity when those descriptions change. "The visual system needs to be able to pick out a particular individual regardless of what properties the individual happens to have at any instant of time."<ref name="Pylyshyn 2001"/> Pylyshyn argues that FINSTs' detachment from the descriptions of the objects they reference overcomes this problem.
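The contrast between reference by description and reference by index can be loosely illustrated in code. The Python sketch below is only an analogy: an ordinary object reference stands in for an index, and the names <code>SceneObject</code> and <code>find_by_description</code> are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SceneObject:
    """A hypothetical scene object described only by its visible properties."""
    kind: str
    colour: str


def find_by_description(scene: List[SceneObject], kind: str, colour: str) -> List[int]:
    """Return the positions of every object that fits the description."""
    return [i for i, obj in enumerate(scene)
            if obj.kind == kind and obj.colour == colour]


# Three identical cans: the description alone cannot single one of them out.
scene = [SceneObject("soup can", "red") for _ in range(3)]
matches = find_by_description(scene, "soup can", "red")

# An index-like reference is bound to one individual, not to its description.
indexed = scene[1]
indexed.colour = "blue"  # the individual's visible properties change

# The reference still picks out the same individual after the change,
# although the original description no longer applies to it.
still_same_individual = indexed is scene[1]
matches_after_change = find_by_description(scene, "soup can", "red")
```

Here the description matches all three cans at first and only two after the colour change, while the reference continues to pick out the same individual throughout.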
*become better at tracking multiple objects with relevant practice/expertise.<ref>Allen, R., McGeorge, P., Pearson, D. G., & Milne, A. B. (2004). Attention and expertise in multiple target tracking. Applied Cognitive Psychology, 18, 337–347.</ref><ref>Green, C. S., & Bavelier, D. (2006). Enumeration versus multiple object tracking: The case of action video game players. Cognition, 101, 217–245.</ref>
 
Two defining properties of FINSTs are their plurality, and their capacity to track indexed objects as they move around a visually cluttered scene. "Thus multiple-item tracking studies provide strong support for one of the more counterintuitive predictions of FINST theory — namely, that the identity of items can be maintained by the visual system even when the items are visually indiscriminable from their neighbors and when their locations are constantly changing."<ref name="Pylyshyn94"/>
 
=== Subitizing studies ===
{{See also|Numerical cognition}}
[[Subitizing]] refers to the rapid and accurate enumeration of small numbers of items. Numerous studies (dating back to [[William Stanley Jevons|Jevons]] in 1871)<ref>Jevons, W. (1871). The power of numerical discrimination. Nature, 3, 281–282.</ref> have demonstrated that subjects can very quickly and accurately report the quantity of objects randomly presented on a display, when they number fewer than around five. While larger quantities require subjects to count or estimate — at great expense of time and accuracy — it seems that a different enumeration method is employed in these low-quantity cases. In 1949, Kaufman, Lord, Reese and Volkmann coined the term 'subitizing' to describe the phenomenon.<ref>Kaufman, E.L., Lord, M.W., Reese, T.W., & Volkmann, J. (1949). The discrimination of visual number. American Journal of Psychology, 62 (4), 498–525.</ref>
 
In 2023, a study using [[Single-neuron recordings|single-neuron recordings]] in the [[medial temporal lobe]] of neurosurgical patients judging numbers reported evidence of two separate neural mechanisms, with a boundary in [[Neural coding|neuronal coding]] around the number 4 that correlates with the behavioural transition from subitizing to estimation, supporting Jevons's early observation.<ref>{{Cite journal |last=Kutter |first=Esther F. |last2=Dehnen |first2=Gert |last3=Borger |first3=Valeri |last4=Surges |first4=Rainer |last5=Mormann |first5=Florian |last6=Nieder |first6=Andreas |date=2023-10-02 |title=Distinct neuronal representation of small and large numbers in the human medial temporal lobe |url=https://www.nature.com/articles/s41562-023-01709-3 |journal=Nature Human Behaviour |language=en |volume=7 |issue=11 |pages=1998–2007 |doi=10.1038/s41562-023-01709-3 |issn=2397-3374|url-access=subscription }}</ref><ref>{{Cite web |last=Saplakoglu |first=Yasemin |date=9 November 2023 |title=Why the Human Brain Perceives Small Numbers Better |url=https://www.quantamagazine.org/why-the-human-brain-perceives-small-numbers-better-20231109/ |website=[[Quanta Magazine]]}}</ref>
 
'''Experimental setup'''
 
In a typical experiment, subjects are briefly shown (for around 100 ms) a screen containing a number of randomly arranged objects. The subjects' task is to report the number of items shown, which can range between one and several hundred per trial.
 
'''Results'''
When the number of items to be enumerated is within the subitizing range, each additional item on the display adds around 40–120 ms to the total response time. Beyond the subitizing range, each additional item adds 250–350 ms, so that plotting the number of items presented against reaction time yields an 'elbow'-shaped curve. Researchers generally take this as evidence of (at least) two different enumeration methods at work: one for small numbers, and another for larger numbers.<ref name="Trick">Trick, L.M., & Pylyshyn, Z.W. (1993). What enumeration studies can show us about spatial attention: Evidence for limited capacity pre-attentive processing. Journal of Experimental Psychology: Human Perception and Performance. 10, 331-351.</ref>
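The two slopes can be made concrete with a simple piecewise-linear model. The Python sketch below is illustrative only: the function name <code>response_time_ms</code>, the 400 ms baseline, the subitizing limit of four items, and the slopes of 80 ms and 300 ms per item (chosen from within the ranges quoted above) are assumptions, not values reported by any particular study.

```python
def response_time_ms(n_items: int,
                     base_ms: float = 400.0,        # assumed baseline
                     subitizing_limit: int = 4,     # assumed boundary
                     slope_small_ms: float = 80.0,  # within the 40-120 ms range
                     slope_large_ms: float = 300.0  # within the 250-350 ms range
                     ) -> float:
    """Hypothetical response time for enumerating n_items objects."""
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    # Items up to the limit are charged the shallow slope,
    # and any items beyond it are charged the steep slope.
    small = min(n_items, subitizing_limit)
    large = max(n_items - subitizing_limit, 0)
    return base_ms + slope_small_ms * small + slope_large_ms * large


times = [response_time_ms(n) for n in range(1, 9)]
# Cost of each additional item; the jump marks the 'elbow'.
increments = [later - earlier for earlier, later in zip(times, times[1:])]
```

Under these assumed parameters, each additional item costs 80 ms up to four items and 300 ms thereafter, which produces the change of slope at the boundary.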
 
Trick and Pylyshyn (1993) argue that "subitizing can be explained only by virtue of a limited-capacity mechanism that operates after the spatially parallel processes of feature detection and grouping but before the serial processes of spatial attention."<ref name="Trick"/> In other words, it can be explained by a mechanism such as a FINST.
 
=== Subset selection studies ===
 
Burkell and Pylyshyn found that subjects were indeed quicker to identify the target object in the subset feature search condition than they were in the subset conjunction search condition, suggesting that the subsetted objects were successfully prioritised. In other words, the subsets "could, in a number of important ways, be accessed by the visual system as though they were the only items present".<ref name="Pylyshyn94"/> Furthermore, the subsetted objects' particular positions within the display made no difference to subjects' ability to search across them — even when they were distally located.<ref name="Burkell"/> Watson and Humphreys (1997) reported similar findings.<ref>Watson, D.G. and Humphreys, G.W. (1997). Visual marking: prioritizing selection for new objects by top-down attentional inhibition of old objects. Psychological Review. 104, 90–122</ref> These results are consistent with the predictions of visual indexing theory: FINSTs provide a possible mechanism by which the subsets were prioritised.
 
==See also==
*[[Sparse distributed memory]]
 
==References==