Statistical language acquisition

Statistical language acquisition, a branch of developmental psycholinguistics, is the study of the process by which humans develop the ability to perceive, produce, comprehend, and communicate with natural language in all of its aspects (phonological, syntactic, lexical, morphological, semantic) through the use of general learning mechanisms operating on statistical patterns in the linguistic input.

Philosophy

Fundamental to the study of statistical language acquisition is the centuries-old debate between rationalism (or its modern manifestation in the psycholinguistic community, nativism) and empiricism, with researchers in this field generally favoring the latter. Nativism is the position that humans are born with innate ___domain-specific knowledge, especially inborn capacities for language learning. Ranging from seventeenth-century rationalist philosophers such as Descartes, Spinoza, and Leibniz to contemporary philosophers such as Richard Montague and linguists such as Noam Chomsky, nativists posit an innate learning mechanism with the specific function of language acquisition.^[1]

In modern times, this debate has largely centered on Chomsky’s argument for a Universal Grammar, a set of properties that all natural languages must share, advanced through the controversial postulation of a Language Acquisition Device (LAD), an instinctive mental ‘organ’ responsible for language learning that searches all possible language alternatives and chooses the parameters that best match the learner’s linguistic input. Much of Chomsky’s theory rests on the poverty of the stimulus (POTS) argument, the assertion that the linguistic data available to a child are so limited and corrupted that learning language from these data alone is impossible. For example, many proponents of POTS argue that because children are never exposed to negative evidence, that is, information about which phrases are ungrammatical, the language structure they learn would not resemble that of correct speech without a language-specific learning mechanism.^[2]

Standing in stark contrast to this position is empiricism, the epistemological theory that all knowledge comes from sensory experience. This school of thought often characterizes the nascent mind as a tabula rasa, or blank slate, and can in many ways be associated with the nurture side of the “nature versus nurture” debate. This viewpoint has a long historical tradition that parallels that of rationalism, beginning with seventeenth-century empiricist philosophers such as Locke, Bacon, and Hobbes and, in the following century, Hume. The basic tenet of empiricism is that information in the environment is structured enough that its patterns are both detectable and extractable by ___domain-general learning mechanisms.^[1] In terms of language acquisition, these patterns can be either linguistic or social in nature.

Experimental Paradigms

Headturn Preference Procedure (HPP)

One of the most widely used experimental paradigms in investigations of infants’ capacities for statistical language acquisition is the Headturn Preference Procedure (HPP), developed by Stanford psychologist Anne Fernald in 1985 to study infants’ preference for prototypical child-directed speech over normal adult speech.^[3] In the classic HPP paradigm, infants are seated between two speakers with mounted lights and are free to turn their heads. The light on either the right or the left speaker flashes as that speaker presents some auditory or linguistic stimulus. Reliable orientation toward a given side is taken as an indication of a preference for the input associated with that side’s speaker. The paradigm has since become increasingly important in the study of infant speech perception, especially for input at levels higher than syllable chunks, though with some modifications, including the use of listening times rather than side preference as the dependent measure.^[4]

Conditioned Headturn Procedure

Like HPP, the Conditioned Headturn Procedure uses an infant’s differential orientation toward a given side as an indication of a preference for, or more often a familiarity with, the input associated with that side. The procedure was used in Werker’s classic studies of categorical perception of native-language phonemes^[5] and later in studies of prosodic boundary markers by Gout et al. (2004).^[4] Infants are conditioned, using some attractive image or display, to look in one of two directions every time a certain input is heard: a single phonemic syllable in Werker’s case and a whole word in Gout’s. After conditioning, new or more complex input is presented, and the infant’s ability to detect the earlier target word, or to distinguish the inputs of the two trials, is assessed by whether the infant turns its head in anticipation of the conditioned display.

Anticipatory Eye Movement

While HPP and the Conditioned Headturn Procedure allow researchers to observe behavioral responses to stimuli and to infer after the fact what expectations must have motivated that behavior, the Anticipatory Eye Movement paradigm allows researchers to observe a subject’s expectations directly, before the event occurs. By tracking subjects’ eye movements, researchers have been able to investigate infant decision-making and the ways in which infants encode and act on probabilistic knowledge to make predictions about their environments.^[6] This paradigm also has the advantage of allowing eye-movement behavior to be compared across a wider range of ages than other paradigms permit.

Artificial Languages

Artificial languages, that is, small-scale languages that typically have an extremely limited vocabulary and simplified grammar rules, are a commonly used paradigm for psycholinguistic researchers. Artificial languages allow researchers to isolate variables of interest and wield a greater degree of control over the input the subject will receive. Unfortunately, the overly simplified nature of these languages and the absence of a number of phenomena common to all human natural languages such as rhythm, pitch changes, and sequential regularities raise questions of external validity for any findings obtained using this paradigm, even after attempts have been made to increase the complexity and richness of the languages used.^[7]

As such, artificial language experiments are typically conducted to explore what the relevant linguistic variables are, which sources of information infants are able to use and when, and how researchers can model the learning and acquisition process.^[4] Aslin and Newport, for example, have used artificial languages to explore which features of linguistic input make certain patterns salient and easily detectable by infants, allowing them to contrast the detection of syllable repetition with that of word-final syllables and to draw conclusions about the conditions under which each feature is recognized as important.^[8]
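To make the idea of tightly controlled input concrete, here is a minimal sketch of how a continuous artificial-language stream might be assembled. The four trisyllabic "words" below are hypothetical placeholders, and the rule that no word immediately repeats is an assumption of this sketch rather than a description of any particular study's design.

```python
import random

# Hypothetical lexicon of trisyllabic nonsense words (tuples of CV syllables).
# These items are illustrative assumptions, not a published stimulus set.
LEXICON = [
    ("bi", "da", "ku"),
    ("pa", "do", "ti"),
    ("go", "la", "bu"),
    ("tu", "pi", "ro"),
]


def make_stream(lexicon, n_tokens, seed=0):
    """Concatenate randomly ordered words into one unbroken list of syllables.

    No pauses, stress, or pitch information is encoded, so the only structure
    left in the stream is which syllables tend to follow which. As an assumed
    design constraint, the same word never occurs twice in a row.
    """
    rng = random.Random(seed)  # seeded so the stream is reproducible
    stream, previous = [], None
    for _ in range(n_tokens):
        word = rng.choice([w for w in lexicon if w != previous])
        stream.extend(word)
        previous = word
    return stream


stream = make_stream(LEXICON, n_tokens=10)
print("".join(stream))  # one long string with no word boundaries marked
```

Because the experimenter chooses the lexicon and ordering rules, every statistical property of the input is known in advance, which is exactly the control that motivates the paradigm, and also why its external validity is questioned.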

Important Findings

Phonetic Category Learning

The first step in developing knowledge of a system as complex as natural language is learning to distinguish the important language-specific classes of sounds, called phonemes, that distinguish meaning between words. UBC psychologist Janet Werker, since her influential series of experiments in the 1980s, has been one of the most prominent figures in the effort to understand how human infants develop these phonological distinctions. Adults often have difficulty distinguishing sound contrasts from other languages that do not signal a difference in meaning in their own language, whereas infants appear to be born able to discriminate nearly all the speech-sound contrasts used in the world’s languages. Werker’s work has shown that while infants at six to eight months can still perceive the difference between certain Hindi and English consonants, they have largely lost this ability by 11 to 13 months.^[5]

It is now commonly accepted that children use some form of perceptual distributional learning, in which categories are discovered by grouping similar instances of an input stimulus, to form phonetic categories early in life.^[4] Interestingly, developing children have also been found to be effective judges of linguistic authority, screening the input they model their language on by paying less attention to speakers who mispronounce words.^[4]
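The "grouping similar instances" idea can be illustrated with a minimal sketch. Here one-dimensional k-means clustering is used purely as a simple stand-in for distributional learning (it is not a model proposed in the cited work), and the voice onset time (VOT) values are invented so that they form two clumps, roughly like a short-lag versus long-lag consonant contrast.

```python
def kmeans_1d(values, k=2, iterations=50):
    """Group 1-D acoustic measurements into k clusters (Lloyd's algorithm, k >= 2)."""
    data = sorted(values)
    # Deterministic start: initial centres spread evenly over the sorted data.
    centres = [data[round(i * (len(data) - 1) / (k - 1))] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each token joins the category with the nearest centre.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            clusters[nearest].append(v)
        # Update step: move each centre to the mean of its members.
        new_centres = [sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)]
        if new_centres == centres:  # assignments have stabilised
            break
        centres = new_centres
    return centres, clusters


def categorize(centres, value):
    """Assign a new token to the learned category whose centre is closest."""
    return min(range(len(centres)), key=lambda j: abs(value - centres[j]))


# Hypothetical VOT measurements in milliseconds (invented for illustration).
vot_ms = [10, 55, 8, 60, 12, 65, 5, 70, 15, 58, 9, 62]
centres, clusters = kmeans_1d(vot_ms)
print(centres, categorize(centres, 20), categorize(centres, 50))
```

No category labels are ever supplied; the two categories emerge only because the input distribution is bimodal, which is the core intuition behind distributional accounts.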

Parsing

Parsing is the process by which a continuous speech stream is segmented into its discrete meaningful units, e.g. sentences, words, and syllables. Saffran, Aslin, and Newport (1996) is a seminal study in this line of research. Infants were presented with two minutes of continuous speech in an artificial language, produced by a computerized voice to remove interference from extraneous variables such as prosody and intonation. After this exposure, infants were able to distinguish words from nonwords, as measured by longer looking times to the nonwords.^[9]

An important concept for understanding these results is transitional probability, the likelihood that one element, in this case a syllable, follows or precedes another. In this experiment, syllables belonging to the same word had a much higher transitional probability than syllables that simply happened to be adjacent across a word boundary.^[4]^[7]^[9] Remarkably, after only a two-minute exposure, infants were able to track these statistics and recognize high-probability words. Further research has since replicated these results with natural languages unfamiliar to the infants, and has indicated that infants also keep track of the direction (forward or backward) of transitional probabilities.^[7]
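The computation described above can be sketched as follows. Forward TP(y | x) is the count of the pair xy divided by the count of x; backward TP divides by the count of y instead. Placing a boundary wherever the forward TP falls below a fixed threshold is an assumed segmentation rule chosen for simplicity (a rule based on local dips in TP is a common alternative), and the three-word toy language is hypothetical. This is not a claim about the exact mechanism infants use.

```python
from collections import Counter


def transitional_probabilities(syllables, backward=False):
    """Map each adjacent pair (x, y) to its transitional probability.

    Forward:  TP(y | x) = count(x, y) / count(x as the first member of a pair)
    Backward: TP(x | y) = count(x, y) / count(y as the second member of a pair)
    """
    pair_counts = Counter(zip(syllables, syllables[1:]))
    if backward:
        denom = Counter(syllables[1:])
        return {(x, y): n / denom[y] for (x, y), n in pair_counts.items()}
    denom = Counter(syllables[:-1])
    return {(x, y): n / denom[x] for (x, y), n in pair_counts.items()}


def segment(syllables, threshold):
    """Insert a word boundary wherever the forward TP drops below threshold."""
    tps = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps[(x, y)] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words


# Hypothetical toy language: three trisyllabic words heard with no pauses.
words = ["bidaku", "padoti", "golabu", "padoti", "bidaku",
         "golabu", "bidaku", "padoti", "golabu", "bidaku"]
syllables = [w[i:i + 2] for w in words for i in range(0, len(w), 2)]

forward = transitional_probabilities(syllables)
print(forward[("bi", "da")], forward[("ku", "pa")])  # within-word vs across-boundary
print(segment(syllables, threshold=0.9))  # recovers the ten original words
```

Within a word every syllable predicts the next perfectly (TP = 1.0), while across word boundaries the next syllable depends on which word happens to follow, so TP drops (here to 2/3 or 1/3).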

The development of syllable-ordering biases is an important step toward full language development. The ability to categorize syllables and group frequently co-occurring sequences may be critical for developing a protolexicon, a set of common language-specific word templates based on characteristic patterns in the words an infant hears. The protolexicon may in turn allow infants to recognize new types of patterns, e.g. the high frequency of stressed word-initial syllables in English, which would let them parse words further by recognizing common prosodic phrasings as autonomous linguistic units, restarting the dynamic cycle of word and language learning.^[4]
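One way to picture how a stress regularity could feed back into parsing is the heuristic "start a new word at every stressed syllable." The sketch below implements only that hypothetical rule; the syllable lists and stress marks are invented examples, and real infants presumably combine such cues with the statistical ones described earlier.

```python
def stress_segment(syllables):
    """Segment (syllable, is_stressed) pairs by opening a new word at each stressed syllable.

    This is an illustrative heuristic for a language in which most words
    begin with a stressed syllable; it is not a model from the cited work.
    """
    words, current = [], ""
    for syllable, stressed in syllables:
        if stressed and current:
            words.append(current)  # close the word in progress
            current = ""
        current += syllable
    if current:
        words.append(current)
    return words


# Hypothetical stress-marked input: strong-weak words segment correctly...
trochees = [("doc", True), ("tor", False), ("ba", True), ("by", False),
            ("kit", True), ("ten", False)]
print(stress_segment(trochees))

# ...but a weak-strong word such as "guitar" is split at the wrong place.
print(stress_segment(trochees + [("gui", False), ("tar", True)]))
```

The second call shows why a learner relying on this template alone would mis-segment words that do not fit the dominant pattern, one reason such cues are usually thought to work alongside transitional probabilities rather than replace them.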

Referent-Label Associations

The question of how novice language-users are capable of associating learned labels with the appropriate referent, the person or object in the environment which the label names, has been at the heart of philosophical considerations of language and meaning from Plato to Quine to Hofstadter.^[10] This problem, that of finding some solid relationship between word and object, of finding a word’s meaning without succumbing to an infinite recursion of dictionary look-up, is known as the symbol grounding problem.^[11]

Researchers have shown that this problem is intimately linked with the ability to parse language: words that are easy to segment because of their high transitional probabilities are also easier to map to an appropriate referent.^[7] This serves as further evidence of the developmental progression of language acquisition, in which children need an understanding of the sound distributions of natural languages to form phonetic categories, parse words based on these categories, and then use the parsed words as labels for objects.

The developmentally earliest understanding of word-to-referent associations has been reported at six months of age, with infants comprehending the words ‘mommy’ and ‘daddy’ or their familial or cultural equivalents. Further studies have shown that infants develop this capacity quickly and by seven months can learn associations between moving images and nonsense words and syllables.^[4]

It is important to note that there is a distinction, often confounded in acquisition research, between mapping a label to a specific instance or individual and mapping a label to an entire class of objects. The latter process is sometimes referred to as generalization or rule learning. Research has shown that if input is encoded in terms of perceptually salient dimensions rather than specific details, and if patterns in the input indicate that a number of objects are named interchangeably in the same context, a language learner will be much more likely to generalize that name to every instance with the relevant features. This tendency is heavily dependent on the consistency of context clues and the degree to which word contexts overlap in the input.^[8] These differences are also linked to the well-known patterns of under- and overgeneralization in infant word learning.

The ability to appropriately generalize to whole classes of yet unseen words, coupled with the abilities to parse continuous speech and keep track of word-ordering regularities, may be the critical skills necessary to develop proficiency with and knowledge of syntax and grammar.^[4]

Computational Models

Computational models have long been used to explore the mechanisms by which language learners process and manipulate linguistic information. Models of this type allow researchers to systematically control important learning variables that are often difficult or impossible to manipulate in human participants.^[12]

Associative Models

Associative neural network models of language acquisition are one of the oldest types of cognitive model, using distributed representations and changes in the weights of the connections between the nodes that make up these representations to simulate learning in a manner reminiscent of the plasticity-based neuronal reorganization that forms the basis of human learning and memory.^[13] Associative models represent a break with classical cognitive models, characterized by discrete and context-free symbols, in favor of a dynamical systems approach to language better capable of handling temporal considerations.^[14]

A precursor to this approach, and one of the first model types to account for the dimension of time in linguistic comprehension and production, was Elman’s simple recurrent network (SRN). An SRN feeds a copy of its previous hidden-layer state back into the network as a context layer, giving it a representation of its own past states. Trained on a word-prediction task, SRNs were able to cluster their input into self-organized grammatical categories based solely on statistical co-occurrence patterns.^[14]^[15]
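The copy-back mechanism can be sketched as a forward pass in plain Python. This is a minimal illustration only: the layer sizes, the tanh hidden activation, the absence of bias terms, and the toy vocabulary are assumptions of the sketch, and training (which in practice adjusts the weights by backpropagation on the prediction error) is omitted, so the untrained predictions below carry no linguistic meaning.

```python
import math
import random


class SimpleRecurrentNetwork:
    """Forward pass of an Elman-style SRN: the hidden layer at time t receives
    the current input plus a copy of the hidden layer from time t - 1."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def random_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = random_matrix(n_hidden, n_in)       # input -> hidden
        self.w_ctx = random_matrix(n_hidden, n_hidden)  # context -> hidden
        self.w_out = random_matrix(n_out, n_hidden)     # hidden -> output
        self.context = [0.0] * n_hidden                 # context units start at zero

    def step(self, x):
        """Process one input vector and return a distribution over the next word."""
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(row_in, x))
                      + sum(w * ci for w, ci in zip(row_ctx, self.context)))
            for row_in, row_ctx in zip(self.w_in, self.w_ctx)
        ]
        self.context = hidden  # copy-back: this state becomes the next step's context
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Hypothetical toy vocabulary with one-hot word encodings.
vocab = ["boy", "sees", "girl", "."]
net = SimpleRecurrentNetwork(n_in=4, n_hidden=5, n_out=4)
for word in ["boy", "sees", "girl"]:
    x = [1.0 if v == word else 0.0 for v in vocab]
    prediction = net.step(x)
print(prediction)
```

Because the context layer changes after every step, the same input word can yield different predictions depending on what preceded it, which is what lets a trained SRN pick up sequential, and hence grammatical, regularities.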

Early successes such as these paved the way for dynamical systems research into language acquisition, answering many questions about early linguistic development but leaving others unanswered, such as how these statistically acquired lexemes are represented.^[14] Of particular importance in recent research has been the effort to understand the dynamic interaction of learning variables (e.g. language-based) and learner variables (e.g. speaker-based) in lexical organization and competition in bilinguals.^[12] In an effort to build more psychologically realistic models, many researchers have turned to a subset of associative models, Self-Organizing Maps (SOMs), as established, cognitively plausible models of language development.^[16]^[17]

SOMs have helped researchers identify and investigate the constraints and variables of interest in a number of acquisition processes, and explore the consequences of these findings for linguistic and cognitive theories. By identifying working memory as an important constraint both for language learners and for current computational models, researchers have been able to show that manipulating this variable allows for syntactic bootstrapping: drawing not just categorical but actual content meaning from words’ positional co-occurrence in sentences.^[18]
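The basic Kohonen update underlying SOMs can be sketched briefly. This is a minimal, generic version (a 1-D line of nodes, Gaussian neighbourhood, linearly shrinking learning rate and radius), not the specific models used in the cited studies; the two clusters of 2-D vectors are hypothetical stand-ins for lexical feature representations, and all parameter values are illustrative assumptions.

```python
import math
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to input x."""
    return min(range(len(nodes)), key=lambda i: squared_distance(nodes[i], x))


def train_som(data, n_nodes=10, epochs=50, lr0=0.5, radius0=3.0, seed=0):
    """Train a 1-D self-organizing map and return the nodes' weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink as training proceeds.
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        radius = max(radius0 * (1 - frac), 0.5)
        for x in rng.sample(data, len(data)):  # visit inputs in shuffled order
            bmu = best_matching_unit(nodes, x)
            for i in range(n_nodes):
                # Nodes near the winner on the map's grid move more toward the input.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                nodes[i] = [w + lr * h * (v - w) for w, v in zip(nodes[i], x)]
    return nodes


# Hypothetical feature vectors for two word classes (invented for illustration).
rng = random.Random(1)
class_a = [[0.1 + rng.uniform(-0.05, 0.05), 0.1 + rng.uniform(-0.05, 0.05)] for _ in range(20)]
class_b = [[0.9 + rng.uniform(-0.05, 0.05), 0.9 + rng.uniform(-0.05, 0.05)] for _ in range(20)]
nodes = train_som(class_a + class_b)
print(best_matching_unit(nodes, [0.1, 0.1]), best_matching_unit(nodes, [0.9, 0.9]))
```

After training, the two classes are typically represented by different regions of the map, illustrating how topographic organization can emerge from input statistics alone.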

Probabilistic Models

Some recent models of language acquisition have centered on methods of Bayesian inference to account for infants’ abilities to parse streams of speech appropriately and acquire word meanings. Models of this type rely heavily on the notion of conditional probability (the probability of A given B), in line with findings that infants use the transitional probabilities of words and syllables to learn words.^[9]
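For reference, the relationships mentioned above can be written as follows. The symbols h (a hypothesis, such as a candidate word meaning), d (observed data), and X, Y (adjacent syllables) are notation introduced here for illustration.

```latex
\begin{align}
P(A \mid B) &= \frac{P(A, B)}{P(B)} \\
P(h \mid d) &= \frac{P(d \mid h)\,P(h)}{\sum_{h'} P(d \mid h')\,P(h')} \\
\mathrm{TP}(Y \mid X) &= \frac{\mathrm{freq}(XY)}{\mathrm{freq}(X)}
\end{align}
```

The second line is Bayes' rule, which combines a prior over hypotheses with the likelihood of the data; the third is the first applied to adjacent syllables, with probabilities estimated from corpus frequencies.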

Models that use these probabilistic methods have been able to merge two previously opposed perspectives on language acquisition (social theories, which emphasize the importance of learning speaker intentions, and statistical and associative theories, which rely on cross-situational contexts) into a single joint-inference problem. This approach has led to important results in explaining acquisition phenomena such as mutual exclusivity, one-trial learning or fast mapping, and the use of social intentions.^[19]
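A heavily simplified sketch of Bayesian cross-situational inference for a single novel word is shown below. It assumes a uniform prior over candidate referents and an assumed likelihood in which the speaker picks the intended referent uniformly from the objects in view, with a small noise probability `epsilon` when the hypothesized referent is absent. The joint-inference models cited above also infer speakers' referential intentions and handle whole lexicons, which this sketch omits; the word, scenes, and parameter value are hypothetical.

```python
def word_referent_posterior(situations, epsilon=0.01):
    """Posterior over which object a single novel word names.

    situations: list of sets, the objects present each time the word was heard.
    Assumed likelihood per situation: 1 / len(objects) if the hypothesised
    referent is present (speaker chose uniformly among visible objects),
    otherwise epsilon (the utterance is attributed to noise).
    """
    candidates = sorted(set().union(*situations))
    prior = 1 / len(candidates)  # uniform prior over candidate referents
    unnormalised = {}
    for obj in candidates:
        likelihood = 1.0
        for objects in situations:
            likelihood *= (1 / len(objects)) if obj in objects else epsilon
        unnormalised[obj] = prior * likelihood
    total = sum(unnormalised.values())  # normalising constant from Bayes' rule
    return {obj: p / total for obj, p in unnormalised.items()}


# Hypothetical scenes in which the novel word "dax" was heard.
scenes = [{"ball", "dog"}, {"ball", "cup"}, {"ball", "shoe"}]
posterior = word_referent_posterior(scenes)
print(max(posterior, key=posterior.get), posterior)
```

No single scene disambiguates the word, but only one object is present in all of them, so evidence accumulated across situations concentrates the posterior on it.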

While these results appear robust, these models’ ability to handle more complex situations, such as mapping multiple referents to a single label, mapping multiple labels to a single referent, and bilingual language acquisition, has yet to be explored in comparison with associative models’ successes in these areas. Hope remains, though, that the two model types may be merged to provide a comprehensive account of language acquisition.^[20]

References

[1] Russell, J. (2004). What is Language Development?: Rationalist, Empiricist, and Pragmatist Approaches to the Acquisition of Syntax. Oxford University Press.
[2] Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
[3] Fernald, A. (1985). Four-Month-Old Infants Prefer to Listen to Motherese. Infant Behavior and Development, 181-195.
[4] Swingley, D. (2009). Contributions of infant word learning to language development. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 364(1536), 3617-3632. doi:10.1098/rstb.2009.0107
[5] Werker, J. F., & Lalonde, C. E. (1988). Cross-Language Speech Perception: Initial Capabilities and Developmental Change. Developmental Psychology, 24(5), 672-683.
[6] Davis, S. J., Newport, E. L., & Aslin, R. N. (2009). Probability-matching in 10-month-old infants.
[7] Hay, J. F., Pelucchi, B., Estes, K. G., & Saffran, J. R. (2011). Linking sounds to meanings: Infant statistical learning in a natural language. Cognitive Psychology, 63(2), 93-106. doi:10.1016/j.cogpsych.2011.06.002
[8] Aslin, R. N., & Newport, E. L. (2012). Statistical Learning: From Acquiring Specific Items to Forming General Rules. Current Directions in Psychological Science. doi:10.1177/0963721412436806
[9] Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical Learning by 8-Month-Old Infants. Science, 274(5294), 1926-1928.
[10] Bornstein, M. H., & Lamb, M. E. (Eds.). (2011). Developmental Science: An Advanced Textbook. New York, NY: Psychology Press.
[11] Harnad, S. (1990). The Symbol Grounding Problem. Physica D: Nonlinear Phenomena, 42, 335-346.
[12] Zinszer, B., & Li, P. (2010). A SOM model of first language lexical attrition. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 2787-2792). Austin, TX: Cognitive Science Society.
[13] Seidenberg, M. S., & McClelland, J. L. (1989). A Distributed, Developmental Model of Word Recognition and Naming. Psychological Review, 96(4), 523-568.
[14] Li, P. (2009). Lexical organization and competition in first and second languages: Computational and neural mechanisms. Cognitive Science, 33(4), 629-664. doi:10.1111/j.1551-6709.2009.01028.x
[15] Elman, J. L. (1995). Language as a dynamical system. In R. F. Port & T. van Gelder (Eds.), Mind as Motion: Explorations in the Dynamics of Cognition. Cambridge, MA: MIT Press.
[16] Kohonen, T. (n.d.). The Self-Organizing Map.
[17] Zhao, X., Li, P., & Kohonen, T. (2011). Contextual self-organizing map: Software for constructing semantic representations. Behavior Research Methods, 43(1), 77-88. doi:10.3758/s13428-010-0042-z
[18] Li, P., Burgess, C., & Lund, K. (2000). The Acquisition of Word Meaning through Global Lexical Co-occurrences.
[19] Frank, M. C., Goodman, N. D., & Tenenbaum, J. B. (2009). Using Speakers’ Referential Intentions to Model Early Cross-Situational Word Learning. Psychological Science, 1-8.
[20] Griffiths, T. L., Chater, N., Kemp, C., Perfors, A., & Tenenbaum, J. B. (2010). Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences, 14(8), 357-364. doi:10.1016/j.tics.2010.05.004
