Joseph Greenberg

Joseph H. Greenberg (May 28, 1915-May 7, 2001) was a prominent but controversial linguist, known for his work in both language classification and typology. He was born in Brooklyn, New York and served for many years on the faculty of Stanford University.

Contributions to linguistics

Language typology

Greenberg's fame rests in part on his seminal contributions to synchronic linguistics and the quest to identify linguistic universals. In the late 1950s, Greenberg began to examine corpora of languages covering a wide geographic and genetic distribution. He located a number of interesting potential universals, as well as many strong cross-linguistic tendencies.

In particular, Greenberg invented the notion of "implicational universal", which takes the form "if a language has structure X, then it must also have structure Y." For example, X might be "mid front rounded vowels" and Y "high front rounded vowels" (for terminology see phonetics). This kind of research was picked up by many other scholars following Greenberg's example and has continued to be an important kind of data-gathering in synchronic linguistics.
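As a minimal sketch of how an implicational universal might be checked against a language sample, the following Python snippet looks for counterexamples to "if X then Y". The language names and feature sets are invented for illustration and do not describe real languages; the function names are likewise hypothetical.

```python
# Toy sample: each hypothetical language maps to the set of structures it has.
# All names and feature sets are invented for illustration only.
sample = {
    "LangA": {"high_front_rounded", "mid_front_rounded"},
    "LangB": {"high_front_rounded"},
    "LangC": set(),
    "LangD": {"mid_front_rounded"},  # violates the universal on purpose
}

def counterexamples(sample, x, y):
    """Return the languages that have structure x but lack structure y."""
    return sorted(name for name, feats in sample.items()
                  if x in feats and y not in feats)

def holds(sample, x, y):
    """'If X then Y' holds in the sample when no language violates it."""
    return not counterexamples(sample, x, y)

violations = counterexamples(sample, "mid_front_rounded", "high_front_rounded")
print(violations)
```

A single well-documented counterexample is enough to demote a proposed universal to a cross-linguistic tendency, which is why the check reports the violating languages rather than only a yes/no answer.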

African languages

Greenberg is also widely known and respected for his development of a new classification system for African languages, which he published in 1963. The classification was for a time considered very bold and speculative, especially in his proposal of a Nilo-Saharan language family, but is now generally accepted among African historical specialists. In the course of this work, Greenberg coined the term Afroasiatic languages, to replace the former "Hamito-Semitic".

Greenberg's classification was well accepted by historical linguists, who have since used it as the basis for further work. Hal Fleming introduced the Omotic family, and Gregerson proposed joining Niger-Congo and Nilo-Saharan into a large Kongo-Saharan family; both proposals were in turn accepted by Greenberg.

Languages of the Americas

Later, Greenberg studied the native languages of the Americas, which until then had been classified into hundreds of separate language families. In his 1987 book Language in the Americas, he proposed a broader classification into three major groups: Eskimo-Aleut, Na-Dene, and Amerind.

This work, particularly the Amerind family, is still rejected by many historical linguists. The criticisms are directed not so much at the classification per se as at the method used to establish it, which most linguists consider too unreliable (see below), and at the large number of errors that were claimed to be present in the sources used by Greenberg, such as wrong or non-existent words, incorrect translations, words attributed to the wrong languages, and unsupported or wrong identification of prefixes and suffixes.

While some of these errors (which, according to Greenberg's defenders, only affect a few percent of the data) could conceivably lead to an artificial increase in the similarity measure, others would merely introduce random noise in the measurement, and therefore tend to reduce it, which would only strengthen Greenberg's conclusions. Nevertheless, both the errors in the data and the methodological problems have led many linguists to dismiss this part of Greenberg's work as unscholarly and invalid.

Languages of the World

Greenberg's best known and most controversial work is an ambitious classification system covering most languages of the world. In this grand scheme, he proposed to join many language families of Europe and Asia into a single group called Eurasiatic.

Like his American Indian work, this proposed classification has found few adherents among historical linguists, and has been strongly rejected by many of them — chiefly because of the method used (see below).

Greenberg's method of mass comparison

Traditional language comparison

Since the development of comparative linguistics in the 19th century, a linguist who claims that two languages are related, in the absence of historical evidence, is expected to back up that claim by presenting general rules that describe the differences between their lexicons, morphologies, and grammars. The procedure is described in detail in the Wikipedia article comparative method.

For instance, one could prove that Spanish is related to Italian by showing that many words of the former can be mapped to corresponding words of the latter by a relatively small set of replacement rules, such as replacing initial es- with s-, final -os with -i, etc. Many similar correspondences exist between the grammars of the two languages. Since those systematic correspondences are extremely unlikely to be random coincidences, the most likely explanation by far is that the two languages have evolved from a single ancestral tongue (Latin, in this case). All pre-historical language groupings that are widely accepted today, such as the Indo-European, Sino-Tibetan, and Bantu families, have been proved in this way.
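The idea of rule-based mapping can be illustrated with a small Python sketch. It applies only the two rules named above to a handful of Spanish words and counts how many land on the Italian form. The word list is a hand-picked illustration, the restriction of the es- rule to positions before a consonant is an added assumption, and real sound correspondences are far more numerous and are stated over sounds rather than spellings.

```python
import re

# The two illustrative rules from the text: initial es- -> s-, final -os -> -i.
# Assumption: the es- rule is limited to es- followed by a consonant.
RULES = [
    (re.compile(r"^es(?=[^aeiou])"), "s"),
    (re.compile(r"os$"), "i"),
]

def apply_rules(word):
    """Apply each replacement rule once, in order, to a Spanish spelling."""
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word

# (Spanish, Italian) pairs chosen by hand for illustration.
pairs = [
    ("escala", "scala"),
    ("espada", "spada"),
    ("escudos", "scudi"),
    ("muros", "muri"),
    ("perro", "cane"),   # not rule-mappable
]

matched = sum(apply_rules(es) == it for es, it in pairs)
print(f"{matched} of {len(pairs)} pairs are mapped by the rules")
```

What matters for the comparative method is not the raw count but that the same few rules account for many independent word pairs.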

Limitations of the comparative method

However, besides systematic changes, languages are also subject to random mutations (such as borrowings from other languages, irregular inflections, compounding, and abbreviation) that affect one word at a time, or small subsets of words. For example, Spanish perro, which does not come from Latin, cannot be rule-mapped to its Italian equivalent cane.

As those sporadic changes accumulate, they will increasingly obscure the systematic ones, just as enough dirt and scratches on a photograph will eventually make the face unrecognizable. Given the rate at which those random mutations occur, they are expected to obliterate any systematic similarities between languages that split more than 10,000 years ago. Considering that humans have probably been speaking fully developed languages for at least 60,000 years (since the time Australia was first populated), it is hardly surprising that many languages and language families still have no known relationship with other groups.

Mass lexicon comparisons

In an effort to extend comparative linguistics beyond its present limits, and arrive at his broad super-family groupings, Greenberg invented a new statistical method, mass lexical comparison. In this method, one simply compares a large sample of words from one language with their equivalents in the other language, looking for similar sound patterns. Thus, for example, Spanish cabeza and Italian capo are similar to the extent that both contain the same consonant sound [k], similar vowel sounds [a], and similar consonants [b], [p], in the same sequence.

Departing from the traditional criterion, Greenberg did not look for any systematic trend in these similarities, trusting that a sufficiently large percentage $S$ of sufficiently similar pairs among the samples would be enough to prove a common origin for the two languages. This assumption is valid in principle, because $S$ is expected to be higher for languages that have split off more recently, and decrease as the split recedes into the past. The difficult part is deciding what constitutes "sufficient" similarity.
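A rough sketch of such a similarity percentage is given below. It is not Greenberg's actual procedure, which relied on the linguist's judgment. Here, as a loudly simplified assumption, two words count as "similar" when their first two consonants fall in the same broad classes, and the classes are assigned from spelling rather than from phonetic transcription. The class table and function names are hypothetical.

```python
# Assumed, very coarse consonant classes keyed by letter (not by sound).
CLASSES = {
    "p": "P", "b": "P", "f": "P", "v": "P",
    "t": "T", "d": "T",
    "c": "K", "k": "K", "g": "K", "q": "K",
    "s": "S", "z": "S",
    "m": "M", "n": "N", "l": "L", "r": "R",
}

def skeleton(word):
    """Classes of the first two consonant letters, e.g. 'cabeza' -> 'KP'."""
    return "".join(CLASSES[ch] for ch in word if ch in CLASSES)[:2]

def similar(a, b):
    """Two words are 'similar' if their non-empty skeletons are identical."""
    sa, sb = skeleton(a), skeleton(b)
    return bool(sa) and sa == sb

def similarity_level(pairs):
    """Fraction S of word pairs judged similar."""
    return sum(similar(a, b) for a, b in pairs) / len(pairs)

# Spanish / Italian words for: head, hand, fire, stone, dog.
pairs = [("cabeza", "capo"), ("mano", "mano"), ("fuego", "fuoco"),
         ("piedra", "pietra"), ("perro", "cane")]

S = similarity_level(pairs)
print(f"S = {S:.2f}")
```

Any such measure embeds choices (which classes, how many consonants, how to treat vowels) that change the resulting value of $S$, which is one reason the threshold for "sufficient" similarity is hard to fix.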

Choosing the sample lexicon

Ideally, the sample lexicons should contain only words that are likely to have survived in either language since the time of their hypothetical common origin, and are unlikely to be replaced by borrowed or reinvented words. For studies that extend more than 5000 years into the past, that criterion leaves only a few hundred concepts, such as body parts, close family relations, common animals and plants, water, fire, sky, stone, spear, etc.

Words for "modern" concepts — such as "wine", "horse", and "steel" — may show spurious similarities between unrelated languages, due to the name being imported by a culture together with the thing; e.g. Spanish pan and Japanese pan ("bread"). Alternatively, the names of recently imported concepts may get invented separately in related languages, such as computadora ("computer") in Spanish and ordinateur in French. Either way, such words would only add noise and bias to the comparison.

Weaknesses of the method

In theory, the reliability of Greenberg's method could be settled by statistical analysis; namely, by computing the probability that a given similarity level $S$ could have arisen by chance coincidences between totally unrelated languages. Two languages then should be considered similar only if the observed value of $S$ was significantly greater than this "baseline" level.
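One way to approximate such a baseline is a permutation test: keep both word lists but shuffle which meanings are paired, recompute $S$ many times, and see how often chance alone reaches the observed value. The sketch below is an illustration of that general technique, not a procedure attributed to Greenberg or his critics. The similarity criterion (matching classes of the first two consonant letters), the tiny word list, and all function names are assumptions made for the example.

```python
import random

# Assumed coarse consonant classes, keyed by spelling (illustrative only).
CLASSES = {
    "p": "P", "b": "P", "f": "P", "v": "P", "t": "T", "d": "T",
    "c": "K", "k": "K", "g": "K", "q": "K", "s": "S", "z": "S",
    "m": "M", "n": "N", "l": "L", "r": "R",
}

def skeleton(word):
    """Classes of the first two consonant letters of a word."""
    return "".join(CLASSES[ch] for ch in word if ch in CLASSES)[:2]

def similarity_level(words_a, words_b):
    """Fraction of aligned word pairs whose non-empty skeletons match."""
    hits = sum(bool(skeleton(a)) and skeleton(a) == skeleton(b)
               for a, b in zip(words_a, words_b))
    return hits / len(words_a)

def chance_p_value(words_a, words_b, trials=2000, seed=0):
    """Estimate how often a random meaning alignment reaches the observed S."""
    rng = random.Random(seed)
    observed = similarity_level(words_a, words_b)
    shuffled = list(words_b)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break the meaning-to-meaning pairing
        if similarity_level(words_a, shuffled) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (trials + 1)

spanish = ["cabeza", "mano", "fuego", "piedra", "perro"]
italian = ["capo", "mano", "fuoco", "pietra", "cane"]

observed, p = chance_p_value(spanish, italian)
print(f"observed S = {observed:.2f}, estimated chance probability = {p:.4f}")
```

Because the shuffled lists keep each language's own sound inventory, this baseline partly controls for the phonetic make-up of the two languages, though a five-word list is far too small to support any real conclusion.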

Unfortunately, this computation is very difficult to do. For one thing, the similarity level $S$ is expected to depend on the phonetic repertoires of the two languages; thus, for instance, one expects more chance resemblances between two languages that have few vowels and many consonants, than between a vowel-rich and a vowel-poor language. Similar biases can be expected when comparing languages that allow consonant clusters with those that don't, or polysyllabic languages with monosyllabic ones. It follows that deciding what would be a significant level of similarity would require a stochastic model for a "random lexicon" that took into account letter frequencies, syllable structure, and many other similar statistics.
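To see why the baseline depends on the sound inventories, consider a deliberately simplified model, assumed here purely for illustration: two words count as similar when their initial consonants match, and the two languages choose initial consonants independently, with frequencies $p_c$ and $q_c$ for each consonant $c$. The probability that a given word pair matches by chance is then

$$P_{\text{chance}} = \sum_{c} p_c \, q_c .$$

If both languages use the same $n$ consonants with equal frequency, so that $p_c = q_c = 1/n$, this reduces to $n \cdot (1/n)^2 = 1/n$. Under this toy model, two languages with 10 initial consonants each would show about 10% chance matches, while two languages with 20 would show about 5%, so the same observed $S$ carries different weight in the two cases.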

Also, the "ancient" concepts that are most suitable for inclusion in the sample lexicons often have onomatopoeic names that imitate a natural sound associated with the concept. (English is especially rich in such words, e.g. crack, slap, bang, crow, gurgle, cough, etc.) The independent use of that principle in two languages will tend to create similar word pairs that contribute to the similarity measure $S$ but are not due to common origin.

Finally, in every language the same concept can often be expressed by two or more different words; and the meanings of words are known to drift over centuries just as much as their forms. Thus, for example, the meanings of corn and grain in English overlap to a large extent; and corn, which originally referred to cereals like wheat and barley, has come to mean chiefly "maize" in the United States. As a consequence of these semantic shifts and synonymies, the construction of the representative lexicon for a language typically involves many choices that must often be made on subjective criteria. These choices may be unconsciously biased towards words that are similar to those previously chosen for other languages, thus artificially inflating the similarity measure $S$.

These difficulties are compounded by the fact that many historical linguists are unfamiliar with statistical analysis, and therefore are at a disadvantage when it comes to evaluating or criticizing these comparisons. For all these reasons, although Greenberg's method of mass lexical comparison continues to have ardent scholarly defenders, it is still rejected by most historical linguists, who view the comparative method as the only legitimate way to establish pre-historical common ancestry for languages.

References

Adelaar, Willem F. H. (1989) Review of Language in the Americas. Lingua 78.249-255.
Berman, Howard (1992) A Comment on the Yurok and Kalapuya Data in Greenberg's Language in the Americas, International Journal of American Linguistics 58.2.230-233.
Chafe, Wallace (1987) Review of Language in the Americas. Current Anthropology 28.652-653.
Goddard, Ives (1987) Review of Language in the Americas. Current Anthropology 28.656-657.
Greenberg, Joseph H. Linguistics, anthropological theory, cultural anthropology; Africa.
Greenberg, Joseph H. (1963) Some universals of grammar with particular reference to the order of meaningful elements. In Universals of Language. Cambridge: MIT Press. pp. 73–113.
Greenberg, Joseph H. (1987) Language in the Americas. Stanford: Stanford University Press.
Greenberg, Joseph H. (2000) Indo-European and its Closest Relatives: the Eurasiatic Language Family – Volume I, Grammar. Stanford: Stanford University Press.
Greenberg, Joseph H. (2002) Indo-European and its Closest Relatives: the Eurasiatic Language Family – Volume II, Lexicon. Stanford: Stanford University Press.
Kimball, Geoffrey (1992) A critique of Muskogean, 'Gulf,' and Yukian materials in Language in the Americas, International Journal of American Linguistics 58: 447-501.
Poser, William J. (1992) The Salinan and Yurumanguí Data in Language in the Americas. International Journal of American Linguistics 58.2.202-229.
Rankin, Robert (1992) Review of Language in the Americas, International Journal of American Linguistics 58.3.324-351.

External links