Genealogy of Human Language

Here is a proposed ``family tree'' of the existing human languages. I am certainly not a linguist, but perhaps that's why I dare to draw these controversial conclusions. I draw on the writings of professional linguists who, understandably, are anxious to avoid the ``crackpot'' label that comes with wild speculation. I have no such qualms.

Genetic Descent vs. Areal Diffusion

English inherits language characteristics from both German and French. Many laymen think English is a sort of hybrid, or perhaps even closer to French. For example:

``Personnes importants enjoient porc et vin dans un grand restaurant.''

English speakers unfamiliar with French can still probably make sense of this sentence easily, although the German translation might be incomprehensible! Is English then closer to French than German?

Anglo-Saxon and Old French came into contact after the Battle of Hastings. Then the important persons enjoying pork and wine in grand restaurants spoke French and that vocabulary survives in English to this day. If you ``enjoie trop de vin'' and end up barfing in the barn with the swine, you'll do it with a more Anglo-Saxon lexicon!

This language hybridization is caused mostly by bilingualism. Women usually teach their babies a language almost identical to that which they learned from their own mothers, but sometimes they substitute their husband's elite language. If you could start with an English speaker today and go back generation by generation, following only paths where babies learned their maternal grandmother's language, you would find that the grandmother's grandmother's ... grandmother spoke a Germanic language, not a Romance one. This despite the huge resemblance to French in sentences like ``Personnes importants enjoient porc et vin dans un grand restaurant.'' The most clearcut evidence that English is descended from Germanic is the similarity of irregular forms like Good-Better-Best (compare with German Gut-Besser-Beste).

Eventually the Norman conquerors adopted the language of the conquered, and the French borrowings became a superstrate. If instead they had refused to change, the rest of the population would have eventually adopted French, but with an Anglo-Saxon substrate.

Many centuries from now, as irregular forms disappear and vocabularies mutate, it may no longer be obvious whether English came from the Germanic or Romance family.

This confusion gets worse as the genetic connections become more distant. While all Indo-European languages spoken today probably derive from a single language spoken less than 6000 years ago, a common genetic ancestor of Aboriginal Australian languages may not have existed for 30,000 years. Yet because of areal diffusion of words, sounds, and grammatical constructs, the Australian languages are regarded as a cohesive language family.

Controversy

Since distant genetic connections are obscured by non-genetic borrowings, few linguists accept any ``grand family tree'' of human language. In Ruhlen's scheme, summarized below, all languages are classified into one of twelve phyla, but only two of these phyla (Kartvelian and Dravidian) are universally agreed to be valid genetic families.

A third family (Australian) is widely accepted, but as a ``Sprachbund'' rather than a genetic family: areal diffusion operating over hundreds of centuries has made the languages similar.

One of the most controversial of Ruhlen's proposals is the Amerind hypothesis -- that all of the native languages of South and Central America (as well as most of North America) are contained in the same genetic family. Contrariwise, most linguists are struck by the diversity within this so-called Amerind family.

It seems neat when linguistic analysis can solve ancient mysteries. One such mystery is the date of human arrival in America. Some say humans first came about 12,000 years ago (or that if there were earlier colonies, they didn't survive); others cite controversial ``digs'' and make the date twice as remote. Does linguistics provide an answer?

Although Greenberg and Ruhlen, who regard the Amerind languages as one big family, may use that to suggest the more recent date, many linguists take the opposite view, that the great diversity among Amerind languages implies the earlier date.

There is a third view. Professor R.M.W. Dixon, an eminent scholar specializing in Australian and South American languages, who has advocated areal diffusion and equilibrium/punctuation theories, takes ``a viewpoint that is diametrically opposed. [In contrast to Australian, the] fact that so many language families are recognizable [in America] indicates a relatively recent series of language splits.''

Despite these controversies and caveats I now outline Ruhlen's taxonomy.

The Twelve Phyla of Present-Day Human Language

Eminent linguists like Joseph Greenberg believe that all of man's languages are descended from a common ancient language. Their evidence is that words like ``akwa'' (water) and ``dik'' (finger/one/ten) are used by distinct languages all over the world. Different linguists will divide the world's languages into `superfamilies' or `phyla' in different ways, but Ruhlen's proposal happens to give the smallest number of distinct language phyla of any proposed taxonomy. Even if you disagree with Ruhlen's work, the simple alphabetized enumeration (A through L) makes it easy to recall a list of the world's language families.

Boundaries are schematic; map is not to scale.

Among the most controversial of these superfamilies are:

Cauco-sinitic, which includes Caucasian (Dagestan, Circassian), Na-Dene, Sino-Tibetan, Yenisei, Burushaski, and probably Basque.

Eurasiatic, which includes Eskimo-Kamchatkan, Gilyak, Koreo-Japanese, Tungus, Mongo-Turkic, Uralo-Yukaghir, Indo-European, and (most controversially) possibly Ainu.

Austric (``Hmong, etc.''), which includes Daic, Austronesian (Formosan, Malayo-Polynesian), Austro-asiatic (Mon-Khmer, Viet, Munda), and Hmong-Mien.

Plausible Genetic Chronology

This A-L mnemonic is used in the following diagram. D, E, F & G are seen to be relatives and may be lumped into a ``Nostratic superphylum.''

Nahali (shown here as an isolated, 13th, phylum ``N ?'') is an ancient language of Central India, sometimes conjectured to be related to one of the language families C, D, H, or I.

Here's a schematic showing a plausible (but very approximate) timing of the split-up of proto-Human (the ``mother'' language) into the twelve phyla. Dates are in millennia before present.

Areal vs. Genetic Development

A lot of the information in the above chart is ``certain.'' For example, Australian must have separated from Asian languages about 40 millennia ago: that's when sea levels rose to make further communication impossible. Many details cannot be resolved. Did Phyla A & I come from a single early split-off (``proto Australo-Pacific'') or, as depicted in the diagram, did the two phyla split off separately from early Asian languages? The answer may be unknowable or irrelevant.

Australian and Indo-Pacific separated from each other, and from other human languages, about 30-45 millennia ago at the dawn of the modern human era. Thus we can state a ``ballpark'' date for the separation of phyla A & I even though we don't know where that separation occurred, in Oceania, Asia or even Africa.

What about the internal separations among Australian or Indo-Pacific dialects? Over the many millennia, language in Australia doubtless went through many changes and had great diversity. Even if only one surviving dialect arrived from the Asian mainland (a doubtful proposition) it would have diverged quickly, and over the millennia many new languages arose and disappeared. Starting from two of today's most divergent Australian languages, how long ago did they diverge from a common tongue? No one knows for sure, but it was surely more than five or ten millennia ago because the similarities among Australian languages are much less than in a family of relatively recent birth like Semitic or Indo-European.

But perhaps there were 2 or 3 Australian phyla and all but one disappeared, say, 15 or 20 millennia ago, due to a temporary domination, much as Indo-European eliminated other European phyla. Professor Dixon, an expert on Australian languages, argues that this is unlikely. There is no evidence of a particular technological innovation to explain such domination. (Indo-European success can be explained by one or more of their innovations like animal husbandry, war chariots, and novel foods.) Also there is no linguistic evidence of a ``lost phylum.''

Thus the common ancestor of two Australian languages might be in the distant past, say 35 millennia ago, while their common ancestor with Indo-Pacific might be at about the same time. Yet linguists agree that the similarity between any two Australian languages is much greater than their similarity to Indo-Pacific. The point is that the two Australian languages have been in continual contact even though their genetic separation may have been 35 millennia ago. Unrelated languages in geographic contact can trade vocabulary, phonology and grammar.
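
The contact argument can be made concrete with a toy simulation. This is only an illustrative sketch in Python, not a model taken from Dixon or Ruhlen. The function name `simulate`, the rates `p_change` and `p_borrow`, and the idea of treating a language as a fixed list of word slots are all assumptions invented for the example. Two languages start out identical; each generation every word may be replaced by a brand-new form, and, if the languages are in contact, one may borrow the other's word.

```python
import random
from itertools import count


def simulate(generations, n_words=200, p_change=0.02, p_borrow=0.0, seed=0):
    """Return the fraction of word slots two sister languages still share.

    All parameters are hypothetical knobs for this toy model:
      p_change: chance per generation that a word is replaced by a new form
      p_borrow: chance per generation that one language copies the other's word
    """
    rng = random.Random(seed)
    fresh = count(1)          # supplies word forms never seen before
    lang_a = [0] * n_words    # both languages begin as the same ancestor
    lang_b = [0] * n_words
    for _ in range(generations):
        for i in range(n_words):
            # Independent internal change in each language.
            if rng.random() < p_change:
                lang_a[i] = next(fresh)
            if rng.random() < p_change:
                lang_b[i] = next(fresh)
            # Areal diffusion: borrowing in either direction.
            r = rng.random()
            if r < p_borrow:
                lang_a[i] = lang_b[i]
            elif r < 2 * p_borrow:
                lang_b[i] = lang_a[i]
    shared = sum(a == b for a, b in zip(lang_a, lang_b))
    return shared / n_words


isolated = simulate(300)                    # same split date, no contact
in_contact = simulate(300, p_borrow=0.05)   # same split date, steady borrowing
print(f"isolated: {isolated:.2f}  in contact: {in_contact:.2f}")
```

With these made-up rates the isolated pair ends up sharing almost nothing, while the pair in contact stays largely similar even though both pairs ``split'' at the same moment. Note that the shared words in the second case are mostly borrowings, not retentions from the ancestor, which is exactly why similarity alone cannot date a genetic separation.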

Because of this, it may not be clear whether a language clading diagram should show just genetic inheritance, or the effects of areal affinity as well; and linguists may be unable to distinguish the two for distant affinities. This is different from the ``Tree of Life'' where genetic inheritance is clearcut: No one thinks bats acquired flying genes from areal contact with birds!

For this and many other reasons, the above tree should not be viewed as ``correct;'' at best it is just a plausible guess.

Further Comments on Clading

``Clading'' diagrams like the proposed language genealogy can be misleading, as suggested by the following four example clade diagrams.

(a) Chimpanzees are closer to humans genetically than they are to gorillas, yet they were traditionally classed in the same family as the latter. The science popularizer Stephen Jay Gould claims that the chimp genus should be reassigned to match the clading chart, but this view ignores the very mechanism of evolution: If we followed Gould's suggestion, the Dinosaurs, Birds and Mammals would all have to be lumped into a single subclass in Fish or Amphibians!

(b) Germanic has deviated from other Indo-European languages even though it didn't split off particularly early. This is because it spent many centuries in contact with the entrenched (but now extinct) Ertebolle linguistic tradition in Northern Europe. Speakers of Baltic, on the other hand, have always been surrounded by other Indo-European speakers, and in fact Baltic may be closer to ancient Indo-European than any other modern dialect.

(c) Using dates consistent with those shown in the ``Proposed Genealogy,'' but from the extremes of the indicated ranges, a wholly different clading into the families B-D-E-F-L can be obtained. This alternative model may be as likely as the clading shown earlier, though interesting migrations would be implied.

(d) This is another plausible version of the linguistic family tree (though drawn just to depict one part of the clading topology), again compatible with the estimated dates in the main diagram above. Note that this branching would lead to a bizarre taxonomy if we followed Gould's suggestion in (a).

In (d), Austric would be divided into several phyla, including a Tai-Sinitic, with seven of Ruhlen's twelve phyla lumped into a single phylum. Model (c) here might seem to contradict alleged similarities in B-L or D-E, but the ideas of (a) and (b) show us that it is quite plausible.
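
The point of example (a), that a traditional grouping need not match the branching order, can be checked mechanically. Here is a minimal Python sketch, assuming the three-way ape tree described in (a). The internal node labels ``HC'' and ``HCG'' and the helper names `lineage`, `mrca`, `leaves_under` and `is_clade` are made up for this illustration.

```python
# Child -> parent links for the tree in (a). "HC" and "HCG" are
# hypothetical labels for the unnamed common ancestors.
PARENT = {
    "human": "HC",
    "chimp": "HC",
    "HC": "HCG",
    "gorilla": "HCG",
    "HCG": None,
}


def lineage(node):
    """List a node followed by all of its ancestors, nearest first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def mrca(x, y):
    """Most recent common ancestor of two nodes."""
    seen = set(lineage(x))
    for node in lineage(y):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def leaves_under(node):
    """All tips of the tree that descend from the given node."""
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return {node}
    tips = set()
    for child in children:
        tips |= leaves_under(child)
    return tips


def is_clade(group):
    """True if the group is exactly the set of tips under one ancestor."""
    group = set(group)
    members = iter(group)
    ancestor = next(members)
    for member in members:
        ancestor = mrca(ancestor, member)
    return leaves_under(ancestor) == group


print(mrca("chimp", "human"))           # the nearer ancestor
print(mrca("chimp", "gorilla"))         # the older ancestor
print(is_clade({"chimp", "human"}))     # a true branch of the tree
print(is_clade({"chimp", "gorilla"}))   # leaves out the humans
```

A group that leaves out one of the descendants of its own common ancestor, such as chimp plus gorilla here, is not a clade. Whether a taxonomy ought to forbid such groups is the question argued in (a); the code only shows how to detect them.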

An even more serious problem in clading study is the fact that most ancient language families have become extinct. Our ``Possible Genealogy of Human Languages'' thus just depicts languages which by chance had surviving descendants. The Pygmy people of Africa probably had their own language phylum, but it was eradicated, probably during a period of domination by Bantu speakers.

Similarly the above tree showing Eurasiatic splitting from Afro-asiatic about 15 millennia ago is complete guesswork: The period is too ancient to allow any certainty.

We can guess, however. Read my musings on ancient man and his languages.

A-B-C-... mnemonics can be fun. Here's one to memorize the eight classes of English adjectives:

A beautiful costly decade-old ebony-colored French gunmetal heating iron

(Note that the phrase sounds awkward if the adjectives are placed in any other order.)

Go back to James Allen's home page.

