This page explains in detail the rationale for the language classification scheme outlined here.

By and large, I followed the scheme provided in Ruhlen (1991).

The main principle behind my presentation is that no language should be more than three hierarchical levels away from the top level.

Language families are listed in alphabetical order. The name of the family is in blue.

Where I thought this necessary, I added another hierarchical level, that of a branch. The names of the branches are given under the language family name, bulleted and in normal letters.

The right-hand column of the classification table lists some of the languages included in each family or branch. This listing is not meant to be exhaustive. Where a family includes some isolates (i.e. individual languages that cannot be classified lower down in the hierarchy), I included their names on the corresponding row. Language isolates that cannot be classified by language family at all are listed in the last row of the table.
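The three-level principle can be pictured as a small nested structure. What follows is only a minimal sketch under my own assumptions: the names `table`, `MAX_DEPTH` and `language_depths` are hypothetical, the sample entries are taken from elsewhere on this page, and counting a family as level 1, a branch as level 2 and a language as level 2 or 3 is one possible reading of the rule, not a definitive one.

```python
# Illustrative only: family -> branch -> languages, or family -> languages
# when no branch level is used. Names and layout are assumptions.
MAX_DEPTH = 3

table = {
    "Indo-European": {"Anatolian": ["Hittite"]},  # family with a branch level
    "Algic": ["Wiyot", "Yurok"],                  # languages listed on the family row
}

def language_depths(node, level=1):
    """Yield (language, level) pairs; `level` is the depth of the items in `node`."""
    if isinstance(node, dict):
        # Keys are families or branches; their contents sit one level lower.
        for child in node.values():
            yield from language_depths(child, level + 1)
    else:
        # A plain list holds language names at the current level.
        for language in node:
            yield language, level

depths = dict(language_depths(table))
# Check that no language sits more than three levels below the top.
assert all(level <= MAX_DEPTH for level in depths.values())
```

A language listed directly on a family row sits at level 2 here, and one listed under a branch sits at level 3, which is the deepest the scheme allows.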

I shall now explain some of the divergences from Ruhlen, in the order of the classification itself, except that I shall deal with the native languages of the Americas first.

American Languages in general

Classification of native languages in the Americas is difficult. There is a very large number of languages and of potential language families. In the 20th century there was a succession of efforts aimed at reducing the number of first-level categories, until Greenberg (1987) managed to arrive at just three: Amerind, Na-Dené and Eskimo-Aleut, a proposal incorporated into Ruhlen as well. This has generated an enormous controversy, into which I shall not enter here.

However, it is clearly not reasonable to classify all native "Indian" languages of the western hemisphere into just two groups, while leaving the Old World families mostly intact. In my view, we can either accept something like the Nostratic hypothesis for Eurasia and the Amerind hypothesis for the Americas, or, alternatively, we can accept lower-level families in both areas. It is the latter approach that I intend to follow here.

The sheer number of language families in the Americas is such that a practical classification, such as the one aimed at here, must lump some of the families together. For North America, it would be tempting to simply accept the six phyla proposed by Sapir (1929): Eskimo-Aleut, Algonkian-Wakashan, Na-Dené, Penutian, Hokan-Siouan and Aztec-Tanoan. This would, however, disregard research done since then. I decided, therefore, to accept Ruhlen's Na-Dené and Eskimo-Aleut as separate families in my scheme, and then take his second- or third-level subdivisions of Amerind as the other families, with some provisos (names in bold are subdivisions in Ruhlen, underlined if first-level under Amerind, not underlined if second- or third-level; names in blue are family names in my classification):

Northern Amerind


By Ruhlen's own admission, this is a grab-bag of languages, with little in the way of demonstrated genetic unity. It is best to reduce it to families whose unity is accepted by most specialists (and several of which are recognizable by the general public):

  • Algic - A by-now well established name for Algonquian + the Californian languages Wiyot and Yurok.
  • The three subdivisions of Mosan should be kept separate: Chimakuan, Salish, and Wakashan.
  • Similarly, the three subdivisions of Keresiouan, namely Caddoan, Iroquoian and Siouan-Yuchi, should be kept separate.
  • Finally, Kutenai and Keres should be mentioned as language isolates.


I have no argument with Ruhlen here. Keep the family as is.


The unity of most of Penutian has been accepted for a while now, but Greenberg's additions of the Gulf group (incl. distant Yuki and Wappo in California) cannot be accepted by the logic of this classification. Neither will I include the Mayan group of languages in Penutian, despite Ruhlen's enthusiasm - the EB article (1993: 771) specifically rejects this connection. Therefore the best solution seems to me to keep Penutian with its "original" meaning (Ruhlen's subgroups I-VI), and keep Gulf and Mayan as separate families, the latter to be called Mexican to conform to Ruhlen to this extent.

Central Amerind

Tanoan and Uto-Aztecan are kept separate by Ruhlen. Here, however, it is the EB (and, by implication, Voegelin and Voegelin) which is ready to group these two families together as the Aztec-Tanoan "stock" (EB 1993: 770). So shall I.

Oto-Manguean. I do not propose to enter the quagmire of evidence and counter-evidence for this unit, and for deciding which languages belong to it. I shall simply adopt this as a convenient designation for Central Mexican languages that are neither Aztec-Tanoan nor Maya.

Chibchan-Paezan, Andean, Equatorial-Tucanoan, Ge-Pano-Carib

Here I simply express my admiration that Ruhlen has managed to classify the myriad languages south of Mexico into just these four units. Even Greenberg has a greater number!

I have neither the background knowledge nor the time to try to disentangle these stocks/phyla, so - for now at any rate - I shall keep them intact.

Other language families


Although still not accepted by some scholars, this group's existence has wide support. I see no reason to reject it, or not to place its three subdivisions at the same hierarchical level. It may be justifiable to establish an intermediate grouping Mongolian-Tungus, but there is no need to do so in this classification.

Placing Ainu, Japanese and Korean in Altaic, on the other hand, is not a good idea. Despite Miller (1971; 1996), placing these languages in the Altaic group is still not universally accepted, and I opt for the conservative solution: Japanese and Korean will be families in their own right (mostly because of the large number of speakers), while Ainu will be listed as a language isolate.


Ruhlen (p.188) and the EB (p.746) agree that the indigenous languages of Australia (but not Tasmania) are all related.


This family, with its two main subdivisions Munda and Mon-Khmer, should not pose any problems. I have, however, eliminated the higher-level group Austric that Ruhlen placed it in, together with Miao-Yao, Daic and Austronesian. The reason is simple: such a wide-ranging Austric grouping has no justification in my view - the proposed genetic relationship is too uncertain.


The unity and composition of this family are not in doubt. Its membership in Austric (see the comments under Austro-Asiatic) is problematic, and I do not accept it.

Austronesian is probably the best example of a language family whose genetic subdivision is not really suitable for a practical classification. Some first-level subdivisions may contain just a handful of marginal languages, while others may have a large number of important ones. If we wish to have subdivisions of roughly equal strength, we need to reorganize Ruhlen's scheme as follows:

Atayalic, Tsoulic, and Paiwanic are to be merged as Formosan.

Malayo-Polynesian is to be divided into its two sub-units: Western Malayo-Polynesian can be retained, while Central-Eastern Malayo-Polynesian is to be further subdivided. Once again, one sub-unit, Central Malayo-Polynesian, can provide us with a useful subdivision, while Eastern Malayo-Polynesian can be sub-divided into South Halmahera - NW New Guinea, on the one hand, and Oceanic, on the other.
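The reorganization just described can be sketched as a small tree transformation. This is only an illustration under stated assumptions: the `ruhlen` tree contains just the nodes named on this page, and the names `MERGE`, `SPLIT` and `flatten` are hypothetical helpers of mine, not part of any published scheme.

```python
# Ruhlen's Austronesian subdivisions, limited to the nodes named above.
ruhlen = {
    "Atayalic": {},
    "Tsoulic": {},
    "Paiwanic": {},
    "Malayo-Polynesian": {
        "Western Malayo-Polynesian": {},
        "Central-Eastern Malayo-Polynesian": {
            "Central Malayo-Polynesian": {},
            "Eastern Malayo-Polynesian": {
                "South Halmahera - NW New Guinea": {},
                "Oceanic": {},
            },
        },
    },
}

# Groups merged under one label, and nodes replaced by their sub-units.
MERGE = {"Atayalic": "Formosan", "Tsoulic": "Formosan", "Paiwanic": "Formosan"}
SPLIT = {
    "Malayo-Polynesian",
    "Central-Eastern Malayo-Polynesian",
    "Eastern Malayo-Polynesian",
}

def flatten(tree):
    """Return the single-level list of branches after merging and splitting."""
    branches = []
    for name, children in tree.items():
        if name in SPLIT:
            # Replace this node with whatever its sub-units reduce to.
            branches.extend(flatten(children))
        else:
            label = MERGE.get(name, name)
            if label not in branches:  # merged groups appear only once
                branches.append(label)
    return branches

branches = flatten(ruhlen)
```

The result is five branches on a single level: Formosan, Western Malayo-Polynesian, Central Malayo-Polynesian, South Halmahera - NW New Guinea, and Oceanic.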


Same comment as above.


I have excluded Elamite from this group. I have seen no references to conclusive evidence that Elamite, a language that we know from a few ancient records and that we know very little about, is related to the Dravidian languages. Let's keep it as a language isolate until we know more.


A non-controversial grouping, surely! I see no advantage to calling it Indo-Hittite, and - as can be seen - I have placed Hittite into the Anatolian subdivision, which is at the same level as, e.g. Germanic and Slavic.

As elsewhere on this page, I am not making a statement here about my beliefs as to the chronology of subgrouping within a language family. Anatolian may well have split off from Proto-Indo-European before any of the other branches - however, this is simply not relevant for a pragmatic sub-classification of the family.

I also believe that the well-known branches of Indo-European all deserve their own place in the scheme, and there is no need for intermediate stages, such as Balto-Slavic or Italo-Celtic.

One innovation: I think that the Romance languages also deserve their own subdivision. Romance linguistics is clearly a very different field of study from the study of Latin and its Italic relatives, so the Romance languages should be classed separately. (By the same logic, the modern Indo-Aryan languages may eventually have to be transferred to a different category.)

Indo-Pacific (or Papuan)

This assemblage of languages, one of the most diverse and least known, may, or may not, form a genetic unity. At this stage, however, there is no need to subdivide it, and certainly no evidence to merge it with another family.


See comments under Austro-Asiatic for the rejection of the Austric grouping.


I accept Ruhlen's first-level subdivision, i.e. into Kordofanian and Niger-Congo. The further subdivision of Niger-Congo, however, is too unwieldy in Ruhlen, so I decided to follow the subdivision provided in the Encyclopaedia Britannica (EB) (1993: Vol. 22, pp. 750-751), itself based on Voegelin & Voegelin (1977). These six subdivisions are included in my classification on the same level as Kordofanian.


My scheme is slightly different from Ruhlen's: I placed Karen and Tibeto-Burman at the same level as Sinitic. This is analogous to what I did elsewhere (Niger-Kordofanian, Austronesian), in order to even out the uneven distribution of languages among sub-branches.


Ruhlen calls this category Uralic-Yukaghir. I think that changing the well-established name of a language family because of the addition of one other language, even if it is supposed to have split off first, is an unnecessary complication. Suppose that, say, Ket is found to be Uralic in the future: are we to rename the family Uralic-Yukaghir-Ket?


