On the Centum Features of the Thraco-Dacian Language

There has long been a dispute in Romania (since the end of the 19th century, but especially since the 1990s) between the supporters of the Latin origin and those of the Thraco-Dacian origin of the Romanian language. Although I am a convinced supporter of the Thraco-Dacian origin of Romanian, I do not consider myself part of either group, since both are misleading in one way or another. The Latinists exaggerate the contribution of Latin to the formation of the Romanian language, whereas the Dacicists are wrong in operating in a sort of self-made linguistic maze in which everyone stumbles about in their own way. I have been concerned with the problem of the origin of the Romanian language since I was a university student. Having realized that the Latin hypothesis is an extremely limited perspective which leaves too many questions unanswered, I saw my intuitions materialize much later, after long research, in a number of books and papers, including an Etymological Dictionary of the Romanian Language based on Indo-European Studies. In short, Latin is only one language in a wider group of related languages, a fact which was not understood for a long time and is still misunderstood, not only by Romanian linguists but also by many other researchers of the Romance languages. The matter is interesting and deserves to be discussed in more detail, but the purpose of this article is to show that, despite some appearances, Thraco-Dacian was a centum language related to the Italic languages, including Latin, and not a strange, very little known satem language. According to my data (see DELR), a rather small share of the lexical items have correspondents in Latin (around 14%, most of them cognates of Latin), whereas a much larger share, 62%, have cognates in Indo-European languages other than Latin and can be connected to various Proto-Indo-European roots.
The rest are imitative formations (6%), loanwords from various languages (around 12%) or of uncertain origin (6%). Many of the loanwords are marginal lexical items, most of them without derivatives.

The idea that Thraco-Dacian belongs to the satem group appeared in the 19th century because of poor knowledge of this language and a lack of interest in Romanian as a possible repository of Thraco-Dacian words and their phonological features. I should mention that Thracian, Illyrian and Dacian are different names for the same language. Some linguists, moreover, consider Illyrian a centum language and Thracian (and Dacian) a satem one. All this is nonsense, since these names designate one and the same language. There may well have been dialectal differences, but in the early Middle Ages only one language was spoken from the Adriatic Sea to the Black Sea and further east: Proto-Romanian. It must have had a single ancestor before the invasions of the Slavs and the Magyars in the last centuries of the 1st millennium AD, regardless of whether or not we take a process of Romanization into consideration. Even today there are pockets of Romanian speakers all over the Balkans, as well as east and north of Romania and the Republic of Moldova, in Poland and Ukraine. Suidas (10th century AD) defines the Illyrians as barbarian Thracians (Illyrioi barbaroi Thrakoi), while Strabo (1st century AD) shows that Thracians and Illyrians spoke the same language. For the sake of simplicity, in this article I will use the term Thraco-Dacian instead of Thraco-Illyro-Dacian. There are many controversial hypotheses about Illyrian, Thracian and Dacian based on dubious evidence or insufficiently corroborated data, but I will not discuss them here. Instead, I will draw the evidence only from the lexicon of contemporary Romanian.


The 19th-century Indo-Europeanists divided the Indo-European languages into two main groups: the centum group, located in the western part of the Indo-European linguistic area, and the satem group, located in the eastern part. The two terms are the words for ‘hundred’ (100) in Latin (centum) and in Avestan (satem), respectively, the latter as attested in the Zoroastrian religious scriptures.

Unfortunately, Romanian linguistics shows no interest in this subject, not even at the highest levels. Although linguists’ opinions regarding the nature and number of the Proto-Indo-European velars have varied over the years, one may say that Proto-Indo-European had three voiceless velars, *k, *k‘, *kᵂ, and six voiced velars: three unaspirated, *g, *g‘, *gᵂ, and three aspirated, *gh, *g‘h, *gᵂh. Of these, we are particularly interested in the palatal velars *k‘, *g‘, *g‘h. The palatal velars turned into plain velars in Thraco-Dacian, as in any other centum language, while *kᵂ and *gᵂ had a very interesting evolution in Thraco-Dacian, similar to that of the Continental Celtic languages, Osco-Umbrian and the northern Greek dialects. Namely, the labio-velars turned into bilabials after a back vowel (a, o) and into an affricate (or sibilant) after a front vowel (e, i), whereas in Osco-Umbrian and Continental Celtic all labio-velars turned into bilabials regardless of the phonological environment. Details are discussed in the Introduction to the EDRL.

Returning to the two original dialects of the Indo-European language, the sounds *k‘ and *g‘ may have turned first into affricates and then into sibilants (š, ś or s) in the eastern dialect, from which Indo-Aryan, Slavic, Baltic and Armenian originate. Many linguists include Albanian in the satem group, but Albanian is rather centum than satem, so the whole matter is somewhat ambiguous in this respect. A thorough study of Albanian in this regard would be of particular interest, but it seems to me that no one has done it so far. On the other hand, there is some evidence that Albanian and Armenian show a separate treatment of all three rows of Proto-Indo-European velars. In Albanian, Proto-Indo-European *kᵂ and *gᵂ are distinct from *k, *g before a front vowel, and the same rule applies to Armenian. It seems to be a parallel development in the two languages. In other words, these two languages followed their own path, different from both the satem and the centum groups. In my previous works, I showed that Albanian is not a descendant of a main Illyrian dialect but of the Epirotic dialects, which had been different since ancient times (see EDRL, Introduction).

In the western dialect of Proto-Indo-European, the palatal velars *k‘ and *g‘ collapsed with their non-palatal counterparts *k and *g. The centum group comprises Italic (including Latin), Greek, Celtic and Germanic. According to the data presented here, the Thraco-Daco-Illyrian language must also be included in it. Although Tocharian was spoken in what is today western China, it was a centum language. It seems that the Tocharians migrated from the centum area after the first centum/satem division, perhaps from somewhere in central Europe. Some also include Hittite in the centum group, but in fact its situation is somewhat more complicated. Hittite belongs to the Anatolian group, along with Lydian, Luwian and other languages spoken in Anatolia. Today, more and more linguists believe that Proto-Anatolian was a sister of Proto-Indo-European, not its daughter. I think this is the correct position regarding the Anatolian languages. There is more evidence in this respect, but I will only mention here that Lydian preserved the Proto-Indo-European velars as presented above, being more conservative in this respect than Hittite, while Hittite underwent the same process as the Indo-European centum group, but completely independently.


The forms for ‘dog’ in most Indo-European languages derive from PIE *kuon-, *kun- ‘dog’ (IEW, 632). The Romanian noun câine ‘dog’ has a correspondent in Latin and is therefore considered to be of Latin origin. However, it is attested in the Dacian plant name kinoboula (or kinoboila), found in Dioscorides, and in Apuleius the same plant appears as cinubula (or dinupula), a plant associated with the dog’s sexual organ. There are a number of forms of this kind in today’s Romanian, which I have discussed in EDRL (see câine in EDRL, 200). As one can see, the first part of this compound, kino-, which is considered to mean ‘dog’, has a velar (k) and not a sibilant as in the satem languages: cf. Armenian šun ‘dog’, Old Prussian sunis ‘id.’, Middle Persian sak ‘id.’, Russian suka ‘she-dog’, etc. In Albanian we have qen for the male dog, a centum form, but shakë for the female dog, a satem form. It is an interesting phenomenon, but we will not discuss it here. Linguists, of course, say that qen is of Latin origin, while shakë is inherited from Proto-Albanian. However, Albanian has other words which have centum features and cannot be of Latin origin (see below). In addition, Albanian lies within the centum area.

For the Romanian words of this category, it is difficult to ‘prove’ a Dacian origin when they have a Latin correspondent, owing to the deeply rooted misconception that Romanian is a daughter language of Latin, but also to the mistaken hypothesis that Thraco-Dacian was a satem language, very different from Latin. However, there are a number of such words in Romanian that have no Latin correspondents and yet clearly exhibit centum features; they are therefore living testimony to the centum nature of the Thraco-Dacian language.

Sometimes there are whole families of Romanian words originating from the same Proto-Indo-European root, all of them with centum features.


This is the case with PIE *kes- ‘to cut’, with the nominal derivative *kestrom ‘knife, cutter’ (IEW, 586). The Romanian nouns cosor ‘pruning knife’ and coasă ‘scythe’ derive from the verbal root, while custură ‘knife blade, rudimentary knife’, a cresta ‘to notch, to dent, to wound’ and creastă ‘crest, ridge’ derive from the reconstructed nominal form. From the same root, more precisely from the nominal form, derives Latin castro, castrare ‘to castrate’ (cf. IEW, 586; de Vaan, 97), which comes close to the Romanian verb a cresta. In all Romanian etymological dictionaries the first three are considered to be Slavic loanwords, and the fourth is considered to be of Latin origin, as a derivative of the noun creastă, but such a hypothesis cannot be correct. If there is a genetic link between the noun creastă and the verb a cresta, then creastă derives from the verb a cresta and not the other way round. It is legitimate to ask how these words can be considered to be of Slavic origin, since they have centum features while the Slavic languages are satem languages. Such aberrations become possible because of a lack of elementary knowledge of Indo-European linguistics, and nobody seems to care. Even Julius Pokorny, when discussing this Proto-Indo-European root, wonders why Slavic kosa ‘scythe’ displays a velar and not a sibilant, as would be expected in a Slavic language (‘k- statt s- durch Dissimil. gegen das folgende s?’, ‘k instead of s by dissimilation from the following s?’). Of course, the great linguist did not think of a possible Proto-Slavic loanword from some centum Indo-European language, as we do.


The Romanian noun crai ‘king’ is also considered to be of Slavic origin, namely from OCS kral‘ ‘king’, in turn from the Germanic personal name Karl, in reference to Charlemagne. In my opinion this is a bizarre hypothesis, accepted by Slavicists for lack of a better explanation.

In EDRL, I have shown that Romanian crai derives from PIE *krei- ‘to be in front, to excel’ (IEW, 618). The meaning of today’s Romanian crai comes very close to that of Homeric Greek kreion, found only in the Iliad as a poetic form, as in the expression kreion Agamemnon ‘king Agamemnon’. Homer also uses, only once, the feminine form kreiousa ‘queen’ (Iliad 22, 48; cf. Liddell, 993), referring to one of Priam’s wives, which is a strong argument that the form may be of Trojan or Thracian origin; it is also almost identical to today’s Romanian crăiasă ‘queen’. As I have said, these forms are not found in any Classical Greek texts, with the exception of the Doric kreioisa (Theoc.; cf. Liddell, 993). Given this information, one may say that the Greek forms are loanwords from Thracian dialects.

Robert Beekes (EDG, 1, 774) shows that the Greek form is inherited from the Indo-European poetic language, which is correct, but he does not suggest that Greek may have borrowed it from another Indo-European language, since he had no knowledge of the Romanian data I have just presented. On the other hand, I have no doubt that the Thracians who took part in the Trojan War alongside the Trojans had their own version of the Iliad, of what happened at Troy. Moreover, in today’s Romanian the nouns crai and crăiasă are poetic forms as well, found in fairy tales and allegorical ballads, but also in great modern poets such as Mihai Eminescu. In other words, one may say that Romanian, too, inherited these words from the Proto-Indo-European poetic language.

The relationship with the Slavic and Hungarian forms is not clear to me. They may be loanwords from Romanian with an epenthetic l, as in the case of boier ‘aristocrat’ > OCS boljar ‘id.’ (or toiag ‘staff’ > OCS toljag ‘id.’), or they may derive from the Germanic personal name Karl, as most Slavicists believe (see DELR, crai).

Another group of Romanian words, namely colibă ‘cottage’, cuib (cuibar) ‘nest’, călțun ‘winter sock’ and probably șoric, șor or cioric ‘(fried) pork skin’, derive from PIE *kel- ‘to cover’ and the nominal forms *kolia, *k’elos ‘roof, cover’ (IEW, 553).

Since Miklosich (19th century), the old school of Romanian linguistics has considered the Romanian noun colibă to be of Bulgarian origin, but this term in fact derives from this Indo-European root and was inherited from Thraco-Dacian; it is not a loanword from Old Church Slavonic or Bulgarian, since it has centum features. The term is found in all Balkan languages, as well as in Turkish, Hungarian and Ukrainian. Therefore, the forms in all these languages cannot be loanwords from Bulgarian, because Bulgarian was never in contact with Hungarian and Ukrainian, whereas Romanian was and still is. On the other hand, Pausanias’s Description of Greece (2nd century AD) records the place name Kolibe, located somewhere in northern Greece and therefore in a Thracian area. Such a form does not exist in ancient Greek; the equivalent Greek form is kalia ‘hut, nest’, a cognate of the Romanian forms. Not to mention that the first Slavs arrived in the Balkans several hundred years after Pausanias wrote his book.

It is believed that the noun cuib ‘nest’ comes from an unattested Latin *cubium < cubare ‘to lie down’. If there were any truth in this hypothesis, we would have *cub or *cubiu in Romanian. In addition, the author of this hypothesis (Cihac, 19th century) and those who followed him ignored the forms of the Romanian Balkan dialects spoken in Bulgaria, Greece, Albania and parts of the former Yugoslavia: cf. Aromanian cul’bu, Megleno-Romanian and Istro-Romanian cul’b, which cannot be explained by this presumed Latin etymon. These data disprove the hypothesis in question. Both the form and the sense of Romanian cuib therefore point to a completely different origin, namely one of the above-mentioned nominal forms with an additional b sound, *kulibu, as in the case of colibă.

The Romanian căiță ‘bonnet’ is considered to be a Serbo-Croatian loanword, a hypothesis put forward by Cihac and taken over by all the later authors of etymological dictionaries. However, I have not found such a form in Serbo-Croatian. This Romanian noun derives from the nominal form of the root suffixed with -ita, thus *kalita, with later palatalization and subsequent loss of the lateral l, as in the case of cuib.

The nouns șoric ‘(fried) pork skin’, șor ‘id.’ and cioric ‘id.’ (attested in the Republic of Moldova) seem to derive from the nominal form *kelos. The sibilant ș (sh) can be explained by the fact that the velar is followed by a front vowel. This palatalization arose in Thraco-Dacian and affects all velars (and dentals as well), regardless of their status in Proto-Indo-European; Romanian inherited this state of affairs. It is this kind of palatalization that has led linguists, owing to a poor understanding of the nature of the language, to consider Thraco-Dacian a satem language. I hope this error will finally be corrected by the Romanian examples discussed in this article. The lateral l turned into r as a result of rhotacism (l > r), another phenomenon frequently found in Romanian. The Explanatory Dictionary of the Romanian Language (DEX) and other Romanian etymological dictionaries consider these forms to be of unknown origin.

From PIE *kerdho-, *kerdha ‘herd, flock’ (IEW, 579) Romanian has three different forms, dialectal variants with slightly different meanings: cârd ‘flock (of sheep), flight (dial.)’, ciurdă ‘herd (of cattle), crowd’ and cireadă ‘herd (of cattle)’. It seems to me that cârd, of neuter gender in Romanian, derives from *kerdho-, while the other two come from *kerdha, both feminine in Romanian and presumably in Proto-Indo-European as well, judging by the a-suffix. They have slightly different meanings and may originally have belonged to different dialects of older Romanian (Thraco-Dacian), but they are mostly understood and used by all native speakers of Romanian.

In the case of cârd, it is obvious that the palatal velar *k‘ has turned into the regular velar k. There is no need to insist that the old hypothesis according to which Romanian cârd is a loanword from Serbo-Croatian is completely absurd and, like many others of its kind, must be discarded. The form ciurdă derives from the root form *kerdha, but with palatalization of k before a front vowel, as I have shown above. It cannot be of Slavic origin, since it shows no metathesis of the liquid r, unlike OCS čreda, the inherited Slavic form from the same Indo-European root. The metathesis of liquids took place as early as Proto-Slavic and is present in all Slavic languages. One cannot exclude that the form cireadă emerged under the influence of some Slavic dialect, but this remains a mere hypothesis, and the details are not discussed here. It is also possible that the form entered Proto-Slavic as a loanword from Thraco-Dacian, since čreda has an affricate and not a sibilant (s), as would be expected and as found in other satem languages; cf. Sanskrit śardha ‘flock, herd’, Avestan sarǝda ‘tribe, genus’. These assumptions remain open, and more in-depth research is needed.

Another example is the Romanian noun cracă (creangă) ‘branch’, which derives from PIE *kak- ‘branch’, with the nasalized form *kank- ‘id.’ (IEW, 523). In Romanian (or Thraco-Dacian) there was an epenthesis of the liquid r, probably to avoid homonymy with cac, cacă ‘to defecate’.

A particular case is the Romanian noun cătană (catană) ‘soldier’, which is considered to be of Hungarian origin. This hypothesis is not correct, since the form seems to come from an Indo-European root, namely PIE *kat- ‘to fight’, *katu-, *kat-(e)-ro- ‘fight’ (IEW, 534). The root is found in several groups of Indo-European languages and is best represented in the Celtic languages: cf. Gaulish Caturix, Old Irish cath ‘1. fight; 2. band, crowd’. In other words, if Romanian cătană derives from this root, it has centum features, but one cannot say exactly whether it is a Thraco-Dacian word or a Celtic loanword, especially since the form is present only in Transylvania and Banat (as well as in Pannonia/Hungary), where Celtic influence was much stronger. It is well known that the Celtic tribes of the Boii and the Taurisci lived for a long time in south-eastern Pannonia until they were definitively vanquished by Burebista, the great king of Dacia, in the 1st century BC. If the word is of Celtic origin, this example cannot serve as proof for my demonstration, but it is still relevant as evidence of Celtic influence on Dacian prior to Roman times.

Finally, the Romanian noun carâmb is today considered by everyone to be of Thraco-Dacian origin. It derives from PIE *kolemo-s, *kolema ‘stalk, reed’ (IEW, 612). Among other things, it has been associated with Latin calamus, a cognate from the same Proto-Indo-European root (cf. DELR, 183). From the same root derives carabă ‘flute, the pipe of a bagpipe, tibia’, for which no other plausible etymology has been found.

Other Romanian words, such as a cădea ‘to fall’, corn ‘horn’, car ‘cart’ and a curge ‘to flow’, are in the same situation as the examples discussed above, but I will not go into detail, since these forms have correspondents in Latin.

As regards the evolution of the Proto-Indo-European voiced palatal velar *g‘, the situation is practically identical to that of its voiceless counterpart: it turned into a regular velar (g), which was preserved as such in Thraco-Dacian and in Romanian as well, as the examples below show.

The Romanian forms a grăi ‘to speak’ and grai ‘speech, dialect’, as well as gură ‘mouth’, hărmălaie ‘uproar, noise’ and gară ‘slander, noisy crowd’, all derive from PIE *gar- ‘to call, to scream’, with the nominal forms *garo, *garā, *garmo- ‘call, lamentation’ (IEW, 352), which have cognates in several groups of Indo-European languages; cf. Sanskrit gur ‘to call’, ud-gur ‘to raise the voice’, Ossetian zarun ‘to sing’, zar ‘song’, Greek γῆρυς ‘voice’, Latin garrio ‘to talk, to slander’, Old Irish gar ‘to call’, Welsh gair ‘word’ (cf. DELR, 404). Most Romanian linguists believe that the verb a grăi is a loanword from Serbo-Croatian grajati ‘to croak’, which is not only ridiculous but also an affront to the Romanian language and spirituality. The hypothesis was put forward by Miklosich and has been taken over by all linguists up to the present day.

The noun gură ‘mouth’, on the other hand, was given a Latin origin, namely Latin gula ‘throat’, a wrong etymology, since the two forms do not fit semantically. However, Latin has the verb garrio ‘to speak, to chatter’ and garrulus ‘talkative’, which are much more compatible with Romanian gură ‘mouth’ (cf. also Romanian guraliv ‘talkative’) and almost identical to Romanian a grăi. But these forms did not attract the attention of the authors of the etymological dictionaries of Romanian. The verb a grăi cannot derive from Latin garrio, but there is no doubt that they are cognates. According to Cihac and Cioranescu, the Romanian noun gară is a loanword from … Polish. It seems to be a direct descendant of PIE *garā, while hărmălaie is a descendant of *garmo-, with g > h. (This phenomenon is found in other Romanian words, such as horn ‘chimney’ < PIE *gᵂhorno-s ‘chimney’; cf. Latin fornus.) There are also cognates in Greek and other Indo-European languages (see DELR, 376, 404, 411).

Romanian gâscă ‘goose’ and gânsac ‘gander’ derive from PIE *ghans- ‘goose’ (IEW, 413). Cognates are found in most Indo-European languages (cf. DELR). The Romanian forms for ‘goose’ have been given various Slavic origins. It should be noted, however, that the Slavic forms have centum characteristics (cf. OCS gǫsu ‘goose’), as opposed to Lithuanian žąsis ‘id.’, which is a satem-type form. Therefore, it seems that the Slavic form is a loanword from Proto-Romanian (or Thraco-Dacian). The Romanian form for ‘goose’ originates from an earlier form gansă < *gansa.

The verb a zgâria ‘to scratch’ derives from PIE *gher- ‘to scratch’ (IEW, 441), with cognates only in Greek and Lithuanian. Lithuanian žeriu, žeri ‘to scratch’ has satem characteristics. I should mention that the Baltic languages are less satem than the Slavic languages, which in my opinion proves that the Baltic languages had older and more intense contacts with speakers of Thraco-Dacian than the Slavs did; the Slavs, in their turn, borrowed from this language, and those loanwords now bear a centum mark (see Argument, DELR).

The nouns gard ‘fence’, grădină (gardină) ‘garden’ and gardin ‘1. fence; 2. edge’, as well as grădiște ‘the site or remains of an ancient fortress or city’, derive from PIE *gher-, *gherdh- ‘fence’, with the nominal form *ghorto-s ‘fenced place’ (IEW, 442).

For Miklosich, Romanian gard ‘fence’ is a loanword from OCS gradŭ ‘city’, but recently Romanian linguists have accepted the idea that it is of Thraco-Dacian origin by comparison with Albanian gardh ‘fence’, simply because the Albanian form cannot phonologically be of Slavic origin. Here Albanian behaves like a true centum language. I have also shown on other occasions that OCS gradŭ is actually a loanword from Thraco-Dacian, precisely because this form has centum characteristics. As I have just mentioned, Romanian grădiște designates an old city, and there are many places in today’s Romania called Grădiște.

The Polish linguist Z. Golab shows that the Slavic and Baltic languages have parallel forms derived from the same Proto-Indo-European root, which denote similar notions but have satem features, cf. Lithuanian žardas ‘a wooden construction’, Latvian, Old Prussian sardis ‘horse pen’, OCS žŭrdŭ ‘hen coop’, Russian žerd ‘id.’, as compared with centum-type ones, cf. OCS gradŭ ‘city’, Russian gorod ‘id.’, Polish gród ‘id.’, OCS gorditi, graditi ‘to build’, etc. (see EDRL, Introduction, 28).



Selective Bibliography


Beekes, R. Etymological Dictionary of Greek (EDG), Brill, Leiden-Boston, 2010


Cihac, Al. Dictionnaire d’étymologie daco-romane (2 vols.), Frankfurt, 1870-1879


Cioranescu, A. Diccionario etimológico rumano, Madrid, 1958


de Vaan, M. Etymological Dictionary of Latin and the other Italic Languages, Brill, Leiden-Boston, 2008


Golab, Z. “Kentum” elements in Slavic, in Lingua Posnaniensis, 16, 1972, pp. 53-82


Liddell, H., Scott, R. Greek-English Lexicon, Clarendon Press, Oxford, 1996


Miklosich, F. Die slavischen Elemente im Rumunischen, in “Denkschriften”, XII, Akademie der Wissenschaften, Vienna, 1862


Pausanias, Description of Greece, Loeb Classical Library, 1934.


Vinereanu, M. Dicționarul Etimologic al Limbii Române, Alcor Edimpex, București, 2008


Walde, A., Pokorny, J. Indogermanisches etymologisches Wörterbuch (IEW), Bern-München, 1959