
Index for Spices in Thai and Lao



The Thai Script (akson thai [อักษรไทย]) is (almost) exclusively used to write the Thai Language (phasa thai [ภาษาไทย]), which itself is split into several differing dialects. The Thai language is part of the Kradai Language Family (also known as Kadai, Thai–Kadai or Daic) and is not related to the majority of South–East Asian languages. It is, though, closely related to Lao (phasa lao [ພາສາລາວ]), spoken in neighbouring Laos. These two languages are so similar that they form a dialect continuum and are fairly mutually intelligible. Also, the Lao Script (akson lao [ອັກສອນລາວ]) is so similar to Thai Script that I found it convenient to include both in this index.

Thai Script belongs to the Indic Script Family, although it shows heavy modifications from the Indian original to account for the many more vowels and also for the phonemic tones of Thai. Like all Indic Scripts, it consists of syllabic consonant letters that are combined with vowel signs; the vowel signs may appear left, right, above or below the consonant they are attached to. A rather unique feature of Thai script (sometimes also found in Lao) is that those diacritics going to the top of a consonant are arranged in two different vertical levels: The tone marks reside at a higher position than the vowel signs, even if none of the latter are present (not all fonts show this behaviour, although they should).

Introduction

            unvoiced     unvoiced   voiced       voiced
            inaspirate   aspirate   inaspirate   aspirate   nasal
velar       k            kʰ         g            gʰ         ṅ
palatal     c            cʰ         j            jʰ         ñ
retroflex   ṭ            ṭʰ         ḍ            ḍʰ         ṇ
dental      t            tʰ         d            dʰ         n
labial      p            pʰ         b            bʰ         m

            y   r   l   v
            ś   ṣ   s   h

Thai orthography is, like that of most South East Asian languages using Indic scripts, very complicated and full of irregularities. In principle, Thai Script is a member of the Indic Script Family; yet because of the largely different language structure, it became necessary to bend the Indic script principles to their extreme. That means that the writing system had to be massively reworked in order to fit Thai, abandoning most of the Indic core features. At the same time, for cultural reasons, the script still had to maintain some compatibility with Indic languages, chiefly Pali and Sanskrit, to ensure easy access to Indian Buddhist writings. Thus, the script has to serve two very different languages, and this means it is asking for trouble.

Indic languages have many consonants (shown in the table above for later reference), few vowels and no tones; Thai has few consonants, a large number of vowels (some of which are complex), and five phonemic tones. While Sanskrit allows word size to grow almost indefinitely, Thai’s core vocabulary is mostly monosyllabic (polysyllabic Thai words exist, but are typically Sanskrit loans); it is essential for the reader to parse the text syllable by syllable, so onset and coda consonants must be easily distinguishable. Many consonant phones are restricted to the syllable onset (or, put differently, in final position many phonemic differences become neutralized). Clearly, it is not easy to have a script catering to both sets of requirements; but Thai does so.

Consonants

The fundamentally Indian structure of the script is well visible among the consonants. Sanskrit has five series of obstruents (velar, palatal [usually analyzed as an affricate instead of a stop, but I’ll ignore that], retroflex, dental and labial), and all of them have survived in Thai Script. As Thai has no retroflexes, the pronunciations of the retroflexes and the dentals are merged, and the former appear only in Indic loanwords. Thai has hardly any voiced stops, and thus the two voiced series of Sanskrit code for voiceless aspirated sounds in Thai. Therefore, Thai has three voiceless aspirated series and one voiceless inaspirated series.

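Thai consonant letters. The eight columns of the table pair each Indian column with the Thai column derived from it:

  Col  Indian column                               Thai column
  1    voiceless inaspirate (k,c,ṭ,t,p)            Thai extension, voiced inaspirate
  2    voiceless inaspirate (k,c,ṭ,t,p)            Thai voiceless inaspirate
  3    voiceless aspirate (kʰ,cʰ,ṭʰ,tʰ,pʰ)         Thai voiceless aspirate
  4    voiceless aspirate (kʰ,cʰ,ṭʰ,tʰ,pʰ)         Thai extension, fricatives, some obsolete
  5    voiced inaspirate (g,j,ḍ,d,b)               Thai voiceless aspirate
  6    voiced inaspirate (g,j,ḍ,d,b)               Thai extension, fricatives, some obsolete
  7    voiced aspirate (gʰ,jʰ, etc)                Thai voiceless aspirate
  8    nasal                                       Thai nasal

Each entry gives the column, the letter, its Unicode name, the pronunciation (initial/final), the consonant class and, where given, the transliteration.

Indian velar, Thai velar
  2   ก   KO KAI            k/k     mid    k
  3   ข   KHO KHAI          kh/kh   high
  4   ฃ   KHO KHUAT         kh/kh   high
  5   ค   KHO KHWAI         kh/kh   low    k̄ʰ
  6   ฅ   KHO KHON          kh/kh   low    x̄ʰ
  7   ฆ   KHO RAKHANG       kh/kh   low    k̂ʰ
  8   ง   NGO NGU           ng/ng   low

Indian palatal, Thai palatal/dental
  2   จ   CHO CHAN          ch/t    mid    c
  3   ฉ   CHO CHING         ch/–    high
  5   ช   CHO CHANG         ch/t    low    c̄ʰ
  6   ซ   SO SO             s/t     low
  7   ฌ   CHO CHOE          ch/–    low    ĉʰ
  8   ญ   YO YING           y/n     low    ñ

Indian retroflex, Thai dental
  1   ฎ   DO CHADA          d/t     mid
  2   ฏ   TO PATAK          t/t     mid
  3   ฐ   THO THAN          th/t    high   ṭʰ
  5   ฑ   THO NANGMONTHO    th/t    low    ṭ̄ʰ
  7   ฒ   THO PHUTHAO       th/t    low    ṭ̂ʰ
  8   ณ   NO NEN            n/n     low

Indian dental, Thai dental
  1   ด   DO DEK            d/t     mid    d
  2   ต   TO TAO            t/t     mid    t
  3   ถ   THO THUNG         th/t    high
  5   ท   THO THAHAN        th/t    low    t̄ʰ
  7   ธ   THO THONG         th/t    low    t̂ʰ
  8   น   NO NU             n/n     low    n

Indian labial, Thai labial
  1   บ   BO BAIMAI         b/p     mid    b
  2   ป   PO PLA            p/p     mid    p
  3   ผ   PHO PHUNG         ph/–    high
  4   ฝ   FO FA             f/–     high   f
  5   พ   PHO PHAN          ph/p    low    p̄ʰ
  6   ฟ   FO FAN            f/p     low
  7   ภ   PHO SAMPHAO       ph/p    low    p̂ʰ
  8   ม   MO MA             m/m     low    m

Indian Sonorants and Spirants
      ย   YO YAK            y/y     low    y
      ร   RO RUA            r/n     low    r
      ล   LO LING           l/n     low    l
      ว   WO WAEN           w/w     low    w
      ศ   SO SALA           s/t     high   ś
      ษ   SO RUSI           s/t     high
      ส   SO SUA            s/t     high   s
      ห   HO HIP            h/–     high   h

Thai Extensions
      ฬ   LO CHULA          l/n     low
      อ   O ANG             ʿ/–     mid    ʿ
      ฮ   HO NOKHUK         h/–     low    ħ

Consonant classes: high (orange), mid (yellow), low (white).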

The three voiceless aspirate series are, however, not fully equivalent: The Thai system distinguishes three different consonant classes, which are indicated by colour in the above table. The low class (white) corresponds to voiced Sanskrit sounds, the high class (orange) contains those letters which in Sanskrit denote voiceless aspirates or fricatives/sibilants, and the mid class (yellow) is derived from voiceless inaspirates (the few true voiced Thai plosives also fall in this class, but they are historically glottalized and thus get classified as “unvoiced” as far as consonant classes go). These classes influence the tone associated with the following vowel, but have no meaning for the pronunciation of the consonant itself.

As a consequence, the few voiced plosives of Thai are not written with their Sanskrit equivalents, but rather with new signs derived from the voiceless inaspirates. There are some more derived letters (in satellites to columns two and three) which mostly mean fricatives (the velar ones are obsolete, and I am not sure about their sound value).

The Indic alphabet has an appendix of 8 approximants (or similar) and sibilants; these appear in exactly the same way in Thai, but the three sibilants have phonetically merged into one (as in many Indic languages), and they are pronounced as plosives in syllable-final position. The last three letters are non-Sanskrit additions. By far the most common of those is the glottal stop, which is used for two rather different purposes: It forms the syllable onset for those syllables starting with a vowel (these really begin with a glottal stop, as in many varieties of English), and inside the syllable it appears in various vowel sequences of varying length, where it usually contributes an O-like sound.

The table above summarises the consonant letter system in a way that stresses the underlying derivation from Sanskrit. Consonant classes are coded by colour, and each entry also gives the conventional Thai name in the spelling adopted by the Unicode Standard. Thai letters are named acrophonically, and thus their names have two parts: The first is phonetic (letter sound plus O), and the second is simply a selected word from the language featuring the consonant in question. The name Cho Chang thus must be understood as “the letter spoken cho which appears in the word chang (elephant)”, distinguishing it from the homophone Cho Choe, “that letter cho used to write choe (tree)” (there are two more cho-letters). Right next to the letter is the approximate pronunciation (initial/final), which is often used in non-scientific Romanization, and below this you will find the transliteration character used in this index.
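Since the letter names follow the Unicode Standard, their two-part acrophonic structure can be read straight from the character database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `split_letter_name` is hypothetical and chosen only for this illustration.

```python
import unicodedata

PREFIX = "THAI CHARACTER "

def split_letter_name(letter):
    """Split a Thai consonant's Unicode name into (phonetic part, example word)."""
    name = unicodedata.name(letter)            # e.g. 'THAI CHARACTER CHO CHANG'
    if not name.startswith(PREFIX):
        raise ValueError(f"not a Thai character: {letter!r}")
    sound, _, word = name[len(PREFIX):].partition(" ")
    return sound, word

# The two homophonous cho-letters discussed above (U+0E0A and U+0E0C)
for letter in ("\u0e0a", "\u0e0c"):
    print(letter, split_letter_name(letter))
```

Both letters share the phonetic part and differ only in the example word, which is exactly what makes the second part of the name necessary.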

Thai lacks an established transliteration scheme. The only standard in existence is ISO 11940, which suffers from multiple deficiencies and is hardly used at all; I will not even introduce its character set here. In all real-world examples, Thai is romanized in a more or less phonetic way which loosely resembles the pronunciation hints given in the table. Thus, both cho chan and cho chang are rendered CH in the onset (ignoring the difference in aspiration), and T in the coda. This aids pronunciation, but the native spelling cannot be reconstructed from such a romanization.

I therefore resort to a home-brew transliteration scheme which is based on the Indic model and therefore rather systematic. Points of articulation (velar, palatal etc.) are indicated as in the mainstream Indic transliteration, but the articulation modes get represented in a way that actually reflects Thai usage. All letters spoken as aspirated are written with a superscript H (ʰ), and the base letter is chosen with respect to the Thai sound value, e. g., K for all voiceless velar stops. Those letters deriving from Sanskrit voiced series are distinguished by a diacritic: The former voiced inaspirates get a macron (they are common), and the former voiced aspirates get a circumflex (they are very rare); thus Indic g equals Thai k̄ʰ, and Indic gʰ is rendered as k̂ʰ. A small inconsistency is the letter So So, which appears in a Thai aspirate series but is rendered as an S with macron according to the plain S pronunciation (actually, it is a modification of Cho Chang c̄ʰ which in turn corresponds to Sanskrit j).

Consonant classes are easy to rationalize as soon as the Indic roots of the writing systems are understood. The mid class (yellow) holds all sounds which in Sanskrit are voiceless inaspirate plosives (the new Thai glottal stop also follows this rule). The high class (orange) derives from voice­less aspirates and related sounds (the three sibilants and H). The rest, which is voiced in Sanskrit, makes up the low class (white); this means all Sanskrit voiced stops (those with a top diacritic in trans­literation) and various continuants (nasals, laterals) including the new Thai letters Lo Chula and Ho Nokhuk (I guess that the latter was voiced at some point in the past).
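The derivation of the classes can be written down as a small lookup. The sketch below is an illustration, not part of the index itself: the function name `consonant_class` is hypothetical, and the membership of the two sets is the conventional class assignment filled in from the rule just described (mid for the former voiceless inaspirates, the voiced Thai plosives and the glottal stop; high for the former voiceless aspirates, sibilants and H; low for everything else).

```python
# Code points from the Thai block; the comments give the Unicode names.
HIGH = {
    0x0E02, 0x0E03,          # KHO KHAI, KHO KHUAT
    0x0E09,                  # CHO CHING
    0x0E10,                  # THO THAN
    0x0E16,                  # THO THUNG
    0x0E1C, 0x0E1D,          # PHO PHUNG, FO FA
    0x0E28, 0x0E29, 0x0E2A,  # SO SALA, SO RUSI, SO SUA
    0x0E2B,                  # HO HIP
}
MID = {
    0x0E01,                  # KO KAI
    0x0E08,                  # CHO CHAN
    0x0E0E, 0x0E0F,          # DO CHADA, TO PATAK
    0x0E14, 0x0E15,          # DO DEK, TO TAO
    0x0E1A, 0x0E1B,          # BO BAIMAI, PO PLA
    0x0E2D,                  # O ANG (glottal stop)
}
VOCALIC = {0x0E24, 0x0E26}   # RU and LU sit in the same range but are not consonants

def consonant_class(letter):
    """Return 'high', 'mid' or 'low' for a Thai consonant letter."""
    cp = ord(letter)
    if not 0x0E01 <= cp <= 0x0E2E or cp in VOCALIC:
        raise ValueError(f"not a Thai consonant letter: {letter!r}")
    if cp in HIGH:
        return "high"
    if cp in MID:
        return "mid"
    return "low"   # everything that is voiced in Sanskrit, plus LO CHULA and HO NOKHUK
```

Only two sets need to be spelled out; the low class is simply the remainder, mirroring the way the paragraph above defines it.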

Sanskrit loanwords (or, in the case of Buddhist scriptures, true Sanskrit texts) are written etymologically, i. e., with the historically corresponding letter, not the closest phonological match. For example, the birthplace of the Buddha, Lumbinī, is naturally spelled Lump̄ʰinī in Thai, and is pronounced as such, and also with mid, high and mid tones for the three syllables, respectively (following rules that will be explained a few paragraphs later). This makes Sanskrit spoken by a Thai basically unintelligible to an Indian brahmin.

Vowels

The writing of Thai vowels is extremely involved, owing to the many phonemically different vowels in the language (and the lack of built-in vowel support in the Indic script core). As seen from the point of graphical representation, they fall into three classes: The implicit vowel need not be written because it is implied in each consonant letter. The simple vowels are written with diacritic vowel signs that get attached to the preceding consonant (the syllable onset consonants) according to the Indic model. The remaining ones are the complex vowels, typically diphthongs or triphthongs: They have no Indic counterpart and are written with sometimes lengthy sequences involving one or more vowel signs and/or one or more consonant letters. Three consonant letters can appear in vowel sequences: O Ang (also used for simple vowels in some specific cases), Wo Waen and Yo Yak.

Thai vowels and vowel sequences (vowel signs marked with an asterisk precede the consonant)

Simple Vowels
  a     open: implied or กะ (A); closed: กัก (Mai Han-Akat)
  ā     กา (AA)
  i     กิ (I)
  ī     กี (II)
  ü     กึ (UE)
  ǖ     open: กือ (UUE + O Ang); closed: กืก (UUE)
  u     กุ (U)
  ū     กู (UU)
  e     open: เกะ (E* + A); closed: เก็ก (E* + Mai Taikhu)
  ē     เก (E*)
  ä     open: แกะ (AE* + A); closed: แก็ก (AE* + Mai Taikhu)
  ǟ     แก (AE*)
  o     open: โกะ (O* + A); closed: กก (implied)
  ō     โก (O*)
  ɔ     open: เกาะ (E* + AA + A); closed: ก็อก (Mai Taikhu + O Ang)
  ɔ̄     กอ (O Ang)
  ö     open: เกอะ (E* + O Ang + A); does not occur in closed syllables
  ȫ     open: เกอ (E* + O Ang); closed: เกิก (E* + I)

True Diphthongs
  ia    เกียะ (E* + II + Yo Yak + A)
  īa    เกีย (E* + II + Yo Yak)
  üa    เกือะ (E* + UUE + O Ang + A)
  ǖa    เกือ (E* + UUE + O Ang)
  ua    กัวะ (Mai Han-Akat + Wo Waen + A)
  ūa    open: กัว (Mai Han-Akat + Wo Waen); closed: กวก (Wo Waen)

Additional Signs
  aᵐ    กำ (AM)
  ai    ใก (AI Mai Muan*)
  ái    ไก (AI Mai Malai*)
  au    เกา (E* + AA)
        กฤ (Ru)
  r̥̄     กฤๅ (Ru + Lakkhang Yao)
        กฦ (Lu)
  l̥̄     กฦๅ (Lu + Lakkhang Yao)

Improper Diphthongs with W (occurring only in open syllables)
        กิว (I + Wo Waen)
        เก็ว (E* + Mai Taikhu + Wo Waen)
  ēù    เกว (E* + Wo Waen)
  ǟù    แกว (AE* + Wo Waen)
        to be replaced by au (see above)
  āù    กาว (AA + Wo Waen)
  īaù   เกียว (E* + II + Yo Yak + Wo Waen)

Improper Diphthongs with Y (occurring only in open syllables)
        กัย (Mai Han-Akat + Yo Yak)
  āì    กาย (AA + Yo Yak)
  ōì    โกย (O* + Yo Yak)
  ɔì    ก็อย (Mai Taikhu + O Ang + Yo Yak)
  ɔ̄ì    กอย (O Ang + Yo Yak)
        กุย (U + Yo Yak)
  ȫì    เกย (E* + Yo Yak)
  ūaì   กวย (Wo Waen + Yo Yak)
  ǖaì   เกือย (E* + UUE + O Ang + Yo Yak)

An additional complication comes from the distinction between open and closed syllables. Thai has a rather simple syllable structure C(C)V(C), with only a few allowed onset clusters (phonetically, [kkʰ]+[rlw], [ppʰ]+[rl] and t+r). The syllable boundary is not indicated directly (there is no virama), yet to allow the reader to isolate the syllables easily, many vowels have different notation in open C(C)V and closed C(C)VC syllables. This method, though indirect, is amazingly effective: I do not speak a single word of Thai, yet following the rules I was able to identify the syllables in all the spice names shown here with only one or two ambiguous cases in the whole set.
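The short list of permitted onset clusters translates directly into a lookup table. This is a minimal sketch on the phonetic level only (it does not look at Thai letters), and the names `ALLOWED_CLUSTERS` and `is_valid_onset` are hypothetical.

```python
# Allowed two-consonant onsets: [k kʰ]+[r l w], [p pʰ]+[r l] and t+r
ALLOWED_CLUSTERS = {
    "k":  {"r", "l", "w"},
    "kʰ": {"r", "l", "w"},
    "p":  {"r", "l"},
    "pʰ": {"r", "l"},
    "t":  {"r"},
}

def is_valid_onset(first, second=None):
    """Check a C(C) onset; a single consonant is always accepted by this sketch."""
    if second is None:
        return True
    return second in ALLOWED_CLUSTERS.get(first, set())

def cluster_count():
    """Number of distinct onset clusters licensed by the table."""
    return sum(len(seconds) for seconds in ALLOWED_CLUSTERS.values())
```

With so few clusters available, a consonant followed by anything other than one of these second members must belong to the coda of one syllable and the onset of the next, which is part of what makes syllable parsing tractable.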

The table above summarizes all Thai vowel sequences; wherever necessary, one cell has two entries for open and closed syllables, respectively. Most vowels come in short/long pairs, but their spellings are not necessarily similar.

Even the implicit vowel is complicated. In open syllables, it sounds a and in closed syllables o. In some cases, e. g. whenever the next syllable starts with a cluster, it may become necessary to write the implicit vowel explicitly; otherwise, a word like kapla would be ambiguous (ka-pla or kap-la). The sign Sara A is used for the implicit vowel in such cases, and it has further use in several vowel sequences where it denotes shortness (replaced by Mai Taikhu in closed syllables).

The following signs are used for simple vowels: Sara AA (long A, ā), Sara I (short I, i), Sara II (long I, ī), Sara UE (short Ü, ü), Sara UUE (long Ü, ǖ), Sara U (short U, u), Sara UU (long U, ū), Sara E (long E, ē), Sara AE (long Ä, ǟ) and Sara O (closed long O, ō). The short variants of E, AE and O are arrived at with the shortening marks mentioned above, and two more vowels (open O, which I trans­literate as ɔ, and Ö) require short sequences some of which involve the letter O Ang.

Diphthongs ending in U involve sequences with a final Wo Waen, and those ending in I have sequences that end in Yo Yak. Yet, AI and AU have special representations which clearly trace back to the original Sanskrit diphthong signs which have been inherited by Thai script (Sanskrit has diphthongs E,AI,O,AU, where the classification of E and O as diphthongs is just a peculiarity of Sanskrit grammar). AU is basically written by simultaneously applying Sara E and Sara AA (mirroring the construction of the O and AU signs from E and AA in most Indic scripts), and for AI, there are two typographically slightly different versions of the South Indian AI vowel sign. AI can also be written by a sequence with Yo Yak; thus there are three possible representations for that sound, normalized by orthographic rules.

The vowel signs Sara E, Sara AE, Sara O and the two AI-signs graphically appear at the beginning of the syllable, left of the onset consonant (in case of an initial cluster, left of the entire consonant group); in the table, they are marked with an asterisk for clarity. Sara I, Sara II, Sara UE, Sara UUE and the vowel shorteners (Mai Han-Akat, Mai Taikhu) appear on top of the consonant (in case of a cluster, the second consonant), and Sara U and Sara UU appear below the consonant. The remaining vowel signs (Sara A, Sara AA and the special case Sara AM) follow the consonant.

The notorious Sanskrit letters for vocalized liquids (RU, its counterpart LU and the corresponding long forms) also make an appearance. They are not fully obsolete even when writing Thai, for they appear in some Sanskrit loanwords and, rather amazingly, also in some neologisms derived from English.

The nasal mark Nikhahit (Thai incarnation of the Indian Anusvara) is not exactly a vowel, but behaves typographically like the vowel signs. It is not used in true Thai words, except in the very frequent combination with Sara AA (open syllables only). The ligature of those two signs is so common that it is usually considered a vowel sign in its own right, Sara AM. Although derived from the long form AA, it is realized with a short a sound. There is phonetic contrast between a syllable ending in AM and one ending in A plus consonant M.

The Unicode Standard certainly did not cover itself in glory when encoding the Thai Script. Coding Thai texts follows a visual model, meaning that the signs are written and stored in typographical order, as opposed to the logical order used for nearly all other Brahmi-derived scripts. This means that in the encoded text, the left-attaching vowel signs (E etc.) appear before the consonant they follow in speech. As a consequence, there is no joining behaviour defined for these vowel signs; typographically, they are just letters (the Standard tries to push this to the extreme by also defining no joining behaviour for A and AA, where it could have done easily, but the otherwise very similar AM indeed is a spacing accent). Electronic processing of Thai texts becomes a dire nightmare dwarfing that of Elm Street, because everything is different from every other language and must be done differently. A virama model similar to that used for Khmer was considered, but had to be discarded for compatibility with a misbegotten existing Thai standard.
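The effect of the visual model is easy to observe in any encoded Thai string. The sketch below shows a deliberately simplified reordering that moves a left-attaching vowel sign behind the single consonant that follows it; the function name `to_spoken_order` is hypothetical, and onset clusters (where the vowel would have to move behind the whole group) are not handled.

```python
import re

PREPOSED = "\u0e40-\u0e44"     # SARA E, SARA AE, SARA O, SARA AI MAIMUAN, SARA AI MAIMALAI
CONSONANTS = "\u0e01-\u0e2e"   # KO KAI .. HO NOKHUK (the range also contains RU and LU)
_SWAP = re.compile(f"([{PREPOSED}])([{CONSONANTS}])")

def to_spoken_order(text):
    """Swap each preposed vowel sign with the one consonant that follows it."""
    return _SWAP.sub(r"\2\1", text)

word = "\u0e44\u0e17\u0e22"    # the word 'thai' as it is stored
print([f"U+{ord(c):04X}" for c in word])                   # the vowel sign comes first
print([f"U+{ord(c):04X}" for c in to_spoken_order(word)])  # consonant first
```

Sorting, searching and syllable analysis all need some reordering of this kind before they can treat Thai like other Indic scripts, which is one reason why the visual model is costly.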

In transliterating the vowels, I do the obvious and go the phonetic way. Everything long gets a macron somewhere, and this poses a problem with the umlauted vowels ÄÖÜ: Their long counterparts need to carry both a diaeresis and a macron ǞȪǕ, which isn’t really reader-friendly (thankfully, Unicode offers precomposed letters for all of them, which improves the rendering in real-world engines). The improper diphthongs are marked with a grave accent on their last part (representing the semi-vocalic element). I try to follow Indic conventions wherever possible, and this means that the anusvara would be transliterated as ṃ; however, Sara AM should be different, and so I chose a superscript m (ᵐ). The latter character is well known for not being supported by Windows XP, but frankly, that transliteration is so fiendishly overdecorated, often having more diacritics than base letters in a word, that XP would perform miserably even if it did not fail on ᵐ.
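That the doubly decorated long vowels exist as precomposed letters can be checked with Unicode normalization. A minimal sketch with Python's standard `unicodedata` module; the helper name `long_umlaut` is hypothetical.

```python
import unicodedata

DIAERESIS = "\u0308"   # COMBINING DIAERESIS
MACRON = "\u0304"      # COMBINING MACRON

def long_umlaut(base):
    """Compose base letter + diaeresis + macron into one precomposed letter (NFC)."""
    return unicodedata.normalize("NFC", base + DIAERESIS + MACRON)

for base in "aou":
    composed = long_umlaut(base)
    print(base, composed, len(composed), unicodedata.name(composed))
```

Each result is a single code point, so a renderer does not have to stack two combining marks itself.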

Tone marks etc.

Thai has five different tones: Mid 33, low 21, falling 41, high 34 and rising 25. Each syllable can be pronounced with two to five different tones, depending on the consonant/vowel distribution. Syllables differing only in tone may exhibit completely unrelated meanings, and therefore it is vital for the script to code all tones unambiguously. In order to achieve that goal, four different tone marks are used (Mai Ek, Mai Tho, Mai Tri and Mai Chattawa).

                                                     Class of initial consonant
                                                     high       mid        low
short vowel, open syllable or ending in plosive      low        low        high
short vowel, syllable ending in sonorant             rising     mid        mid
long vowel, open syllable or ending in sonorant      rising     mid        mid
long vowel, syllable ending in plosive               low        low        falling
any syllable with tone mark Mai Ek (1) ก่            low        low        falling
any syllable with tone mark Mai Tho (2) ก้           falling    falling    high
any syllable with tone mark Mai Tri (3) ก๊                      high
any syllable with tone mark Mai Chattawa (4) ก๋                 rising

Yet, it would not be Thai if it were easy to determine the tone for a given written syllable. Rather, the tone is a function of consonant class, vowel length and syllable coda, with optional overriding of the last two by a tone mark (in fact, less than 50% of all written Thai syllables need a tone mark). The table at the right side summarizes the rules.

There is an important additional rule: A syllable beginning with a nasal, approximant or lateral (all of which are voiced, thus belonging to the low class) can be preceded by a Ho Hip character which, although mute, lends its high class to the entire syllable. Consequently, many syllables that would be considered low (only two or three different tones possible) can gain access to more possible tones (four or five).
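The tone rules, including the Ho Hip rule, can be condensed into one small function. This is a sketch under stated assumptions rather than a complete implementation: the name `thai_tone` and its parameters are hypothetical, the caller must already know the consonant class, vowel length and coda type, and a mid-class initial with a short vowel in an open or plosive-final syllable is taken to give the low tone, as in the usual textbook statement of the rules.

```python
CLASSES = ("high", "mid", "low")

def thai_tone(initial_class, long_vowel, coda=None, tone_mark=None,
              leading_ho_hip=False):
    """Return the tone of a written Thai syllable.

    initial_class  -- 'high', 'mid' or 'low'
    long_vowel     -- True for a long vowel, False for a short one
    coda           -- None (open syllable), 'sonorant' or 'plosive'
    tone_mark      -- None, or 1..4 for Mai Ek, Mai Tho, Mai Tri, Mai Chattawa
    leading_ho_hip -- True if a mute Ho Hip precedes a low-class initial
    """
    if initial_class not in CLASSES:
        raise ValueError("initial_class must be 'high', 'mid' or 'low'")
    if coda not in (None, "sonorant", "plosive"):
        raise ValueError("coda must be None, 'sonorant' or 'plosive'")
    if tone_mark not in (None, 1, 2, 3, 4):
        raise ValueError("tone_mark must be None or 1..4")

    # A mute Ho Hip lends its high class to the whole syllable.
    cls = "high" if leading_ho_hip and initial_class == "low" else initial_class

    # Tone marks override vowel length and coda.
    if tone_mark == 1:
        return "falling" if cls == "low" else "low"
    if tone_mark == 2:
        return "high" if cls == "low" else "falling"
    if tone_mark == 3:
        return "high"
    if tone_mark == 4:
        return "rising"

    # Unmarked syllables ending in a sonorant, or open with a long vowel
    if coda == "sonorant" or (coda is None and long_vowel):
        return "rising" if cls == "high" else "mid"

    # Remaining cases: plosive coda, or open syllable with a short vowel
    if cls == "low":
        return "falling" if long_vowel else "high"
    return "low"

# Lump̄ʰinī: three low-class syllables, lum / p̄ʰi / nī
print([thai_tone("low", False, "sonorant"),
       thai_tone("low", False),
       thai_tone("low", True)])
```

The three calls at the end reproduce the mid, high and mid tones quoted earlier for the Thai pronunciation of Lumbinī.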

In transcribing the tones, I follow the Thai Script in just rendering the tone marks, which makes the lookup of the correct tone as complicated as in the native writing. Since the names of the tone marks derive from the Sanskrit numerals One to Four (think of, for example, eins, two, treis and quatuor), I just use superscript numbers. In Thai script, all tone marks are nonspacing diacritics attached to the consonants and floating higher than vowel signs (in Unicode, they follow the con­sonant and, if present, dia­critic vowel signs, but they precede any spacing vowel signs). To improve the read­ability of the trans­literation, I have decided to show the superscript numbers at the end of the syllable.

Another sign hovering as high as the tone marks is the cancellation mark Thanthakat (shown here with the letter K: ก์). It marks consonants or syllables that are no longer spoken but have been orthographically fossilized. It never appears in true Thai words, but appears in quite a few Sanskrit loanwords and even in more recent English loans (e. g. marking the R in pepper). This sign can be applied to a single consonant or an entire syllable. In transliteration, I represent it by a superscript zero immediately after the consonant (k⁰).

The Lao Script

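Lao consonant letters. The eight columns of the table pair each Indian column with the Lao column derived from it:

  Col  Indian column                               Lao column
  1    voiceless inaspirate (k,c,ṭ,t,p)            Lao extension, voiced inaspirate
  2    voiceless inaspirate (k,c,ṭ,t,p)            Lao voiceless inaspirate
  3    voiceless aspirate (kʰ,cʰ,ṭʰ,tʰ,pʰ)         Lao voiceless aspirate
  4    voiceless aspirate (kʰ,cʰ,ṭʰ,tʰ,pʰ)         Lao extension, fricative
  5    voiced inaspirate (g,j,ḍ,d,b)               Lao voiceless aspirate
  6    voiced inaspirate (g,j,ḍ,d,b)               Lao extension, fricatives
  7    voiced aspirate (gʰ,jʰ, etc)                Lao unused
  8    nasal                                       Lao nasal

Each entry gives the column, the letter, its Unicode name, the pronunciation (initial/final), the consonant class and, where given, the transliteration.

Indian velar, Lao velar
  2   ກ   KO          k/k     mid    k
  3   ຂ   KHO SUNG    kh/kh   high
  5   ຄ   KHO TAM     kh/kh   low    k̄ʰ
  8   ງ   NGO         ng/ng   low

Indian palatal, Lao palatal/dental
  2   ຈ   CO          ch/t    mid    c
  6   ຊ   SO TAM      s/t     low
  8   ຍ   NYO         ny/–    low    ñ

Indian retroflex, Lao unused

Indian dental, Lao dental
  1   ດ   DO          d/t     mid    d
  2   ຕ   TO          t/t     mid    t
  3   ຖ   THO SUNG    th/t    high
  5   ທ   THO TAM     th/t    low    t̄ʰ
  8   ນ   NO          n/n     low    n

Indian labial, Lao labial
  1   ບ   BO          b/p     mid    b
  2   ປ   PO PA       p/p     mid    p
  3   ຜ   PHO SUNG    ph/–    high
  4   ຝ   FO TAM      f/p     high   f
  5   ພ   PHO TAM     ph/p    low    p̄ʰ
  6   ຟ   FO SUNG     f/p     low
  8   ມ   MO          m/m     low    m

Sonorants and Spirants
      ຢ   YO          y/y     mid    y
      ຣ   LO LING     r/n     low    r
      ລ   LO LOOT     l/n     low    l
      ວ   WO          w/w     low    w
      ສ   SO SUNG     s/t     high   s
      ຫ   HO SUNG     h/–     high   h

Lao Extensions
      ອ   O           ʿ/–     mid    ʿ
      ຮ   HO TAM      h/–     low    ħ

Consonant classes: high, mid, low.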

The script for writing the Lao language is very similar to the Thai script: The consonants used for Lao are just a subset of those used for Thai, and even the letter shapes are closely related. There is more difference with respect to vowels, but even there the subset statement holds true to a first approximation. The Unicode positions of Lao are shifted by 0x80 with respect to their Thai counterparts.
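The constant offset between the two Unicode blocks means that a Thai character can be mapped to the corresponding Lao position by plain arithmetic. A minimal sketch; the function name `thai_to_lao` is hypothetical, and positions that are unassigned in the Lao block (Thai letters without a Lao counterpart) are reported as `None`.

```python
import unicodedata

OFFSET = 0x80   # Lao block U+0E80.. sits exactly 0x80 above Thai U+0E00..

def thai_to_lao(ch):
    """Return the Lao character at the shifted code position, or None if unassigned."""
    cp = ord(ch)
    if not 0x0E00 <= cp <= 0x0E7F:
        raise ValueError(f"not in the Thai block: {ch!r}")
    candidate = chr(cp + OFFSET)
    # unicodedata.name() returns the given default for unassigned code points
    return candidate if unicodedata.name(candidate, None) else None

for thai in ("\u0e01", "\u0e21", "\u0e32"):
    lao = thai_to_lao(thai)
    print(unicodedata.name(thai), "->", unicodedata.name(lao))
```

The mapping only pairs code positions; as the naming table further down shows, the names at paired positions do not always agree.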

The traditional letter naming in Lao is acrophonic, as in Thai; in about half of the cases, the same word is taken for the letter name as in Thai. The Unicode Standard, however, has mostly adopted a different naming convention that uses just the phonetic part (Ko, Tho, Fo etc.) which, if necessary, gets augmented by the adjectives Sung (high class) or Tam (low class). Therefore, the Thai and Lao names of corresponding characters are not related.

The con­sonant letter system of Lao is the same as for Thai, with the fol­lowing dif­feren­ces:

  • Lao has no retroflex series (ḍ,ṭ,ṭʰ,ṭ̄ʰ,ṭ̂ʰ,ṇ); this extends to the retroflex lateral. Also, the two low class aspirated series have merged (thus, k̂ʰ,ĉʰ,ṭ̂ʰ,t̂ʰ,p̂ʰ are absent). Of the three Indic sibilants, ś and ṣ no longer exist. Lastly, the letter and the two obsolete letters and x̄ʰ have vanished. All together, Lao has 17 letters fewer than Thai.
  • The two Thai letters r and l each have a counterpart in Lao, but their pronunciation is the same l/n (initial/final). Also, ñ has pronunciation ny/– (Thai: y/n).
  • There is also a merger between c̄ʰ and s̄. The new letter is pronounced s/s and accordingly called So Tam, but by glyph shape seems to continue Thai c̄ʰ; the Unicode Standard decided to allocate the code position of c̄ʰ to it (which I consider a folly, and thus do not repeat in the table).
  • Speaking of folly and Unicode: The standard managed to confuse two independent pairs of letters: The names of r (Ro Root) and l (Lo Ling) are reversed in the Standard, and the same holds for the two f letters (Fo Tam and Fo Sung). Both cases show a significant degree of carelessness: After all, there is a Thai character named Lo Ling, and it should have attracted notice that the Thai and Lao code positions don’t match. Also, the descriptors tam and sung in Lao simply refer to the consonant classes low and high, respectively, and the inconsistency really meets the eye.

    As the Unicode standard is guaranteed never to change the name of a letter in a subsequent version, the misnamings will persist till the world is changed. As a mild correction, Unicode provides official alias names for the misnamed letters; the stability policy defined for letter names does not extend to those, meaning they might be dropped, or some unrelated codepoint may take over the name in the future. Moreover, in the case of the f disaster the names cannot simply be swapped in the alias, and thus the standard resorts to acrophonic Lao names which are used nowhere else except in the broken r/l letters. The following table explains the mess:

    Transcr.  Unicode Thai (fine)  Unicode Lao        What it should have been  Official alias for Lao
    f         FO FA    U+0E1D      FO TAM   U+0E9D    FO SUNG                   FO FON
              FO FAN   U+0E1F      FO SUNG  U+0E9F    FO TAM                    FO FAY
    r         RO RUA   U+0E23      LO LING  U+0EA3    RO ROT                    RO
    l         LO LING  U+0E25      LO LOOT  U+0EA5    LO LING                   LO
  • The four letters ś,ṣ,s,cʰ have merged into s, and the resulting letter is shifted into the position of (Unicode does not follow here, and leaves the letter in the s position). This results in a slight difference in the collation sequences of the two languages. The Lao names in this index are sorted in Thai style.
  • The consonant class of y is mid (in Thai, it is low).

Lao vowels and vowel sequences (vowel signs marked with an asterisk precede the consonant)

Simple Vowels
  a     open: ກະ (A); closed: ກັກ (Mai Kan)
  ā     ກາ (AA)
  i     ກິ (I)
  ī     ກີ (II)
  ü     ກຶ (Y)
  ǖ     ກື (YY)
  u     ກຸ (U)
  ū     ກູ (UU)
  e     open: ເກະ (E* + A); closed: ເກັກ (E* + Mai Kan)
  ē     ເກ (E*)
  ä     open: ແກະ (EI* + A); closed: ແກັກ (EI* + Mai Kan)
  ǟ     ແກ (EI*)
  o     open: ໂກະ (O* + A); closed: ກົກ (Mai Kon)
  ō     ໂກ (O*)
  ɔ     open: ເກາະ (E* + AA + A); closed: ກັອກ (Mai Kan + O)
  ɔ̄     open: ກໍ (Niggahita); closed: ກອກ (O)
  ö     ເກິ (E* + I)
  ȫ     ເກີ (E* + II)

True Diphthongs
  ia    open: ເກັຽະ (E* + Mai Kan + NYO + A); closed: ກັຽກ (Mai Kan + NYO)
  īa    open: ເກັຽ (E* + Mai Kan + NYO); closed: ກຽກ (NYO)
  üa    ເກຶອ (E* + Y + O)
  ǖa    ເກືອ (E* + YY + O)
  ua    ກົວະ (Mai Kon + Wo + A)
  ūa    open: ກົວ (Mai Kon + Wo); closed: ກວກ (Wo)

Additional Signs
  aᵐ    ກຳ (AM)
  ai    ໃກ (AY*)
  ái    ໄກ (AI*)
  au    ເກົາ (E* + Mai Kon + AA)

Improper Diphthongs with W
  ǟù    ແກວ (EI* + Wo)
        to be replaced by au (see above)
  āù    ກາວ (AA + Wo)
  īaù   ກຽວ (NYO + Wo)

Improper Diphthongs with Y
        ກັຍ (Mai Kan + Nyo)
  āì    ກາຍ (AA + Nyo)
  ɔì    ກັອຍ (Mai Kan + O + Nyo) ??
  ɔ̄ì    ກອຍ (O + Nyo)
  ȫì    ເກີຍ (E* + II + Nyo)
  ūaì   ກວຍ (Wo + Nyo)
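The formal names and the corrective aliases of the four misnamed Lao letters can be inspected with Python's standard `unicodedata` module, whose `lookup()` accepts name aliases since Python 3.3. The sketch below takes the alias strings from the naming table in the list above and assumes they appear in the character database with the prefix `LAO LETTER`; the names `CORRECTIONS` and `alias_resolves` are hypothetical.

```python
import unicodedata

# Code point -> corrective alias (alias words as listed in the naming table)
CORRECTIONS = {
    "\u0e9d": "LAO LETTER FO FON",
    "\u0e9f": "LAO LETTER FO FAY",
    "\u0ea3": "LAO LETTER RO",
    "\u0ea5": "LAO LETTER LO",
}

def alias_resolves(ch):
    """True if looking up the alias leads back to the same character."""
    try:
        return unicodedata.lookup(CORRECTIONS[ch]) == ch
    except KeyError:       # alias unknown to this Python's Unicode database
        return False

for ch, alias in CORRECTIONS.items():
    # name() always reports the immutable formal name, never the alias
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "| alias:", alias,
          "| resolves:", alias_resolves(ch))
```

The formal name stays what it always was; the alias is only an additional key under which the same code point can be found.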

Lao has consistently abandoned the concept of an inherent vowel; rather, all letters are true consonants, and every syllable needs at least one vowel sign. Similar to Thai, vowel signs may differ for open and closed syllables. The simple vowels are the same in both languages (although pronunciation may vary), but Lao has fewer diphthongs. Unicode sometimes has different names for the vowel signs, e. g. Thai AE equals Lao EI, and Thai UE is called Y in Lao.

The functionalities of the two Thai signs Mai Han-Akat and Mai Taikhu have been taken over by two new Lao signs: Mai Kan and Mai Kon. Mai Kon (which should actually have been named Mai Kong) is used to write the vowel O in closed syllables (where it is implicit in Thai) and is also needed for the construction of the diphthongs ua, ūa and au (replacing Thai Mai Han-Akat, though this is not 100% analogous). Other vowel sequences are written with Mai Kan. Some of them are analogous to those in Thai using Mai Han-Akat or Mai Taikhu (e,ä,ɔ,aì), but others (ia,īa) are not.

In Lao, ö is written as E+I (and E+II for the long version). Also, the sequences for üa and ǖa are basically the same and differ only in the use of Y and YY, respectively. In both these examples, the Lao spelling is far more intuitive than the Thai spelling.

On the other side, the spelling of long ɔ̄ in open syllables is strange: Lao here employs the historic nasalization sign Niggahita. Nasalization is not a prominent feature in South East Asian languages, but the Niggahita still has its original semantics as part of the AM vowel sign (which is composed of AA and Niggahita, as its Thai counterpart, and there is a compatibility equivalence in Unicode).
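The compatibility equivalence of the AM signs can be made visible with compatibility normalization (NFKD), which splits each of them into the nasal mark followed by AA. A minimal sketch using the standard `unicodedata` module; the helper name `split_am` is hypothetical.

```python
import unicodedata

THAI_SARA_AM = "\u0e33"
LAO_VOWEL_SIGN_AM = "\u0eb3"

def split_am(sign):
    """Return the Unicode names of the parts an AM sign decomposes into under NFKD."""
    parts = unicodedata.normalize("NFKD", sign)
    return [unicodedata.name(c) for c in parts]

print(split_am(THAI_SARA_AM))
print(split_am(LAO_VOWEL_SIGN_AM))
```

Canonical normalization (NFC or NFD) leaves both signs untouched; only the compatibility forms take them apart.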

The spacing properties of Lao vowel signs are identical to those of their Thai counterparts: E, EI, O, AY and AI precede the consonant, just as their Thai equivalents Sara E, Sara AE, Sara O, Sara AI Maimuan and Sara AI Maimalai. I, II, Y and YY sit on top of the consonant (as do Mai Kan and Mai Kon), and U and UU go below the baseline. Lao has no RU and LU vowels.

While Thai employs the letter Yo Yak in many diphthong sequences, in Lao, the counterpart letter Yo is never used for diphthongs. Instead, the letter Nyo ñ (a homophone to Yo Yak in Thai) forms part of such sequences, but mostly in a typographically changed shape that has a separate codepoint in Unicode (“Vowel sign NYO”). The unaltered shape Nyo is used for the improper diphthongs ending in I, although I have seen NYO in this position occasionally.

The preceding table gives a list of vowels and vowel sequences in Lao. Differences to Thai (aside from the Unicode names) are highlighted, and detailed by extra mouseover information. Vowel signs that precede their consonant are marked with an asterisk. Because of the unfortunate Unicode naming convention, care must be taken not to confuse the vowel sign O (written “O*” here) with the consonant letter O (written “O” here). Also, in this table “Nyo” refers to the consonant letter and “NYO” to the vowel sign.

In transliteration, it is most straightforward to use exactly the same symbols for Lao as for Thai. This seems to pose no problem at all, but is slightly unintuitive for pronunciation in a few cases. Most glaringly, the vowel which I render as ü has a pronunciation more akin to Turkish dotless ı.

The close similarity of Thai and Lao script enabled me to do an algorithmic transliteration Lao→Thai, taking care of the different vowel orthographies (this uses a sequence of carefully crafted regular expression substitutions that first identify syllable boundaries and then use pattern matching to rewrite the vowel sequences). This is an experimental feature that should not be trusted too much. The reverse transliteration, while perhaps not impossible, would be more difficult because Thai allows more initial consonant clusters and more complex diphthong sequences; moreover, it would have to be lossy because of the many Thai consonants and vowel sequences that have no counterpart in Lao.
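To give an impression of what such substitutions look like, here is a heavily reduced sketch. It is not the procedure used for this index: it handles only the sign Mai Kon, skips the syllable-boundary step entirely, and then shifts every remaining Lao code point down by 0x80. The function name `lao_to_thai_sketch` and the rule set are assumptions made for this illustration.

```python
import re

MAI_KON = "\u0ebb"   # LAO VOWEL SIGN MAI KON (no Thai sign at the shifted position)
MAI_KAN = "\u0eb1"   # LAO VOWEL SIGN MAI KAN (shifts to Thai MAI HAN-AKAT)
WO = "\u0ea7"        # LAO LETTER WO

def lao_to_thai_sketch(text):
    """Rewrite Mai Kon sequences the Thai way, then shift Lao code points by 0x80."""
    # ua / ūa: Lao Mai Kon + Wo corresponds to Thai Mai Han-Akat + Wo Waen
    text = re.sub(MAI_KON + "(?=" + WO + ")", MAI_KAN, text)
    # Closed-syllable o (implicit in Thai) and au: Thai writes no sign here
    text = text.replace(MAI_KON, "")
    return "".join(
        chr(ord(c) - 0x80) if 0x0E81 <= ord(c) <= 0x0EDF else c
        for c in text
    )

samples = {
    "closed o": "\u0e81\u0ebb\u0e81",        # K + Mai Kon + K
    "ua":       "\u0e81\u0ebb\u0ea7",        # K + Mai Kon + Wo
    "au":       "\u0ec0\u0e81\u0ebb\u0eb2",  # E + K + Mai Kon + AA
}
for label, lao in samples.items():
    print(label, lao, "->", lao_to_thai_sketch(lao))
```

A usable converter needs many more rules (for Niggahita, the NYO vowel sign, the different spellings of ö, and letters without a Thai counterpart at the shifted position), and it needs the syllable analysis to decide which rule applies where.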

                                                     Class of initial consonant
                                                     high        mid          low
short vowel, open syllable or ending in plosive      high        high         mid
short vowel, syllable ending in sonorant             rising      rising ??    high
long vowel, open syllable or ending in sonorant      rising      rising ??    high
long vowel, syllable ending in plosive               low fall    low fall     high fall
any syllable with tone mark Mai Ek (1) ກ່            mid         mid          mid
any syllable with tone mark Mai Tho (2) ກ້           low fall    high fall    high fall
any syllable with tone mark Mai Ti (3) ກ໊                        rise

Lao is generally con­sidered to have 6 tones, although the exact count and cha­racteri­zation depend on the dialect and the re­searcher in­volved. In their canoni­cal order, they are rising (24), high (44), high falling (52), mid (33), low (11) and low falling (31).

Tones depend, as in Thai, on the syl­lable struc­ture (class of initial con­sonant, vowel length and syl­lable coda), and there are three tone marks available to over­ride the default (Mai Ti is rare, though). The conditions involved are exactly the same as in Thai, but the tones are not — rather, they are very dissimilar to Thai tones en­countered in syllables of the same structure.

As in Thai, the letter h (Ho Sung) ຫ may be used to shift the consonant class of a sonorant from low to high. Unlike Thai, the script provides special ligatures for some of the combinations h+[ṅñnmrlw]. Unicode has dedicated codepoints for the ligatures h+m and h+n, which are compatibility decomposable into the simple consonants. Moreover, the letters l and r can appear as a subscript when combined with h; in that case, the distinction between the two characters (which are near-homophones anyway) is lost. Irrespective of glyph type, the combinations of h+[ṅñnmrlw] are sometimes considered separate letters: h+ṅ: ຫ + ງ = ຫງ, h+ñ: ຫ + ຍ = ຫຍ, h+n: ຫ + ນ = ຫນ (ໜ), h+m: ຫ + ມ = ຫມ (ໝ), h+r: ຫ + ຣ = ຫຣ (ຫຼ), h+l: ຫ + ລ = ຫລ (ຫຼ), h+w: ຫ + ວ = ຫວ.
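For searching or comparing Lao text it can help to bring the two precomposed ligatures back to plain letter sequences. The sketch below does so with an explicit table and cross-checks that table against Unicode's own compatibility decompositions; the names `SONORANTS`, `high_class_digraph` and `expand_ligatures` are hypothetical.

```python
import unicodedata

HO_SUNG = "\u0eab"
SONORANTS = {                      # transliteration -> Lao letter
    "ṅ": "\u0e87", "ñ": "\u0e8d", "n": "\u0e99", "m": "\u0ea1",
    "r": "\u0ea3", "l": "\u0ea5", "w": "\u0ea7",
}
LIGATURES = {
    "\u0edc": HO_SUNG + "\u0e99",  # LAO HO NO
    "\u0edd": HO_SUNG + "\u0ea1",  # LAO HO MO
}

def high_class_digraph(translit):
    """Spell h + sonorant with two plain letters."""
    return HO_SUNG + SONORANTS[translit]

def expand_ligatures(text):
    """Replace the two precomposed ligatures by their letter sequences."""
    for ligature, sequence in LIGATURES.items():
        text = text.replace(ligature, sequence)
    return text

def table_matches_unicode():
    """Check the table above against the NFKD decompositions."""
    return all(unicodedata.normalize("NFKD", lig) == seq
               for lig, seq in LIGATURES.items())
```

Using an explicit table instead of normalizing the whole text with NFKD leaves other compatibility characters, such as the AM sign, untouched.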

Browser support tends to be a lot worse for Lao than for Thai. On the one hand, this is surprising, as the scripts are so similar that one can hardly imagine a programmer who solves one and leaves the other one open; on the other hand, perhaps not all programmers have even heard of Lao, and in any case there are also economic arguments. By the way, as of 2011, Google does not offer a way to search Lao documents, because the indexer would have to split the web site text into syllables (as it does for Thai); currently, Google considers all uninterrupted strings of Lao characters as a single unit, and is unable to search for parts thereof. This is basically equivalent to finding the word disc in the text thisistheendofmydiscussionofthethaiandlaoalphabets, which Google clearly cannot do without a lot of additional analysis.
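The kind of additional analysis meant here can be illustrated with a toy segmenter that cuts an unspaced string into dictionary words by greedy longest match. This is only a sketch of the general idea, not a description of how any search engine works; the vocabulary and the function name `segment` are made up for the example.

```python
def segment(text, vocabulary):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    words = sorted(vocabulary, key=len, reverse=True)   # try long words first
    tokens = []
    pos = 0
    while pos < len(text):
        for word in words:
            if text.startswith(word, pos):
                tokens.append(word)
                pos += len(word)
                break
        else:
            tokens.append(text[pos])
            pos += 1
    return tokens

VOCABULARY = {"this", "is", "the", "end", "of", "my", "discussion",
              "thai", "and", "lao", "alphabets"}
text = "thisistheendofmydiscussionofthethaiandlaoalphabets"
tokens = segment(text, VOCABULARY)
print(tokens)
print("disc" in text, "disc" in tokens)
```

An index built from the tokens would list discussion, not disc; an index that stores only the unsegmented string would list neither, which is the situation described for Lao.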


  


