The Thai Script
Thai Script (akson thai [อักษรไทย]) is (almost) exclusively used to write the Thai Language (phasa thai [ภาษาไทย]), which itself is split into several differing dialects.
The Thai language belongs to the Kradai Language Family (also known as Kadai, Thai–Kadai or Daic) and is not related to the majority of South–East Asian languages.
It is, though, closely related to Lao (phasa lao [ພາສາລາວ]), spoken in neighbouring Laos. These two languages are so similar that they form a dialect continuum and are largely mutually intelligible.
Also, the Lao Script (akson lao [ອັກສອນລາວ]) is so similar to Thai Script that I found it convenient to include both in this index.
Thai Script belongs to the Indic Script Family, although it shows heavy modifications from the Indian original to account for the many more vowels and also for the phonemic tones of Thai.
Like all Indic Scripts, it consists of syllabic consonant letters that are combined with vowel signs; the vowel signs may appear left, right, above or below the consonant they are attached to.
A rather unique feature of Thai script (sometimes also found in Lao) is that those diacritics going to the top of a consonant arrange in two different vertical levels:
The tone marks reside at a higher position than the vowel signs, even if none of the latter are present (not all fonts show this behaviour, although they should).
Introduction
| unvoiced inaspirate | unvoiced aspirate | voiced inaspirate | voiced aspirate | nasal
velar | k | kʰ | g | gʰ | ṅ
palatal | c | cʰ | j | jʰ | ñ
retroflex | ṭ | ṭʰ | ḍ | ḍʰ | ṇ
dental | t | tʰ | d | dʰ | n
labial | p | pʰ | b | bʰ | m
| y | r | l | v
| ś | ṣ | s | h
Thai orthography is, like that of most South East Asian languages using Indic scripts, very complicated and full of irregularities. In principle, Thai Script is a member of the Indic Script Family; yet because of the largely different language structure, it became necessary to bend the Indic script principles to their extreme. That means that the writing system had to be massively reworked in order to fit Thai, abandoning most of the Indic core features. At the same time, for cultural reasons, the script still had to maintain some compatibility with Indic languages, chiefly Pali and Sanskrit, to ensure easy access to Indian Buddhist writings. Thus, the script has to serve two very different languages, and this is asking for trouble.
Indic languages have many consonants (shown on the right side for later reference), few vowels and no tones; Thai has few consonants, a large number of vowels (some of which are complex), and five phonemic tones. While Sanskrit allows word size to grow almost indefinitely, Thai’s core vocabulary is mostly monosyllabic (polysyllabic Thai words exist, but are typically Sanskrit loans); it is essential for the reader to parse the text syllable by syllable, so onset and coda consonants must be easily distinguishable. Many consonant phones are restricted to the syllable onset (or, put differently, in final position many phonemic differences become neutralized). Clearly, it is not easy to have a script catering to both sets of requirements; but Thai does so.
Consonants
The fundamentally Indian structure of the script is clearly visible among the consonants. Sanskrit has five series of obstruents (velar, palatal [usually analyzed as an affricate instead of a stop, but I’ll ignore that], retroflex, dental and labial), and all of them have survived in Thai Script. As Thai has no retroflexes, the pronunciation of the retroflexes and the dentals has merged, and the former appear only in Indic loanwords. Thai has hardly any voiced stops, and thus the two voiced series of Sanskrit code for voiceless aspirated sounds in Thai. Therefore, Thai has three voiceless aspirated series and one voiceless inaspirated series.
Table: Thai consonant letters arranged by Sanskrit origin. Columns (Indian value → Thai value): voiceless inaspirate k, c, ṭ, t, p → Thai voiceless inaspirate, with a satellite column of Thai-only voiced inaspirates; voiceless aspirate kʰ, cʰ, ṭʰ, tʰ, pʰ → Thai voiceless aspirate, with a satellite of Thai-only fricatives (some obsolete); voiced inaspirate g, j, ḍ, d, b → Thai voiceless aspirate, with a satellite of Thai-only fricatives (some obsolete); voiced aspirate gʰ, jʰ, etc. → Thai voiceless aspirate; nasal → Thai nasal.
Rows: Indian velar / Thai velar; Indian palatal / Thai palatal or dental; Indian retroflex / Thai dental; Indian dental / Thai dental; Indian labial / Thai labial; Indian sonorants and spirants; Thai extensions.
The three voiceless aspirate series are, however, not fully equivalent: The Thai system distinguishes three different consonant classes, which are indicated by colour in the above table. The low class (white) corresponds to voiced Sanskrit sounds, the high class (orange) contains those letters which in Sanskrit denote voiceless aspirates or fricatives/sibilants, and the mid class (yellow) is derived from voiceless inaspirates (the few true voiced Thai plosives also fall in this class, but they are historically glottalized and thus get classified as “unvoiced” as far as consonant classes go). These classes influence the tone associated with the following vowel, but have no meaning for the pronunciation of the consonant itself.
As a consequence, the few voiced plosives of Thai are not written with their Sanskrit equivalents, but rather with new signs derived from the voiceless inaspirates. There are some more derived letters (in satellites to columns two and three) which mostly mean fricatives (the velar ones are obsolete, and I am not sure about their sound value).
The Indic alphabet has an appendix of 8 approximants (or similar) and sibilants; these appear in exactly the same way also in Thai, but the three sibilants have phonetically merged into one (as in many Indic languages), and they are pronounced as plosives in syllable-final position. The last three letters are non-Sanskrit additions. By far the most common of those is the glottal stop, which is used for two rather different purposes: It forms the syllable onset for those syllables starting with a vowel (these really begin with a glottal stop, as in many versions of English), and inside the syllable it appears in various vowel sequences of varying length, where it usually contributes an O-like sound.
The table on the right side summarises the consonant letter system in a way that stresses the underlying derivation from Sanskrit. Consonant classes are coded by colour, and each entry also gives the conventional Thai name in the spelling adopted by the Unicode Standard. Thai letters are named acrophonically, and thus their names have two parts: The first is phonetic (letter sound plus O), and the second simply is a selected word from the language featuring the consonant in question. The name Cho Chang thus must be understood as “the letter spoken cho which appears in the word chang, elephant”, distinguishing it from the homophone Cho Choe, “that letter cho used to write choe, tree” (there are two more cho-letters). Right next to the letter is the approximate pronunciation (initial/final) which is often used in non-scientific Romanization, and below this you will find the transliteration character used in this index.
Thai lacks an established transliteration scheme. The only standard in existence is ISO 11940, which suffers from multiple deficiencies and is hardly used at all; I will not even introduce its character set here. In all real-world examples, Thai is romanized in a more or less phonetic way which loosely resembles the pronunciation hints given in the table. Thus, both cho chan and cho chang are rendered CH in the onset (ignoring the difference in aspiration), and T in the coda. This aids pronunciation, but the native spelling cannot be reconstructed from such a romanization.
I therefore resort to a home-brew transliteration scheme which is based on the Indic model and therefore rather systematic. Points of articulation (velar, palatal etc.) are indicated as in the mainstream Indic transliteration, but the articulation modes get represented in a way that actually reflects Thai usage. All letters spoken as aspirated are written with a superscript H (ʰ), and the base letter is chosen with respect to the Thai sound value, e. g., K for all voiceless velar stops. Those letters deriving from Sanskrit voiced series are distinguished by a diacritic: The former voiced inaspirates get a macron (they are common), and the former voiced aspirates get a circumflex (they are very rare); thus Indic g equals Thai k̄ʰ, and Indic gʰ is rendered as k̂ʰ. A small inconsistency is the letter So So, which appears in a Thai aspirate series but is rendered as an S with macron s̄ according to the plain S pronunciation (actually, it is a modification of Cho Chang c̄ʰ which in turn corresponds to Sanskrit j).
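The diacritic logic of this scheme is mechanical enough to express in code. The following Python sketch covers only the velar row (the two obsolete letters are left out). The table `VELARS` and the helper `translit` are illustrative names of my own, not part of any library.

```python
import unicodedata

# combining macron, combining circumflex, modifier letter small h
MACRON, CIRCUMFLEX, SUPERSCRIPT_H = "\u0304", "\u0302", "\u02b0"

# Velar row only: Thai letter -> (base letter by Thai sound, aspirated in Thai?, Sanskrit series)
VELARS = {
    "\u0e01": ("k", False, "voiceless inaspirate"),  # KO KAI      < Sanskrit k
    "\u0e02": ("k", True, "voiceless aspirate"),     # KHO KHAI    < Sanskrit kh
    "\u0e04": ("k", True, "voiced inaspirate"),      # KHO KHWAI   < Sanskrit g
    "\u0e06": ("k", True, "voiced aspirate"),        # KHO RAKHANG < Sanskrit gh
    "\u0e07": ("ṅ", False, "nasal"),                 # NGO NGU     < Sanskrit ṅ
}

def translit(letter: str) -> str:
    base, aspirated, series = VELARS[letter]
    if series == "voiced inaspirate":
        base += MACRON         # former voiced inaspirates get a macron
    elif series == "voiced aspirate":
        base += CIRCUMFLEX     # former voiced aspirates get a circumflex
    if aspirated:
        base += SUPERSCRIPT_H  # letters spoken aspirated in Thai get a superscript h
    return unicodedata.normalize("NFC", base)

for letter in VELARS:
    print(unicodedata.name(letter), translit(letter))
```

The base letter is chosen by Thai sound value, and the diacritic is chosen only by the Sanskrit series. This separation is what keeps Kho Khai, Kho Khwai and Kho Rakhang apart even though all three sound alike.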
Consonant classes are easy to rationalize as soon as the Indic roots of the writing system are understood. The mid class (yellow) holds all sounds which in Sanskrit are voiceless inaspirate plosives (the new Thai glottal stop also follows this rule). The high class (orange) derives from voiceless aspirates and related sounds (the three sibilants and H). The rest, which is voiced in Sanskrit, makes up the low class (white); this means all Sanskrit voiced stops (those with a top diacritic in transliteration) and various continuants (nasals, laterals), including the new Thai letters Lo Chula and Ho Nokhuk (I guess that the latter was voiced at some point in the past).
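This rule can be checked mechanically. The Python sketch below assigns a small, illustrative selection of letters their Sanskrit value by hand and derives the class from that value alone. The Thai-only voiced letters such as Do Dek are not covered, and the names `INDIC_VALUE` and `consonant_class` are my own.

```python
# Sanskrit (Indic) value of a few Thai letters; "ʔ" marks the Thai-only glottal stop O Ang.
INDIC_VALUE = {
    "\u0e01": "k",   # KO KAI
    "\u0e02": "kh",  # KHO KHAI
    "\u0e04": "g",   # KHO KHWAI
    "\u0e07": "ṅ",   # NGO NGU
    "\u0e15": "t",   # TO TAO
    "\u0e16": "th",  # THO THUNG
    "\u0e17": "d",   # THO THAHAN
    "\u0e21": "m",   # MO MA
    "\u0e2a": "s",   # SO SUEA
    "\u0e2b": "h",   # HO HIP
    "\u0e2d": "ʔ",   # O ANG
}

VOICELESS_INASPIRATE = {"k", "c", "ṭ", "t", "p", "ʔ"}
VOICELESS_ASPIRATE_OR_SPIRANT = {"kh", "ch", "ṭh", "th", "ph", "ś", "ṣ", "s", "h"}

def consonant_class(letter: str) -> str:
    value = INDIC_VALUE[letter]
    if value in VOICELESS_INASPIRATE:
        return "mid"
    if value in VOICELESS_ASPIRATE_OR_SPIRANT:
        return "high"
    return "low"  # voiced in Sanskrit: voiced stops, nasals, liquids etc.

for letter, value in INDIC_VALUE.items():
    print(letter, value, consonant_class(letter))
```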
Sanskrit loanwords (or, in the case of Buddhist scriptures, true Sanskrit texts) are written etymologically, i. e., with the historically corresponding letter, not the closest phonological match. For example, the birthplace of the Buddha, Lumbinī, is naturally spelled Lump̄ʰinī in Thai and is pronounced accordingly, with mid, high and mid tones for the three syllables, respectively (following rules that will be explained a few paragraphs later). This makes Sanskrit spoken by a Thai basically unintelligible to an Indian brahmin.
Vowels
The writing of Thai vowels is extremely involved, owing to the many phonemically different vowels in the language (and the lack of built-in vowel support in the Indic script core). From the point of view of graphical representation, they fall into three classes. The implicit vowel need not be written because it is implied in each consonant letter. The simple vowels are written with diacritic vowel signs that get attached to the preceding consonant (the syllable onset consonant) according to the Indic model. The remaining ones are the complex vowels, typically diphthongs or triphthongs: They have no Indic counterpart and are written with sometimes lengthy sequences involving one or more vowel signs and/or one or more consonant letters. Three consonant letters can appear in vowel sequences: O Ang (also used for simple vowels in some specific cases), Wo Waen and Yo Yak.
Simple Vowels
short | spelling (open; closed) | long | spelling (open; closed)
a | ก implied or กะ A; กัก Mai Han-Akat | ā | กา AA
i | กิ I | ī | กี II
ü | กึ UE | ǖ | กือ UUE + O Ang; กืก UUE
u | กุ U | ū | กู UU
e | เกะ E* + A; เก็ก E* + Mai Taikhu | ē | เก E*
ä | แกะ AE* + A; แก็ก AE* + Mai Taikhu | ǟ | แก AE*
o | โกะ O* + A; กก implied | ō | โก O*
ɔ | เกาะ E* + AA + A; ก็อก Mai Taikhu + O Ang | ɔ̄ | กอ O Ang
ö | เกอะ E* + O Ang + A; does not occur in closed syllables | ȫ | เกอ E* + O Ang; เกิก E* + I
True Diphthongs
ia | เกียะ E* + II + Yo Yak + A | īa | เกีย E* + II + Yo Yak
üa | เกือะ E* + UUE + O Ang + A | ǖa | เกือ E* + UUE + O Ang
ua | กัวะ Mai Han-Akat + Wo Waen + A | ūa | กัว Mai Han-Akat + Wo Waen; กวก Wo Waen
Additional Signs
aᵐ | กำ AM |
ai | ใก AI Mai Muan* |
ái | ไก AI Mai Malai* |
au | เกา E* + AA |
r̥ | กฤ Ru | r̥̄ | กฤๅ Ru + Lakkhang Yao
l̥ | กฦ Lu | l̥̄ | กฦๅ Lu + Lakkhang Yao
Improper Diphthongs with W (occurring only in open syllables)
iù | กิว I + Wo Waen |
eù | เก็ว E* + Mai Taikhu + Wo Waen | ēù | เกว E* + Wo Waen
| | ǟù | แกว AE* + Wo Waen
aù | to be replaced by au (see above) | āù | กาว AA + Wo Waen
| | īaù | เกียว E* + II + Yo Yak + Wo Waen
Improper Diphthongs with Y (occurring only in open syllables)
aì | กัย Mai Han-Akat + Yo Yak | āì | กาย AA + Yo Yak
| | ōì | โกย O* + Yo Yak
ɔì | ก็อย Mai Taikhu + O Ang + Yo Yak | ɔ̄ì | กอย O Ang + Yo Yak
uì | กุย U + Yo Yak |
| | ȫì | เกย E* + Yo Yak
| | ūaì | กวย Wo Waen + Yo Yak
| | ǖaì | เกือย E* + UUE + O Ang + Yo Yak
An additional complication comes from the distinction between open and closed syllables. Thai has a rather simple syllable structure C(C)V(C), with only a few allowed onset clusters (phonetically, [kkʰ]+[rlw], [ppʰ]+[rl] and t+r). The syllable boundary is not indicated directly (there is no virama). Yet, to allow the reader to isolate the syllables easily, many vowels have a different notation in open C(C)V and closed C(C)VC syllables. This method, though indirect, is amazingly effective: I do not speak a single word of Thai, yet by following the rules I was able to identify the syllables in all the spice names shown here, with only one or two ambiguous cases in the whole set.
The table on the right side summarizes all Thai vowel sequences; wherever necessary, one cell has two entries for open and closed syllables, respectively. Most vowels come in short/long pairs, but their spellings are not necessarily similar.
Even the implicit vowel is complicated. In open syllables it sounds a, and in closed syllables o. In some cases, e. g. whenever the next syllable starts with a cluster, it may become necessary to write the implicit vowel explicitly; otherwise, a word like kapla would be ambiguous (ka-pla or kap-la). The sign Sara A is used for the implicit vowel in such cases. It has further use in several vowel sequences where it denotes shortness (replaced by Mai Taikhu in closed syllables).
The following signs are used for simple vowels: Sara AA (long A, ā), Sara I (short I, i), Sara II (long I, ī), Sara UE (short Ü, ü), Sara UUE (long Ü, ǖ), Sara U (short U, u), Sara UU (long U, ū), Sara E (long E, ē), Sara AE (long Ä, ǟ) and Sara O (closed long O, ō). The short variants of E, AE and O are arrived at with the shortening marks mentioned above, and two more vowels (open O, which I transliterate as ɔ, and Ö) require short sequences some of which involve the letter O Ang.
Diphthongs ending in U involve sequences with a final Wo Waen, and those ending in I have sequences that end in Yo Yak. Yet, AI and AU have special representations which clearly trace back to the original Sanskrit diphthong signs inherited by Thai script (Sanskrit has the diphthongs E, AI, O and AU; classifying E and O as diphthongs is just a peculiarity of Sanskrit grammar). AU is basically written by simultaneously applying Sara E and Sara AA (mirroring the construction of the O and AU signs from E and AA in most Indic scripts). For AI, there are two typographically slightly different versions of the South Indian AI vowel sign. AI can also be written by a sequence with Yo Yak, so there are three possible representations for that sound; orthographic rules govern which one is used.
The vowel signs Sara E, Sara AE, Sara O and the two AI-signs graphically appear at the beginning of the syllable, left of the onset consonant (in case of an initial cluster, left of the entire consonant group); in the table, they are marked with an asterisk for clarity. Sara I, Sara II, Sara UE, Sara UUE and the vowel shorteners (Mai Han-Akat, Mai Taikhu) appear on top of the consonant (in case of a cluster, the second consonant), and Sara U and Sara UU appear below the consonant. The remaining vowel signs (Sara A, Sara AA and the special case Sara AM) follow the consonant.
The notorious Sanskrit letters for vocalized liquids (RU, its counterpart LU and the corresponding long forms) also make an appearance. They are not fully obsolete even when writing Thai. They appear in some Sanskrit loanwords and, rather amazingly, also in some neologisms derived from English.
The nasal mark Nikhahit (Thai incarnation of the Indian Anusvara) is not exactly a vowel, but behaves typographically like the vowel signs. It is not used in true Thai words, except in the very frequent combination with Sara AA (open syllables only). The ligature of those two signs is so common that it is usually considered a vowel sign in its own right, Sara AM. Although derived from the long form AA, it is realized with a short a sound. There is no phonetic contrast between a syllable ending in AM and one ending in A plus consonant M.
The Unicode Standard certainly did not cover itself in glory when encoding the Thai Script. Coding Thai texts follows a visual model: the signs are written and stored in typographical order, as opposed to the logical order used for nearly all other Brahmi-derived scripts. This means that in the encoded text, the left-attaching vowel signs (E etc.) appear before the consonant they follow in speech. As a consequence, there is no joining behaviour defined for these vowel signs; typographically, they are just letters. The Standard pushes this to the extreme by also defining no joining behaviour for A and AA, where it could easily have done so, although the otherwise very similar AM indeed is a spacing accent. Electronic processing of Thai texts becomes a dire nightmare dwarfing that of Elm Street, because everything is different from every other language and must be done differently. A virama model similar to that used for Khmer was considered, but had to be discarded for compatibility with a misbegotten existing Thai standard.
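Python's standard `unicodedata` module makes the difference visible. The Thai syllable kē is stored with SARA E first. A Devanagari syllable whose vowel sign is also drawn on the left (ki) is stored consonant first. The `to_logical` helper below is a hypothetical illustration of the reordering that processing software must do itself. As a sketch, it assumes a single-consonant onset and ignores initial clusters, where the vowel sign precedes the whole consonant group.

```python
import re
import unicodedata

# Thai kē: SARA E is stored before KO KAI (visual order).
print([unicodedata.name(c) for c in "\u0e40\u0e01"])
# Devanagari ki: VOWEL SIGN I is also drawn left of KA, but stored after it (logical order).
print([unicodedata.name(c) for c in "\u0915\u093f"])

PREPOSED = "\u0e40-\u0e44"   # SARA E, SARA AE, SARA O, SARA AI MAIMUAN, SARA AI MAIMALAI
CONSONANT = "\u0e01-\u0e2e"  # KO KAI .. HO NOKHUK

def to_logical(text: str) -> str:
    """Hypothetical helper: move each preposed vowel behind the single consonant after it.
    Initial clusters (vowel written before the whole group) are not handled."""
    return re.sub(f"([{PREPOSED}])([{CONSONANT}])", r"\2\1", text)

print([unicodedata.name(c) for c in to_logical("\u0e40\u0e01")])
```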
In transliterating the vowels, I do the obvious and go the phonetic way. Everything long gets a macron somewhere, and this poses a problem with the umlaut vowels ÄÖÜ: Their long counterparts need to carry both a diaeresis and a macron (ǞȪǕ), which isn’t really reader-friendly (thankfully, Unicode offers precomposed letters for all of them, which improves the rendering in real-world engines). The improper diphthongs are marked with a grave accent on their last part (representing the semi-vocalic element). I try to follow Indic conventions wherever possible, and this means that the anusvara would be transliterated as ṃ; however, Sara AM should be different, and so I chose a superscript m (ᵐ). The latter character is well known for not being supported by Windows XP. But frankly, that transliteration is so fiendishly overdecorated, often having more diacritics than base letters in a word, that XP would perform miserably even if it did not fail on ᵐ.
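Building a long vowel from its short form plus a combining macron and then applying NFC normalization yields exactly these precomposed letters wherever Unicode has one. A minimal Python sketch; the helper name `long_vowel` is my own.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON

def long_vowel(short: str) -> str:
    """Add a macron and compose to a single code point where Unicode has a precomposed letter."""
    return unicodedata.normalize("NFC", short + MACRON)

for v in ("a", "\u00e4", "\u00f6", "\u00fc"):  # a, ä, ö, ü
    lv = long_vowel(v)
    print(v, "->", lv, [f"U+{ord(c):04X}" for c in lv])
```

For vowels without a precomposed form (e.g. ɔ̄), NFC simply leaves the base letter and the combining macron as two code points.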
Tone marks etc.
Thai has five different tones: mid 33, low 21, falling 41, high 34 and rising 25. Each syllable can be pronounced with two to five different tones, depending on the consonant/vowel distribution. Syllables differing only in tone may have completely unrelated meanings, and therefore it is vital for the script to code all tones unambiguously. In order to achieve that goal, four different tone marks are used (Mai Ek, Mai Tho, Mai Tri and Mai Chattawa).
Syllable type | high class | mid class | low class
short vowel, open syllable or ending in plosive | low | low | high
short vowel ending in sonorant; long vowel, open syllable or ending in sonorant | rising | mid | mid
long vowel, syllable ending in plosive | low | low | falling
any syllable with tone mark Mai Ek (1) ก่ | low | low | falling
any syllable with tone mark Mai Tho (2) ก้ | falling | falling | high
any syllable with tone mark Mai Tri (3) ก๊ | | high |
any syllable with tone mark Mai Chattawa (4) ก๋ | | rising |
Yet, it would not be Thai if it were easy to determine the tone for a given written syllable. Rather, the tone is a function of consonant class, vowel length and syllable coda, with optional overriding of the last two by a tone mark (in fact, less than 50% of all written Thai syllables need a tone mark). The table at the right side summarizes the rules.
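These rules can be written as a small function. This is a sketch of the standard Thai tone rules, assuming the inputs (consonant class, vowel length, kind of coda, tone mark) are supplied by hand rather than parsed from Thai text. The function name and the string encoding of its arguments are my own.

```python
from typing import Optional

def thai_tone(cls: str, long_vowel: bool, coda: Optional[str], mark: Optional[str] = None) -> str:
    """Return the tone of a Thai syllable.

    cls:        class of the initial consonant: "high", "mid" or "low"
    long_vowel: True for a long vowel
    coda:       None (open syllable), "sonorant" or "plosive"
    mark:       None, "ek", "tho", "tri" or "chattawa"
    """
    if mark == "ek":
        return "falling" if cls == "low" else "low"
    if mark == "tho":
        return "high" if cls == "low" else "falling"
    if mark in ("tri", "chattawa"):
        if cls != "mid":
            raise ValueError("Mai Tri and Mai Chattawa are only used with mid-class consonants")
        return "high" if mark == "tri" else "rising"
    # Unmarked: "live" syllables end in a sonorant or are open with a long vowel;
    # "dead" syllables are open with a short vowel or end in a plosive.
    live = coda == "sonorant" or (coda is None and long_vowel)
    if live:
        return "rising" if cls == "high" else "mid"
    if cls == "low":
        return "falling" if long_vowel else "high"
    return "low"
```

As a usage check, the three syllables of Lump̄ʰinī all start with low-class letters (short u + m, short open i, long open ī). `[thai_tone("low", False, "sonorant"), thai_tone("low", False, None), thai_tone("low", True, None)]` then gives mid, high and mid, as stated earlier.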
There is an important additional rule: A syllable beginning with a nasal, approximant or lateral (all of which are voiced, thus belonging to the low class) can be preceded by a Ho Hip character which, although mute, lends its high class to the entire syllable. Consequently, many syllables that would be considered low (only two or three different tones possible) can gain access to more possible tones (four or five).
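In code, the Ho Hip rule amounts to a small adjustment before the tone lookup. This Python sketch assumes that the written onset has already been isolated and that the caller supplies a class table; `tone_class` and `CLASSES` are hypothetical names.

```python
HO_HIP = "\u0e2b"
# Low-class sonorants that can follow a mute Ho Hip:
# NGO NGU, YO YING, NO NU, MO MA, YO YAK, RO RUA, LO LING, WO WAEN
LOW_SONORANTS = {"\u0e07", "\u0e0d", "\u0e19", "\u0e21", "\u0e22", "\u0e23", "\u0e25", "\u0e27"}

def tone_class(onset: str, letter_class: dict) -> str:
    """Class that governs the tone of a syllable with the given written onset."""
    if len(onset) == 2 and onset[0] == HO_HIP and onset[1] in LOW_SONORANTS:
        return "high"  # the mute Ho Hip lends its class to the whole syllable
    return letter_class[onset[0]]

# Hypothetical minimal class table, just enough for the example
CLASSES = {"\u0e21": "low", "\u0e2b": "high"}  # MO MA, HO HIP

print(tone_class("\u0e21", CLASSES))        # onset Mo Ma alone
print(tone_class("\u0e2b\u0e21", CLASSES))  # onset Ho Hip + Mo Ma
```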
In transcribing the tones, I follow the Thai Script in just rendering the tone marks, which makes the lookup of the correct tone as complicated as in the native writing. Since the names of the tone marks derive from the Sanskrit numerals One to Four (think of, for example, eins, two, treis and quatuor), I just use superscript numbers. In Thai script, all tone marks are nonspacing diacritics attached to the consonants and floating higher than vowel signs (in Unicode, they follow the consonant and, if present, diacritic vowel signs, but they precede any spacing vowel signs). To improve the readability of the transliteration, I have decided to show the superscript numbers at the end of the syllable.
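In stored text, the tone mark can therefore sit in the middle of a syllable, and moving it to the end takes a little code. This is a minimal Python sketch. The helper `split_tone` is hypothetical and handles only the tone mark, not the rest of the transliteration.

```python
import unicodedata
from typing import Tuple

# Superscript digits used in this index for the four Thai tone marks
TONE_SUPERSCRIPT = {
    "\u0e48": "\u00b9",  # MAI EK       -> superscript 1
    "\u0e49": "\u00b2",  # MAI THO      -> superscript 2
    "\u0e4a": "\u00b3",  # MAI TRI      -> superscript 3
    "\u0e4b": "\u2074",  # MAI CHATTAWA -> superscript 4
}

def split_tone(syllable: str) -> Tuple[str, str]:
    """Remove the tone mark from one Thai syllable and return it as a superscript digit,
    so that the caller can append it at the end of the transliterated syllable."""
    base = "".join(ch for ch in syllable if ch not in TONE_SUPERSCRIPT)
    tone = "".join(TONE_SUPERSCRIPT[ch] for ch in syllable if ch in TONE_SUPERSCRIPT)
    return base, tone

# The tone mark follows an above-vowel (kī + Mai Ek) ...
print([unicodedata.name(c) for c in "\u0e01\u0e35\u0e48"])
# ... but precedes a spacing vowel (kā + Mai Ek).
print([unicodedata.name(c) for c in "\u0e01\u0e48\u0e32"])
print(split_tone("\u0e01\u0e48\u0e32"))
```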
ก์
Another sign hovering as high as the tone marks is the cancellation mark Thanthakat (shown right with the letter K). It marks consonants or syllables that are no longer spoken but have been orthographically fossilized. It never appears in true Thai words, but appears in quite a few Sanskrit loanwords and even in more recent English loans (e. g. marking the R in pepper). This sign can be applied to a single consonant or an entire syllable. In transliteration, I represent it by a superscript zero immediately after the consonant (k⁰).
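Because Unicode stores this mark (named THANTHAKHAT in the Standard) directly after the consonant it cancels, a letter-by-letter transliteration can emit the superscript zero in place. This Python sketch illustrates the single-consonant case. The letter table is a hypothetical stub with just one entry.

```python
THANTHAKHAT = "\u0e4c"       # THAI CHARACTER THANTHAKHAT
SUPERSCRIPT_ZERO = "\u2070"  # superscript zero

# Hypothetical partial letter table, just enough for the example
LETTERS = {"\u0e01": "k"}    # KO KAI

def transliterate(text: str) -> str:
    """Letter-by-letter transliteration. Thanthakhat is stored right after the consonant it
    cancels, so emitting the superscript zero in place puts it directly after that consonant."""
    return "".join(SUPERSCRIPT_ZERO if ch == THANTHAKHAT else LETTERS.get(ch, ch) for ch in text)

print(transliterate("\u0e01\u0e4c"))
```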
The Lao Script
Table: Lao consonant letters arranged by Sanskrit origin. Columns (Indian value → Lao value): voiceless inaspirate k, c, ṭ, t, p → Lao voiceless inaspirate, with a satellite column of Lao-only voiced inaspirates; voiceless aspirate kʰ, cʰ, ṭʰ, tʰ, pʰ → Lao voiceless aspirate, with a satellite of Lao-only fricatives; voiced inaspirate g, j, ḍ, d, b → Lao voiceless aspirate, with a satellite of Lao-only fricatives; voiced aspirate gʰ, jʰ, etc. → unused in Lao; nasal → Lao nasal.
Rows: Indian velar / Lao velar; Indian palatal / Lao palatal or dental; Indian retroflex / unused in Lao; Indian dental / Lao dental; Indian labial / Lao labial; sonorants and spirants; Lao extensions.
The script for writing the Lao language is very similar to the Thai script: The consonants used for Lao are just a subset of those used for Thai, and even the letter shapes are closely related. There is more difference with respect to vowels, but even there the subset statement holds to a first approximation. In Unicode, the Lao code points are shifted by 0x80 with respect to their Thai counterparts.
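Because of this fixed offset, converting a Thai character to its Lao counterpart is a simple code point shift. The sketch below uses a hypothetical helper name, `lao_counterpart`. It refuses slots that hold no character in Python's Unicode database, since many Thai letters have no Lao equivalent. It also shows that Thai Cho Chang lands on Lao So Tam, in line with Unicode's decision to put So Tam in the c̄ʰ slot.

```python
import unicodedata

def lao_counterpart(thai_char: str) -> str:
    """Hypothetical helper: shift a Thai character by 0x80 code points,
    refusing slots for which the Unicode database has no character."""
    lao = chr(ord(thai_char) + 0x80)
    if unicodedata.name(lao, None) is None:
        raise ValueError(f"no Lao character at U+{ord(lao):04X}")
    return lao

# KO KAI, NGO NGU, CHO CHANG, HO HIP
for ch in "\u0e01\u0e07\u0e0a\u0e2b":
    print(unicodedata.name(ch), "->", unicodedata.name(lao_counterpart(ch)))
```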
The traditional letter naming in Lao is acrophonic, as in Thai; in about half of the cases, the same word is taken for the letter name as in Thai. The Unicode Standard, however, has mostly adopted a different naming convention that uses just the phonetic part (Ko, Tho, Fo etc.), which, if necessary, gets augmented by the adjectives Sung (high class) or Tam (low class). Therefore, the Unicode names of corresponding Thai and Lao characters are not related.
The consonant letter system of Lao is the same as for Thai, with the following differences:
- Lao has no retroflex series (ḍ,ṭ,ṭʰ,ṭ̄ʰ,ṭ̂ʰ,ṇ); this extends to the retroflex lateral ḷ. Also, the two low class aspirated series have merged (thus, k̂ʰ,ĉʰ,ṭ̂ʰ,t̂ʰ,p̂ʰ are absent). Of the three Indic sibilants, ś and ṣ no longer exist. Lastly, the letter cʰ and the two obsolete letters xʰ and x̄ʰ have vanished. All together, Lao has 17 fewer letters than Thai.
- The two Thai letters r and l have each a counterpart in Lao, but their pronunciation is the same l/n (initial/final). Also, ñ has pronunciation ny/– (Thai: y/n).
- There is also a merger between c̄ʰ and s̄. The new letter is pronounced s/s and accordingly called So Tam, but by glyph shape seems to continue Thai c̄ʰ; the Unicode Standard decided to allocate the code position of c̄ʰ to it (which I consider a folly, and thus do not repeat in the table).
- Speaking of folly and Unicode: The standard managed to confuse two independent pairs of letters. The names for r (Ro Rot) and l (Lo Ling) have been swapped in the Standard, and the same holds for f (Fo Sung) and f̄ (Fo Tam). Both cases show a significant degree of carelessness: After all, there is a Thai character named Lo Ling, and it should have attracted notice that the Thai and Lao code positions don’t match. Also, the descriptors tam and sung in Lao simply refer to the consonant classes low and high, respectively, and the inconsistency really meets the eye.
As the Unicode Standard is guaranteed never to change the name of a letter in a subsequent version, the misnamings will persist for good. As a mild correction, Unicode provides official alias names for the misnamed letters. The stability policy defined for letter names does not extend to those, meaning they might be dropped, or some unrelated codepoint may take over the name in the future. Moreover, in the case of the f disaster, the names cannot simply be swapped in the alias (each alias would collide with the other letter’s formal name). Thus the Standard resorts to acrophonic Lao names, which are used nowhere else except in the broken r/l letters. The following table explains the mess:
Transcr. | Thai letter | Unicode Thai name (fine) | Thai code point | Lao letter | Unicode Lao name | Lao code point | what it should have been | official alias for Lao
f | ฝ | FO FA | U+0E1D | ຝ | FO TAM | U+0E9D | FO SUNG | FO FON
f̄ | ฟ | FO FAN | U+0E1F | ຟ | FO SUNG | U+0E9F | FO TAM | FO FAY
r | ร | RO RUA | U+0E23 | ຣ | LO LING | U+0EA3 | RO ROT | RO
l | ล | LO LING | U+0E25 | ລ | LO LOOT | U+0EA5 | LO LING | LO
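Python's `unicodedata` module exposes both sides of this table. `unicodedata.name()` returns only the formal (misnamed) name, whereas `unicodedata.lookup()` also accepts formal aliases (Python 3.3 and later). This short sketch runs over the four aliases listed in the table.

```python
import unicodedata

# Formal name aliases of the four misnamed Lao letters
ALIASES = ["LAO LETTER FO FON", "LAO LETTER FO FAY", "LAO LETTER RO", "LAO LETTER LO"]

for alias in ALIASES:
    ch = unicodedata.lookup(alias)  # lookup() also resolves name aliases
    print(f"U+{ord(ch):04X}  name: {unicodedata.name(ch):<20}  alias: {alias}")
```

The same aliases also work in string literals, e.g. `"\N{LAO LETTER FO FON}"`.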
Simple Vowels
short | spelling (open; closed) | long | spelling (open; closed)
a | ກະ A; ກັກ Mai Kan | ā | ກາ AA
i | ກິ I | ī | ກີ II
ü | ກຶ Y | ǖ | ກື YY
u | ກຸ U | ū | ກູ UU
e | ເກະ E* + A; ເກັກ E* + Mai Kan | ē | ເກ E*
ä | ແກະ EI* + A; ແກັກ EI* + Mai Kan | ǟ | ແກ EI*
o | ໂກະ O* + A; ກົກ Mai Kon | ō | ໂກ O*
ɔ | ເກາະ E* + AA + A; ກັອກ Mai Kan + O | ɔ̄ | ກໍ Niggahita; ກອກ O
ö | ເກິ E* + I | ȫ | ເກີ E* + II
True Diphthongs
ia | ເກັຽະ E* + Mai Kan + NYO + A; ກັຽກ Mai Kan + NYO | īa | ເກັຽ E* + Mai Kan + NYO; ກຽກ NYO
üa | ເກຶອ E* + Y + O | ǖa | ເກືອ E* + YY + O
ua | ກົວະ Mai Kon + Wo + A | ūa | ກົວ Mai Kon + Wo; ກວກ Wo
Additional Signs
aᵐ | ກຳ AM |
ai | ໃກ AY* |
ái | ໄກ AI* |
au | ເກົາ E* + Mai Kon + AA |
Improper Diphthongs with W
| | ǟù | ແກວ EI* + Wo
aù | to be replaced by au (see above) | āù | ກາວ AA + Wo
| | īaù | ກຽວ NYO + Wo
Improper Diphthongs with Y
aì | ກັຍ Mai Kan + Nyo | āì | ກາຍ AA + Nyo
ɔì | ກັອຍ Mai Kan + O + Nyo ?? | ɔ̄ì | ກອຍ O + Nyo
| | ȫì | ເກີຍ E* + II + Nyo
| | ūaì | ກວຍ Wo + Nyo
- The four letters ś,ṣ,s,cʰ have merged into s, and the resulting letter is shifted into the position of cʰ (Unicode does not follow here, and leaves the letter in the s position). This results in a slight difference in the collation sequences of the two languages. The Lao names in this index are sorted in Thai style.
- The consonant class of y is mid (in Thai, it is low).
Lao has consistently abandoned the concept of an inherent vowel; rather, all letters are true consonants, and every syllable needs at least one vowel sign. Similar to Thai, vowel signs may differ for open and closed syllables. The simple vowels are the same in both languages (although pronunciation may vary), but Lao has fewer diphthongs. Unicode sometimes has different names for the vowel signs, e. g. Thai AE equals Lao EI, and Thai UE is called Y in Lao.
The functions of the two Thai signs Mai Han-Akat and Mai Taikhu have been taken over by two new Lao signs: Mai Kan and Mai Kon. Mai Kon (which should actually have been named Mai Kong) is used to write the vowel O in closed syllables (where it is implicit in Thai). It is also needed for the construction of the diphthongs ua, ūa and au (replacing Thai Mai Han-Akat, though this is not fully analogous). Other vowel sequences are written with Mai Kan. Some of them are analogous to those in Thai using Mai Han-Akat or Mai Taikhu (e, ä, ɔ, aì), but others (ia, īa) are not.
In Lao, ö is written as E+I (and E+II for the long version). Also, the sequences for üa and ǖa are basically the same and differ only in the use of Y and YY, respectively. In both these examples, the Lao spelling is far more intuitive than the Thai spelling.
On the other hand, the spelling of long ɔ̄ in open syllables is strange: Lao here employs the historic nasalization sign Niggahita. Nasalization is not a prominent feature in South East Asian languages, but the Niggahita still has its original semantics as part of the AM vowel sign (which is composed of AA and Niggahita, as its Thai counterpart is, and there is a compatibility equivalence in Unicode).
The spacing properties of Lao vowel signs are identical to those of their Thai counterparts: E, EI, O, AY and AI precede the consonant, just as their Thai equivalents Sara E, Sara AE, Sara O, Sara AI Maimuan and Sara AI Maimalai do. I, II, Y and YY sit on top of the consonant (as do Mai Kan and Mai Kon), and U and UU go below the baseline. Lao has no RU and LU vowels.
While Thai employs the letter Yo Yak in many diphthong sequences, in Lao the counterpart letter Yo is never used for diphthongs. Instead, the letter Nyo ñ (a homophone of Yo Yak in Thai) forms part of such sequences, but mostly in a typographically changed shape that has a separate codepoint in Unicode (“Vowel sign NYO”). The unaltered shape Nyo is used for the improper diphthongs ending in I, although I have occasionally seen NYO in this position.
The preceding table gives a list of vowels and vowel sequences in Lao. Differences to Thai (aside from the Unicode names) are highlighted. Vowel signs that precede their consonant are marked with an asterisk. Because of the unfortunate Unicode naming convention, care must be taken not to confuse the vowel sign O (written “O*” here) with the consonant letter O (written “O” here). Also, in this table “Nyo” refers to the consonant letter and “NYO” to the vowel sign.
In transliteration, it is most straightforward to use exactly the same symbols for Lao as for Thai. This seems to pose no problem at all, but is slightly unintuitive for pronunciation in a few cases. Most glaringly, the vowel which I render as ü has a pronunciation more akin to Turkish dotless ı.
The close similarity of Thai and Lao script enabled me to do an algorithmic transliteration Lao→Thai, taking care of the different vowel orthographies. This uses a sequence of carefully crafted regular expression substitutions that first identify syllable boundaries and then use pattern matching to rewrite the vowel sequences. This is an experimental feature that should not be trusted too much. The reverse transliteration, while perhaps not impossible, would be more difficult, because Thai allows more initial consonant clusters and more complex diphthong sequences; moreover, it would have to be lossy because of the many Thai consonants and vowel sequences that have no counterpart in Lao.
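The following Python sketch is not the regular expression set used for this index. It only illustrates the principle for a handful of vowel spellings, assuming the input is a single, already isolated Lao syllable with a one-letter onset. The rules rewrite the Lao vowel spelling into the Lao letters whose Thai counterparts spell the same vowel, and then the 0x80 code point shift is applied. The rule list and the function name are illustrative.

```python
import re
import unicodedata

C = "[\u0e81-\u0eae]"  # Lao consonant letters (assumption: this range is close enough for a sketch)

# (pattern, replacement) rules for ONE isolated Lao syllable, applied in order
RULES = [
    (re.compile(f"^({C})\u0ebb\u0ea7$"), "\\1\u0eb1\u0ea7"),         # ūa open: Mai Kon + Wo -> Mai Kan + Wo
    (re.compile(f"^({C})\u0ebb({C})$"), "\\1\\2"),                   # o closed: drop Mai Kon (implicit o in Thai)
    (re.compile(f"^\u0ec0({C})\u0ebb\u0eb2$"), "\u0ec0\\1\u0eb2"),   # au: E + Mai Kon + AA -> E + AA
    (re.compile(f"^\u0ec0({C})\u0eb5$"), "\u0ec0\\1\u0ead"),         # ȫ open: E + II -> E + O
    (re.compile(f"^\u0ec0({C})\u0eb5({C})$"), "\u0ec0\\1\u0eb4\\2"), # ȫ closed: E + II -> E + I
]

def lao_syllable_to_thai(syllable: str) -> str:
    for pattern, replacement in RULES:
        syllable = pattern.sub(replacement, syllable)
    thai = []
    for ch in syllable:
        if "\u0e80" <= ch <= "\u0eff":
            ch = chr(ord(ch) - 0x80)               # the Lao block sits 0x80 above the Thai block
            if unicodedata.name(ch, None) is None:  # e.g. a leftover Mai Kon has no Thai counterpart
                raise ValueError(f"no Thai counterpart for U+{ord(ch) + 0x80:04X}")
        thai.append(ch)
    return "".join(thai)

print(lao_syllable_to_thai("\u0ec0\u0e81\u0eb5"))  # Lao ȫ (E + KO + II)
```

Any vowel spelling not covered by a rule (for instance the NYO sign in closed īa) falls through to the shift. It then raises `ValueError` instead of silently producing an unassigned Thai code point.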
Syllable type | high class | mid class | low class
short vowel, open syllable or ending in plosive | high | high | mid
short vowel ending in sonorant; long vowel, open syllable or ending in sonorant | rising | rising ?? | high
long vowel, syllable ending in plosive | low falling | low falling | high falling
any syllable with tone mark Mai Ek (1) ກ່ | mid | mid | mid
any syllable with tone mark Mai Tho (2) ກ້ | low falling | high falling | high falling
any syllable with tone mark Mai Ti (3) ກ໊ | | rising |
Lao is generally considered to have 6 tones, although the exact count and characterization depend on the dialect and the researcher involved. In their canonical order, they are rising (24), high (44), high falling (52), mid (33), low (11) and low falling (31).
Tones depend, as in Thai, on the syllable structure (class of initial consonant, vowel length and syllable coda), and there are three tone marks available to override the default (Mai Ti is rare, though). The conditions involved are exactly the same as in Thai, but the tones are not — rather, they are very dissimilar to Thai tones encountered in syllables of the same structure.
As in Thai, the letter h (Ho Sung) ຫ may be used to shift the consonant class of a sonorant from low to high. Unlike Thai, the script provides special ligatures for some of the combinations h+[ṅñnmrlw]. Unicode has dedicated codepoints for the ligatures h+m and h+n, which are compatibility decomposable into the simple consonants. Moreover, the letters l and r can appear as a subscript when combined with h; in that case, the distinction between the two characters (which are near-homophones anyway) is lost. Irrespective of glyph type, the combinations of h+[ṅñnmrlw] are sometimes considered separate letters:
h+ṅ: ຫ + ງ = ຫງ,
h+ñ: ຫ + ຍ = ຫຍ,
h+n: ຫ + ນ = ຫນ (ໜ),
h+m: ຫ + ມ = ຫມ (ໝ),
h+r: ຫ + ຣ = ຫຣ (ຫຼ),
h+l: ຫ + ລ = ຫລ (ຫຼ),
h+w: ຫ + ວ = ຫວ.
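The compatibility decomposition of the two ligature code points can be checked with `unicodedata.normalize`. NFKD (or NFKC) splits them into Ho Sung plus the sonorant, while NFC leaves them alone. This matters, for instance, when comparing text that uses the ligatures with text that spells the two letters out:

```python
import unicodedata

for ligature in ("\u0edc", "\u0edd"):  # LAO HO NO, LAO HO MO
    parts = unicodedata.normalize("NFKD", ligature)
    print(unicodedata.name(ligature), "->", " + ".join(unicodedata.name(c) for c in parts))

# NFC keeps the ligature code points; only the compatibility forms decompose them.
print(unicodedata.normalize("NFC", "\u0edc") == "\u0edc")
```

The subscript form of l/r (ຫຼ) is a separate combining sign, not a precomposed ligature, so it has no such decomposition.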
Browser support tends to be a lot worse for Lao than for Thai. On the one hand, this is surprising, as the scripts are so similar that one can hardly imagine a programmer who solves one and leaves the other one open. On the other hand, perhaps not all programmers have even heard of Lao, and in any case there are also economic arguments. By the way, as of 2011, Google does not offer a way to search Lao documents, because the indexer would have to split the web site text into syllables (as it does for Thai). Currently, Google considers all uninterrupted strings of Lao characters as a single unit and is unable to search for parts thereof. This is basically equivalent to finding the word disc in the text thisistheendofmydiscussionofthethaiandlaoalphabets, which Google clearly cannot do without a lot of additional analysis.
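The additional analysis usually starts with dictionary-based segmentation. Greedy longest matching is a simple baseline long used for Thai word segmentation. It can be sketched in a few lines of Python. The toy English lexicon below is made up for the example, and nothing here describes how Google's indexer actually works.

```python
def longest_match_segment(text: str, lexicon: set) -> list:
    """Greedy left-to-right longest-match segmentation; unknown characters become single tokens."""
    longest = max(map(len, lexicon))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in lexicon:
                break
        else:
            size = 1  # no lexicon word starts here: emit one character
        tokens.append(text[i:i + size])
        i += size
    return tokens

LEXICON = {"this", "is", "the", "end", "of", "my", "disc", "discussion",
           "thai", "and", "lao", "alphabets"}
TEXT = "thisistheendofmydiscussionofthethaiandlaoalphabets"

tokens = longest_match_segment(TEXT, LEXICON)
print(tokens)
print("disc" in tokens, "discussion" in tokens)
```

Even though "disc" is in the lexicon, the longer match "discussion" wins. So after segmentation, a word-based index would correctly not report "disc" as a separate word. Real Thai segmenters refine this greedy choice, e.g. by preferring segmentations with fewer words overall.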