Transliteration System for English to Sinhala Machine Translation
ICIIS07
Budditha Hettige
Department of Statistics and Computer Science, Faculty of Applied Sciences, University of Sri Jayewardenepura
&
Asoka S. Karunananda
Faculty of Information Technology, University of Moratuwa, Sri Lanka
ICIIS-2007:Transliteration System for English to Sinhala Machine Translation
Overview
• What is Machine Translation
• Problems in Machine Translation
• Machine Transliteration
• Sinhala & English Language
• Existing Approaches and Methods
• Proposed approach: Design
• Modules
• Conclusion and further works
• Demonstration
What is Machine Translation?
Machine translation (MT) is the process of automatically translating text from one natural language into another.
Machine Translation Process
[Figure: translation pipeline]
Source language sentence
→ Source language analysis (uses the source language dictionary)
→ Translation (uses the bilingual dictionary)
→ Target language generation (uses the target language dictionary)
→ Target language sentence
Source language analysis
• Morphological analysis: the source language morphological analyzer analyzes the given sentence word by word and returns morphological information for each word.
• Syntax analysis: the source language parser identifies the syntax of the given source language sentence.
Translation
The translator translates source language words into the target language.
Target language generation
• Morphological generation: the target language morphological analyzer/generator generates appropriate target language words with grammatical information.
• Syntax generation: the target language parser generates the sentences in the target language.
Problems in Machine Translation
• Out-of-vocabulary words – words that are not in the dictionary
• Proper noun translation – example: Mahinda Rajapaksha
• Handling technical terms – example: Pentium IV Processor
• Multiword expressions – example: oil cake (කැවුම්)
• Semantic and pragmatic problems
What is Machine Transliteration?
Machine transliteration is a method for automatically converting words in one language into phonetically equivalent words in another language. For example, the English word ‘machine’ is transliterated into Sinhala as මැෂින්.
Why Machine Transliteration
Machine transliteration can be used to:
• Solve the out-of-vocabulary problem
• Translate proper nouns
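The fallback role described on this slide can be sketched as follows. This is only an illustration under stated assumptions: `BILINGUAL_DICT`, `KNOWN_TRANSLITERATIONS`, `transliterate` and `translate_term` are hypothetical names, the two entries are the ‘oil cake’ and ‘machine’ examples from these slides written in Unicode Sinhala, and the transliterator is a lookup stub standing in for a real module.

```python
# Toy bilingual dictionary; the single entry is the "oil cake" example from the slides.
BILINGUAL_DICT = {"oil cake": "කැවුම්"}

# Stand-in for a real transliteration module (assumption): it only knows
# the 'machine' example given in the slides.
KNOWN_TRANSLITERATIONS = {"machine": "මැෂින්"}


def transliterate(word: str) -> str:
    """Hypothetical transliterator: returns the word unchanged if it is unknown."""
    return KNOWN_TRANSLITERATIONS.get(word.lower(), word)


def translate_term(term: str) -> tuple[str, str]:
    """Return (target text, origin), where origin names the module that produced it."""
    key = term.lower()
    if key in BILINGUAL_DICT:
        return BILINGUAL_DICT[key], "dictionary"
    # Out-of-vocabulary term: fall back to transliteration.
    return transliterate(term), "transliteration"


if __name__ == "__main__":
    for term in ("oil cake", "Machine"):
        print(term, "->", translate_term(term))
```

The dictionary is consulted first, so transliteration is only used for terms the dictionary cannot cover, such as proper nouns.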
Design: English to Sinhala Machine Translation System
[Figure: system design. Input: English sentence. Modules: English Morphological analyzer, English Dictionary, English Parser, Transliteration, Translator, Bilingual Dictionary, Intermediate Editor, Sinhala Morphological analyzer, Sinhala Dictionary, Sinhala Parser. Output: Sinhala sentence.]
Transliteration Approaches
• Grapheme-based transliteration – direct orthographical mapping from source graphemes to target graphemes
• Phoneme-based transliteration – based on pronunciation or the source phoneme rather than spelling or source grapheme
• Hybrid and correspondence-based transliteration – combine the two approaches above
Types of Transliterations
• Forward Transliteration – Transliteration of a name from its native script to a foreign one
• Backward Transliteration – Restoration of a previously transliterated name to its native script
English and Sinhala language
English Language
• English contains 26 letters, including 5 vowels
Sinhala Language
• The Sinhala alphabet consists of 61 letters: 18 vowels, 41 consonants and 2 semi-consonants
• These letters represent 40 sounds: 14 vowel sounds and 26 consonant sounds
Phonetic Relation between English and Sinhala
• The two languages are fundamentally different from each other
• There are no strokes in the English language
• Spoken and written English are equivalent, but there is a difference between written and spoken Sinhala
• Diphthongs are not used in written Sinhala
Disambiguation
• Two English sounds, ‘ʌ’ and ‘ə’, are represented by one Sinhala letter, ‘a’ (අ)
• The English International Phonetic Alphabet (IPA) has two sounds, ‘ɪ’ and ‘i’, but Sinhala uses one letter, ‘e’ (ඉ), for both
• No diphthongs are used in the Sinhala language, so these sounds are difficult to represent
• Two sounds, ‘v’ and ‘w’, are represented by one Sinhala letter, ‘w’ (ව)
• There is no direct sound in Sinhala for the English letters q, x and z
• The large number of irregular word pronunciations is also difficult to handle
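The many-to-one mappings listed above are what make backward transliteration ambiguous. A minimal sketch, assuming the Sinhala letters are written in Unicode and the first bullet refers to the IPA sounds ʌ and ə; `FORWARD`, `invert` and `BACKWARD` are illustrative names, not part of the system described in these slides.

```python
from collections import defaultdict

# Forward (English sound -> Sinhala letter) pairs taken from the bullets above.
FORWARD = {"ʌ": "අ", "ə": "අ", "v": "ව", "w": "ව"}


def invert(forward: dict[str, str]) -> dict[str, list[str]]:
    """Group source sounds by the Sinhala letter they map to."""
    backward = defaultdict(list)
    for source, target in forward.items():
        backward[target].append(source)
    return dict(backward)


BACKWARD = invert(FORWARD)

# Any Sinhala letter with more than one source sound cannot be restored
# to English from the letter alone.
AMBIGUOUS = {letter: sounds for letter, sounds in BACKWARD.items() if len(sounds) > 1}

if __name__ == "__main__":
    for letter, sounds in AMBIGUOUS.items():
        print(letter, "<-", sounds)
```

Both Sinhala letters in this toy mapping end up ambiguous, so a backward transliterator would need context or a dictionary to choose between the candidate sounds.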
Available Approaches
• Dictionary writers have used a number of methods for English to Sinhala transliteration
• Phonetic-based transliteration method – based on International Phonetic Alphabet (IPA) sounds
• Non-phonetic-based transliteration method – based on letters
Transliteration Approaches
English       Malalasekara       Rathna               Godage
Aback         D’nela             w[D]nela             tnela
Binocular     nb’fkdlahq,aD      Ìfk[d]lHq,[¾]        nhsfkdlahq,¾
Quota         laõDWÜD            lafjdag              lafjdagd
Volcano       fjd,a’flbkaDW      fj[d],,aflafkda      fjd,aflafkda
xenophobia    ’fizkaD*aDWì       fi[z]k[d]f*daìh      fifkdaf*daìwd
Zero          ’iazbD¾DW          [z]iSfrda            isfrda
Proposed Approach to English to Sinhala Transliteration
• Letter-based transliteration approach
• Uses a Finite State Automaton (FSA)
• Two types of transliteration models are developed
  – Type 1: original English text, e.g. Computer
  – Type 2: Sinhala words written using English letters, e.g. Ambepussa
English to Sinhala Transliteration for Original English Text (Type1)
• Letter-based transliteration approach
• Uses a Finite State Automaton (FSA)
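A letter-based approach first has to decide which letter groups to treat as one unit. The sketch below is a greedy longest-match segmenter, a deliberate simplification and not the FSA used in the system. The multi-letter units are taken from the "English" column of the IPA charts in this deck; treating them as single units, and the names `MULTI_LETTER_UNITS` and `segment`, are assumptions made for illustration.

```python
# Multi-letter spellings that appear in the "English" column of the IPA charts
# in this deck; treating them as single units is an assumption of this sketch.
MULTI_LETTER_UNITS = {
    "dge", "ch", "ck", "gh", "th", "ss", "ng", "ll",
    "ee", "oo", "ou", "aw", "au", "er", "ir",
}
MAX_UNIT_LEN = max(len(unit) for unit in MULTI_LETTER_UNITS)


def segment(word: str) -> list[str]:
    """Split a word into spelling units, preferring the longest match at each position."""
    word = word.lower()
    units = []
    i = 0
    while i < len(word):
        for size in range(MAX_UNIT_LEN, 0, -1):
            chunk = word[i:i + size]
            # A single letter always matches, so the loop always advances.
            if len(chunk) == size and (size == 1 or chunk in MULTI_LETTER_UNITS):
                units.append(chunk)
                i += size
                break
    return units


if __name__ == "__main__":
    for example in ("Computer", "thick", "singer", "edge"):
        print(example, "->", segment(example))
```

Each unit would then be mapped to Sinhala letters by later stages. A purely greedy split cannot resolve pronunciation differences between words with the same spelling pattern, which is why the slides use state-based rules.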
IPA Chart for English Vowels
IPA    English    Sinhala        Examples
a:     a          ආ              father
ɪ      i          ඉ              sit
ɪ      y          ඉ              city
i:     ee         ඊ              see
ɛ      e          එ              bed
ε:     ir         ඒ              bird
æ      a          ඇ              lad, cat, ran
ʌ      u, ou      අ (විවෘත)      run, enough
ɒ      o, a       ඔ              not, wasp
ɔ:     aw, au     ඕ              law, caught
ʊ      u, oo      උ              put, wood
uː     oo, ou     ඌ              soon, through
ə      a          අ (සංවෘත)     about
ə      er         අ (සංවෘත)     winner
IPA Chart for English Consonants
IPA    English        Sinhala    Examples
p      p              ප්         pen, spin, tip
b      b              බ්         but, web
t      t              ට්         two, sting, bet
d      d              ඩ්         do, odd
tʃ     ch, t          ච්         chair, nature, teach
dʒ     d, j, dge      ජ්         gin, joy, edge
k      c, k, q, ck    ක්         cat, kill, skin, queen, thick
ɡ      g              ග්         go, get, beg
f      f, gh          ෆ්         fool, enough, leaf
v      v, ve          ව්         voice, have
θ      th             ත්         thing, teeth
ð      th, the        ද්         this, breathe, father
s      s, c, ss       ස          see, city, pass
z      z, se          ස          zoo, rose
IPA Chart for English Consonants contd.
IPA    English    Sinhala    Examples
ʒ      s, ge      ස          pleasure, beige
h      h          හ්         ham
m      m          ම්         man, ham
n      n          න්         no, tin
ŋ      ng         ං          singer, ring
l      l, ll      ල          left, bell
ɹ      r          ර          run, very
w      w          ව          we
j      y          ය          yes
ʍ      wh         ව          what
FST for Type 1 transliteration
[Figure: state diagram with node labels A–D, V1–V4 (vowels), C1–C8 (consonants) and D1–D2; transitions are labelled with English letters.]
Consonants l0 = {b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z}
English to Sinhala Transliteration for Sinhala words written using English Letters (Type2)
• Letter-based transliteration approach
• Uses a Finite State Automaton (FSA)
Sinhala Transliteration alphabet for Type 2
Sinhala    Eng     Sinhala    Eng      Sinhala    Eng
අ          a       ඞ          nga      ප          pa
ආ          aa      ඟ          nnga     ඵ          pha
ඇ          ae      ච          ca       බ          ba
ඈ          aee     ඡ          cha      භ          bha
ඉ          i       ජ          ja       ම          ma
ඊ          ii      ඣ          jha      ඹ          mba
උ          u       ඤ          nya      ය          ya
ඌ          uu      ඥ          jnya     ර          ra
ඏ          Ị       ඦ          ndja     ල          la
ඐ          Ị       ට          tta      ව          va
ඍ          ŗ       ඨ          ttha     ශ          sha
Sinhala Transliteration alphabet cont…
Sinhala    Eng     Sinhala    Eng      Sinhala    Eng
ඎ          ŗ       ඩ          daa      ෂ          ssa
එ          e       ඪ          daha     ස          sa
ඒ          ee      ණ          nna      හ          ha
ඓ          ai      ඬ          nnda     ඝ          gha
ඖ          au      ද          da       ඳ          nda
ක          ka      ධ          dha      ග          ga
ඛ          kha     න          na
FST for Type 2 transliteration
[Figure: state diagram with node labels A–D, V1–V7 (vowels), C1–C7 (consonants) and D1–D4; transitions are labelled with English letters and with the letter sets L1 and L2.]
Vowels: L1 = {a, e, i, o, u, Ǐ, ŕ}, L2 = {a, e, i}
Consonants: L1 = {k, g, c, j, t, d, b, m, y, r, f, v, s, h, l, n, p}, L2 = {k, g, c, j, t, d, b, s, p}
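Type 2 input (Sinhala written in English letters) can be illustrated with a small table-driven converter. This is a sketch under stated assumptions, not the FST from the slides: it uses greedy longest-match instead of explicit states, covers only a subset of the Type 2 alphabet, and adds the Unicode dependent vowel signs and the hal kirima, which the slides do not list. The names `CONSONANTS`, `VOWELS`, `tokenize` and `transliterate` are hypothetical.

```python
HAL = "\u0dca"  # hal kirima: suppresses the inherent 'a' of a consonant

# Consonants (each carries an inherent 'a'); a subset of the Type 2 alphabet.
CONSONANTS = {
    "k": "ක", "g": "ග", "c": "ච", "j": "ජ", "d": "ද", "n": "න",
    "p": "ප", "b": "බ", "m": "ම", "mb": "ඹ", "y": "ය", "r": "ර",
    "l": "ල", "v": "ව", "s": "ස", "h": "හ",
}

# Vowel key -> (independent letter, dependent vowel sign).
# The vowel signs are standard Unicode Sinhala, added here as an assumption.
VOWELS = {
    "a": ("අ", ""), "aa": ("ආ", "\u0dcf"), "ae": ("ඇ", "\u0dd0"),
    "i": ("ඉ", "\u0dd2"), "ii": ("ඊ", "\u0dd3"),
    "u": ("උ", "\u0dd4"), "uu": ("ඌ", "\u0dd6"),
    "e": ("එ", "\u0dd9"), "ee": ("ඒ", "\u0dda"),
}

# Longest keys first, so that "mb" wins over "m" and "aa" over "a".
KEYS = sorted(list(CONSONANTS) + list(VOWELS), key=len, reverse=True)


def tokenize(text: str) -> list[str]:
    """Split romanized input into table keys using greedy longest match."""
    text = text.lower()
    tokens, i = [], 0
    while i < len(text):
        match = next((k for k in KEYS if text.startswith(k, i)), None)
        if match is None:
            raise ValueError(f"no mapping for {text[i]!r}")
        tokens.append(match)
        i += len(match)
    return tokens


def transliterate(text: str) -> str:
    """Convert romanized Sinhala to Sinhala script using the toy tables above."""
    tokens = tokenize(text)
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in CONSONANTS:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt in VOWELS:
                # Consonant followed by a vowel: attach the vowel sign.
                out.append(CONSONANTS[tok] + VOWELS[nxt][1])
                i += 2
                continue
            # Bare consonant: mark it with the hal kirima.
            out.append(CONSONANTS[tok] + HAL)
        else:
            # Vowel not preceded by a consonant: use the independent letter.
            out.append(VOWELS[tok][0])
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ("gama", "nagaraya", "amba"):
        print(word, "->", transliterate(word))
```

Letters outside the toy tables raise a `ValueError`, which makes gaps in the mapping visible instead of silently passing Latin letters through.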
Approach in Practice
Demonstration
Conclusion
• Handling the pronunciation of English words is a critical problem in English to Sinhala transliteration.
  – The English letter ‘a’ represents the different sounds අ, ඇ and ඈ in Sinhala (ago – අගෝ, America – ඇමෙරිකා and ant – ඈන්ට්)
• The same letter sequence can be pronounced differently in different English words
  – The words ‘father’ and ‘fathom’ pronounce ‘fath’ differently
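The ambiguity of a single English letter can be made concrete by indexing the vowel chart from these slides by spelling. The sketch below is illustrative only: the rows are transcribed from the IPA vowel chart with the Sinhala letters written in Unicode, and `VOWEL_CHART` and `candidates_by_spelling` are hypothetical names.

```python
from collections import defaultdict

# (IPA, English spellings, Sinhala letter) rows transcribed from the vowel chart.
VOWEL_CHART = [
    ("a:", ["a"], "ආ"),
    ("ɪ", ["i", "y"], "ඉ"),
    ("i:", ["ee"], "ඊ"),
    ("ɛ", ["e"], "එ"),
    ("ε:", ["ir"], "ඒ"),
    ("æ", ["a"], "ඇ"),
    ("ʌ", ["u", "ou"], "අ"),
    ("ɒ", ["o", "a"], "ඔ"),
    ("ɔ:", ["aw", "au"], "ඕ"),
    ("ʊ", ["u", "oo"], "උ"),
    ("uː", ["oo", "ou"], "ඌ"),
    ("ə", ["a", "er"], "අ"),
]


def candidates_by_spelling(chart):
    """Map each English spelling to the set of Sinhala letters it may stand for."""
    index = defaultdict(set)
    for _ipa, spellings, sinhala in chart:
        for spelling in spellings:
            index[spelling].add(sinhala)
    return dict(index)


INDEX = candidates_by_spelling(VOWEL_CHART)

if __name__ == "__main__":
    for spelling in ("a", "u", "ee"):
        print(spelling, "->", sorted(INDEX[spelling]))
```

In this chart the letter ‘a’ alone has four candidate Sinhala letters, so a letter-based system needs the surrounding letters, or pronunciation data such as IPA, to pick one.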
Further work
Incorporating English IPA into the system
Thank you!