A Parser for Sinhala Language: First Step Towards English to Sinhala Machine Translation
Budditha Hettige Department of Statistics and Computer Science, Faculty of Applied Science, University of Sri Jayewardenepura, Sri Lanka. &
Asoka S. Karunananda Faculty of Information Technology, University of Moratuwa, Sri Lanka.
Introduction
Problem, Machine Translation, Design, Implementation, Parser in action, Further work
Problem: Language barrier
Machine translation is a potential solution for giving people whose mother tongue is not English access to the world knowledge available in English.
An English to Sinhala machine translation system is not yet available.
Other existing machine translation systems cannot be used directly.
Machine Translation
A machine translation system translates text from one language to another.
Some Machine Translation systems
Anusaaraka, Mantra, etc. for Indian languages
EDR for English to Japanese translation
Complexity of Machine Translation
Language structure
Sentence disambiguation
Machine Translation
Source sentence
Source language morphological analyzer
Source language parser
Bilingual dictionary
Target language Morphological generator
Target language parser
Target language sentence
Machine Translation
I eat rice
Source language morphological analyzer
noun(I) verb(eat) noun(rice)
I: noun, 1st person, singular, male
eat: verb, present tense
rice: noun, 3rd person, singular
I eat rice
Source language parser
Subject(I) verb(eat) Object(rice)
[Parse tree figure. Nodes: Vp, Np; leaves: Noun (SUB) i, Verb (VEB) eat, Noun (OBJ) rice]
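The two source-side steps above (tagging each word, then assigning subject, verb and object) can be sketched in a few lines. This is an illustrative Python sketch, not the authors' Prolog implementation: the `LEXICON` layout and the function names `analyze` and `parse_svo` are assumptions, and only the feature values for "I", "eat" and "rice" come from the slide.

```python
# Toy lexicon. The feature values mirror the slide's example; the
# dictionary layout itself is an assumption made for illustration.
LEXICON = {
    "i": {"pos": "noun", "person": 1, "number": "singular", "sex": "male"},
    "eat": {"pos": "verb", "tense": "present"},
    "rice": {"pos": "noun", "person": 3, "number": "singular"},
}


def analyze(sentence):
    """Morphological analysis: pair each word with its lexicon entry."""
    return [(word, LEXICON[word.lower()]) for word in sentence.split()]


def parse_svo(tagged):
    """Assign subject/verb/object roles to a noun-verb-noun sentence."""
    if [features["pos"] for _, features in tagged] != ["noun", "verb", "noun"]:
        raise ValueError("only noun-verb-noun sentences are handled")
    (subject, _), (verb, _), (obj, _) = tagged
    return {"subject": subject, "verb": verb, "object": obj}


tagged = analyze("I eat rice")
parsed = parse_svo(tagged)
print(parsed)
```

A real analyzer would derive features from word forms rather than look up whole words; the lookup table keeps the sketch short.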
Machine Translation
Bilingual dictionary
noun(I) verb(eat) noun(rice)
noun(මම) verb(කනවා)** noun(බත්)
Target language morphological generator
noun(මම) verb(කනවා) noun(බත්)
noun(මම) verb(කමි) noun(බත්)
මම බත් කමි
Target language parser
noun(මම) verb(කමි) noun(බත්)
මම බත් කමි
[Parse tree figure. Nodes: උක්තය (subject), කර්මය (object), ආඛ්‍යාතය (predicate); leaves: නාම පදය (noun) මම, නාම පදය (noun) බත්, ක්‍රියා පදය (verb) කමි]
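The target-side steps above (dictionary lookup, verb inflection, and reordering into subject, object, verb) can be sketched as follows. This is a hypothetical Python illustration, not the authors' system: `BILINGUAL`, `VERB_FORMS` and `translate_svo` are invented names, the tables hold only the slide's single example, and the Sinhala strings are assumed Unicode spellings of the example words.

```python
# Hypothetical bilingual dictionary: English word -> Sinhala base form.
BILINGUAL = {"I": "මම", "eat": "කනවා", "rice": "බත්"}

# Hypothetical generation table:
# (base verb, subject person, subject number) -> inflected verb form.
VERB_FORMS = {("කනවා", 1, "singular"): "කමි"}


def translate_svo(subject, verb, obj, person=1, number="singular"):
    """Translate a subject-verb-object triple into a Sinhala sentence."""
    target_subject = BILINGUAL[subject]
    target_object = BILINGUAL[obj]
    base_verb = BILINGUAL[verb]
    # Morphological generation: make the verb agree with the subject,
    # falling back to the base form when no inflected form is listed.
    target_verb = VERB_FORMS.get((base_verb, person, number), base_verb)
    # Sinhala word order is subject, object, verb.
    return " ".join([target_subject, target_object, target_verb])


result = translate_svo("I", "eat", "rice")
print(result)
```

The lookup table stands in for the rule-based generator; a fuller version would build the inflected form from the base word and a suffix rule.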
DESIGN
Design of the Parsing System for Sinhala
Sinhala sentence
Base Dictionary
Rule Dictionary
Concept Dictionary
Morphological Analyzer
Sinhala Parser
Results
Dictionaries
Base Dictionary
The Base Dictionary contains base words (Prakurthi of the Sinhala language) and irregular words, together with their morphological instructions.
Prolog predicates:
lex_root_word(ID, Word, N, Rule, PS).
lex_root_word(ID, Word, V, Type, Time).
snoun([ID], Person, Number, Sex, Live, DIC, VB, Noun).
sfverb([ID], Person, Number, Sex, Live, Type, Time, Verb).
spep([ID], 'nipatha').
Rule Dictionary
The Rule Dictionary stores the rules required to generate the various word forms.
Prolog predicates:
sinvowlet([Letters], 'Sound').
sinconlet('Letter').
sin_upsraga_prefix([Letters], 'Sound', Rule).
noun_vib_postfix([Letters], 'Sound', Vibakthi_id).
gen_sin_noun(BAS, CL, DI, SP, VB, RL, SL, Out).
gen_sin_fverb(Base, Type, Time, SRC, RL, Out).
Concept Dictionary
The Concept Dictionary contains synonyms and antonyms for the words given in the Base Dictionary.
Morphological Analyzer
This is a preprocessor for the parser.
The morphological analyzer reads a sentence word by word.
For each word, it identifies the grammatical information.
How Morphological Analyzer works
Sinhala Parser
The Sinhala parser receives tokenized words from the morphological analyzer.
It works as a syntax analyzer for Sinhala sentences.
It successfully analyzes simple and complex sentences.
It is implemented in SWI-Prolog.
Sinhala Parser
Sentence → Subject Akkyanaya
Subject → SimpleSubject | ComplexSubject
ComplexSubject → SimpleSubject ConSub
SimpleSubject → Noun | Adjective Noun
ConSub → Conjunction SimpleSubject
Akkyanaya → VerbP | Object VerbP
Object → SimpleObject | ComplexObject
ComplexObject → Conjunction SimpleObject
SimpleObject → Noun | Adjective Noun
VerbP → Verb | Adverb Verb
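The production rules above can be turned into a small sentence checker. The sketch below is in Python rather than the authors' SWI-Prolog, and it assumes the morphological analyzer has already reduced each word to one of the tags Noun, Adjective, Conjunction, Verb or Adverb; the names `GRAMMAR`, `ends` and `is_sentence` are illustrative only.

```python
# The grammar from the slide, one entry per nonterminal.
# Each alternative is a list of symbols; symbols without an entry
# (Noun, Adjective, Conjunction, Verb, Adverb) are part-of-speech tags.
GRAMMAR = {
    "Sentence": [["Subject", "Akkyanaya"]],
    "Subject": [["SimpleSubject"], ["ComplexSubject"]],
    "ComplexSubject": [["SimpleSubject", "ConSub"]],
    "SimpleSubject": [["Noun"], ["Adjective", "Noun"]],
    "ConSub": [["Conjunction", "SimpleSubject"]],
    "Akkyanaya": [["VerbP"], ["Object", "VerbP"]],
    "Object": [["SimpleObject"], ["ComplexObject"]],
    "ComplexObject": [["Conjunction", "SimpleObject"]],
    "SimpleObject": [["Noun"], ["Adjective", "Noun"]],
    "VerbP": [["Verb"], ["Adverb", "Verb"]],
}


def ends(symbol, tags, start):
    """Return every index at which `symbol` can end when matched from `start`."""
    if symbol not in GRAMMAR:  # terminal: compare with the tag at `start`
        matches = start < len(tags) and tags[start] == symbol
        return {start + 1} if matches else set()
    results = set()
    for production in GRAMMAR[symbol]:
        positions = {start}
        for part in production:
            positions = {e for p in positions for e in ends(part, tags, p)}
        results |= positions
    return results


def is_sentence(tags):
    """True if the whole tag sequence can be derived from Sentence."""
    return len(tags) in ends("Sentence", tags, 0)


print(is_sentence(["Noun", "Noun", "Verb"]))   # subject, object, verb
print(is_sentence(["Noun", "Verb", "Noun"]))   # English-style order
```

Tracking a set of end positions lets the checker try every alternative without committing to the first match, which is roughly what Prolog's backtracking provides for free. The approach works here because the grammar has no left-recursive rules.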
Parse tree for the given sentence
Software Requirements
SWI-Prolog 1.4
JDK 1.4.0
Windows 98* / Linux
Parser in Action
As a sentence checker
Further work
Expanding the parsing system into an English to Sinhala natural language translation system
Development or adaptation of an English parser and construction of a bilingual dictionary
Thank you!