Sinhala Parser

A Parser for Sinhala Language: First Step Towards English to Sinhala Machine Translation

Budditha Hettige, Department of Statistics and Computer Science, Faculty of Applied Science, University of Sri Jayewardenepura, Sri Lanka

Asoka S. Karunananda, Faculty of Information Technology, University of Moratuwa, Sri Lanka

Introduction

Problem
Machine Translation
Design
Implementation
Parser in action
Further work

Problem: Language barrier

Machine translation is a potential solution for giving people with other mother tongues access to the world knowledge available in English.
An English to Sinhala machine translation system is not yet available.
Other existing machine translation systems cannot be used directly.

Machine Translation

Machine translation is a system that translates text from one language into another.

Some Machine Translation systems

Anusaaraka, Mantra etc. for Indian languages
EDR for English to Japanese translation

Complexity of Machine Translation

Language structure
Sentence disambiguation

Machine Translation

Source sentence
→ Source language morphological analyzer
→ Source language parser
→ Bilingual dictionary
→ Target language morphological generator
→ Target language parser
→ Target language sentence

Machine Translation: example

Source sentence: I eat rice

Source language morphological analyzer
I: noun, 1st person, singular, male
eat: verb, present tense
rice: noun, 3rd person, singular
Output: noun(I) verb(eat) noun(rice)

Source language parser
Output: Subject(I) verb(eat) Object(rice)

Parse tree (figure): nodes Np and Vp over Noun (SUB) "I", Verb (VEB) "eat", Noun (OBJ) "rice"

Machine Translation: example (continued)

Bilingual dictionary
noun(I) verb(eat) noun(rice)
noun(මම) verb(කනවා)** noun(බත්)

Target language morphological generator
noun(මම) verb(කනවා) noun(බත්)
noun(මම) verb(කමි) noun(බත්)

Target language parser
noun(මම) verb(කමි) noun(බත්)
මම බත් කමි

Parse tree (figure): උක්තය (subject) over නාම පදය (noun) මම; කර්මය (object) over නාම පදය (noun) බත්; ආඛ්‍යාතය (predicate) over ක්‍රියා පදය (verb) කමි
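The worked example above can be traced end to end with a very small Prolog sketch. This is only an illustration of the pipeline stages for the single sentence "I eat rice", not the authors' implementation: every predicate name here (en_lex/2, en_parse/2, bi_dict/2, si_verb_form/5, translate/2) is hypothetical, and the atoms mama, bath, kanawa and kami are romanized stand-ins for the Sinhala words shown on the slide.

```prolog
:- use_module(library(apply)).   % maplist/3

% Source language morphological analysis: word -> tagged token.
% (Feature names are illustrative assumptions.)
en_lex(i,    noun(i,    first, singular)).
en_lex(eat,  verb(eat,  present)).
en_lex(rice, noun(rice, third, singular)).

% Source language parser: accept only Subject Verb Object.
en_parse([S, V, O], svo(S, V, O)) :-
    S = noun(_, _, _),
    V = verb(_, _),
    O = noun(_, _, _).

% Bilingual dictionary: English base form -> Sinhala base form (romanized).
bi_dict(i,    mama).
bi_dict(eat,  kanawa).
bi_dict(rice, bath).

% Target language morphological generation: inflect the verb so that it
% agrees with the subject (only the one form used on the slide is listed).
si_verb_form(kanawa, first, singular, present, kami).

% translate(+EnglishWords, -SinhalaWords)
translate(EnWords, SiWords) :-
    maplist(en_lex, EnWords, Tokens),
    en_parse(Tokens, svo(noun(S, P, N), verb(V, T), noun(O, _, _))),
    bi_dict(S, SiS),
    bi_dict(V, SiV0),
    bi_dict(O, SiO),
    si_verb_form(SiV0, P, N, T, SiV),
    % The target sentence on the slide is ordered Subject Object Verb.
    SiWords = [SiS, SiO, SiV].
```

The query `?- translate([i, eat, rice], Out).` binds Out to `[mama, bath, kami]`, which mirrors the word order of the target sentence on the slide.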



DESIGN

Design of the parsing system for Sinhala

Components (figure):
Sinhala sentence
Base Dictionary
Rule Dictionary
Concept Dictionary
Morphological Analyzer
Sinhala Parser
Results

Dictionaries: Base Dictionary

The Base Dictionary contains base words (Prakurthi of the Sinhala language) and irregular words with their morphological instructions.

Prolog predicates
lex_root_word(ID, Word, N, Rule, PS).
lex_root_word(ID, Word, V, Type, Time).
snoun([ID],Person, Number,Sex, Live, DIC, VB,Noun).
sfverb([ID],Person,Number,Sex,Live,Type, Time,Verb).
spep([ID],'nipatha').

Dictionaries: Rule Dictionary

The Rule Dictionary stores the rules required to generate various word forms.

Prolog predicates
sinvowlet([Letters],'Sound').
sinconlet('Letter').
sin_upsraga_prefix([Letters],'Sound',Rule).
noun_vib_postfix([Letters],'Sound',Vibakthi id).
gen_sin_noun(BAS,CL,DI,SP,VB,RL,SL,Out).
gen_sin_fverb(Base, Type, Time,SRC,RL,Out).

Dictionaries: Concept Dictionary

The Concept Dictionary contains synonyms and antonyms for the words given in the Base Dictionary.

Morphological Analyzer

The morphological analyzer is a preprocessor for the parser.
It reads a sentence word by word.
For each word, it identifies grammatical information.

How the Morphological Analyzer works
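As a rough illustration of word-by-word analysis, the sketch below looks each word up in a tiny lexicon and returns a token carrying its grammatical information. It does not use the authors' dictionary predicates: lexicon/3, analyze_word/2 and analyze_sentence/2 are hypothetical names, the feature lists are assumptions, and the words are romanized stand-ins (mama, bath, kami).

```prolog
:- use_module(library(apply)).   % maplist/3

% Hypothetical mini-lexicon: surface form, category, grammatical features.
lexicon(mama, noun, [person(first), number(singular)]).
lexicon(bath, noun, [person(third), number(singular)]).
lexicon(kami, verb, [person(first), number(singular), tense(present)]).

% analyze_word(+Word, -Token)
% Look a single word up; words missing from the lexicon are tagged unknown.
analyze_word(Word, Token) :-
    (   lexicon(Word, Cat, Feats)
    ->  Token = token(Word, Cat, Feats)
    ;   Token = token(Word, unknown, [])
    ).

% analyze_sentence(+Words, -Tokens)
% Process the sentence one word at a time, keeping the original order.
analyze_sentence(Words, Tokens) :-
    maplist(analyze_word, Words, Tokens).
```

A query such as `?- analyze_sentence([mama, bath, kami], Tokens).` yields one token per input word, which is the kind of tagged input a parser can consume.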

Sinhala Parser

The Sinhala parser receives tokenized words from the morphological analyzer.
It works as a syntax analyzer for Sinhala sentences.
It successfully analyzes simple and complex sentences.
It is implemented in SWI-Prolog.

Sinhala Parser: grammar

Sentence → Subject Akkyanaya
Subject → SimpleSubject | ComplexSubject
ComplexSubject → SimpleSubject ConSub
SimpleSubject → Noun | Adjective Noun
ConSub → Conjunction SimpleSubject
Akkyanaya → VerbP | Object VerbP
Object → SimpleObject | ComplexObject
ComplexObject → Conjunction SimpleObject
SimpleObject → Noun | Adjective Noun
VerbP → Verb | Adverb Verb
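Grammar rules of this shape map naturally onto a Prolog definite clause grammar (DCG). The following is a minimal sketch, not the authors' parser: it assumes the input is a list of tagged words such as noun(mama), and the tree labels (s, ss, cs, ak and so on), the check_sentence/1 helper and the romanized words are all hypothetical.

```prolog
% Sentence -> Subject Akkyanaya
sentence(s(S, A)) --> subject(S), akkyanaya(A).

% Subject -> SimpleSubject | ComplexSubject
subject(S) --> simple_subject(S).
subject(S) --> complex_subject(S).
complex_subject(cs(S, C)) --> simple_subject(S), con_sub(C).
simple_subject(ss(N))     --> noun(N).
simple_subject(ss(A, N))  --> adjective(A), noun(N).
con_sub(con(C, S))        --> conjunction(C), simple_subject(S).

% Akkyanaya -> VerbP | Object VerbP
akkyanaya(ak(V))    --> verb_p(V).
akkyanaya(ak(O, V)) --> object(O), verb_p(V).

% Object -> SimpleObject | ComplexObject
object(O) --> simple_object(O).
object(O) --> complex_object(O).
complex_object(co(C, O)) --> conjunction(C), simple_object(O).
simple_object(so(N))     --> noun(N).
simple_object(so(A, N))  --> adjective(A), noun(N).

% VerbP -> Verb | Adverb Verb
verb_p(vp(V))    --> verb(V).
verb_p(vp(A, V)) --> adverb(A), verb(V).

% Terminals: each consumes one tagged word from the input list.
noun(W)        --> [noun(W)].
adjective(W)   --> [adjective(W)].
conjunction(W) --> [conjunction(W)].
verb(W)        --> [verb(W)].
adverb(W)      --> [adverb(W)].

% check_sentence(+Tokens): succeeds if the tagged words form a sentence.
check_sentence(Tokens) :- phrase(sentence(_), Tokens).
```

For example, `?- phrase(sentence(T), [noun(mama), noun(bath), verb(kami)]).` succeeds with `T = s(ss(mama), ak(so(bath), vp(kami)))`, while `?- check_sentence([verb(kami), noun(mama)]).` fails because no subject precedes the verb.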

Parse tree for the given sentence

Software Requirements

SWI-Prolog 1.4
JDK 1.4.0
Windows 98* / Linux

Parser in action

As a sentence checker

Further work

Expanding the parsing system into an English to Sinhala natural language translation system
Development or adaptation of an English parser and construction of a bilingual dictionary

Thank you!