A Spell Checker for Esperanto
Project Report for Stage 2 (March 2009 to April 2009)
by Bc. Marek Blahuš, učo 172464

This report describes the progress of the project “A Spell Checker for Esperanto” during its second stage, i.e. Months 4 to 5 (March 2009 to April 2009). The state described is that of May 1, 2009, the date of Checkpoint 2 as specified in the project plan posted on the project's website.

“A Spell Checker for Esperanto” is a project in the field of Natural Language Processing that aims to design and implement a spell-checking dictionary for the Esperanto language using the Hunspell spell checker. In its current phase, the project is financed by the Students' Research and Development Projects scholarship of the Faculty of Informatics at Masaryk University in Brno, Czech Republic.

In the first stage, the spell checker's functionality was enhanced and preliminary research was carried out on the possibilities of integrating it into the OpenOffice.org office suite. The second stage had two goals: solving the problems related to the spell checker's integration into the OpenOffice.org environment, and arranging its inclusion in the official Esperanto language distribution of the office suite.

The outcome is presented as an analysis of the current state of development of three software packages maintained by three different bodies: the OpenOffice.org office suite, its Esperanto localization, and the spell checker itself. These packages depend on one another, and they must be properly harmonized before the spell checker can be offered as an optional and/or native tool in the office suite. The concrete result of this stage is the first version of the spell checker in a form readily usable in OpenOffice.org.
Context of the Spell Checker’s Integration in OpenOffice.org

The target application, the OpenOffice.org office suite, reached a new milestone when version 3.0.0 was released on October 13, 2008. This version introduces a series of major enhancements, including a fundamental change in the way spell-checking dictionaries are handled. Starting with this version, dictionaries are no longer installed using the DicOOo wizard; instead, they are distributed as extensions of the main application, much like extensions in the Mozilla Firefox web browser. Because this is a quite recent development (OpenOffice.org 3.0.0 had not yet been released when the original Bachelor’s thesis was defended), dictionary extensions had to be researched before the Esperanto spell checker could be implemented as one.

To advance the second goal related to OpenOffice.org, namely making the spell checker a part of the official Esperanto language distribution of the office suite, the current state of that distribution had to be investigated. After contacting the developer team responsible for translating OpenOffice.org into Esperanto, it was found that version 3.0.0 had brought a great deal of new functionality and, with it, new material to translate. As a result, the translation work had effectively started over and was not progressing very fast for the time being. In a private discussion, one of the team’s leaders was made familiar with the goals of this research project and promised to try to keep things moving so that the project’s goals can be met at the last checkpoint, in spite of the present inconvenience caused by the lack of a stable Esperanto localization of the newest version of OpenOffice.org.
First Version Readily Usable in OpenOffice.org

In light of the facts above, the goal for this stage could not be achieved in full: arranging the spell checker’s official inclusion in the Esperanto language version of the office suite had to be postponed to the third stage, because no such version is available yet. This should not, however, affect the schedule of the third (and last) stage, since the spell checker can still be tested even though it is not yet shipped natively as a part of the suite.

The most important goal has nevertheless been achieved: the spell checker has been successfully turned into a dictionary extension that is properly documented and installs without errors under OpenOffice.org 3. The extension has been named “Esperanto-literumilo de [email protected]” (to reflect the name of the organization within which the project was started) and is available as an OXT file, a click-to-install file format used by OpenOffice.org. First trials indicate good behavior in OpenOffice.org Writer, although the time needed to suggest corrections is sometimes still unsatisfactory.
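
To illustrate what packaging a dictionary as an OXT extension involves, the following Python sketch assembles a minimal extension archive in memory. It is not the project's actual build procedure. The file names (eo.aff, eo.dic), the identifier org.example.dict-eo, the version number, and the node name HunSpellDic_eo are hypothetical, and the XML files only follow the layout commonly seen in OpenOffice.org 3 dictionary extensions, so they should be checked against the official extension documentation before use.

```python
import io
import zipfile

# Hypothetical file names; the report does not list the package contents.
AFF_NAME = "eo.aff"
DIC_NAME = "eo.dic"

# Registers dictionaries.xcu as configuration data of the extension.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="http://openoffice.org/2001/manifest">
  <manifest:file-entry
      manifest:media-type="application/vnd.sun.star.configuration-data"
      manifest:full-path="dictionaries.xcu"/>
</manifest:manifest>
"""

# Tells the linguistic component where the Hunspell files are and
# which locale they serve (assumed here to be plain "eo").
DICTIONARIES_XCU = """<?xml version="1.0" encoding="UTF-8"?>
<oor:component-data xmlns:oor="http://openoffice.org/2001/registry"
                    xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    oor:name="Linguistic" oor:package="org.openoffice.Office">
  <node oor:name="ServiceManager">
    <node oor:name="Dictionaries">
      <node oor:name="HunSpellDic_eo" oor:op="fuse">
        <prop oor:name="Locations" oor:type="oor:string-list">
          <value>%origin%/{aff} %origin%/{dic}</value>
        </prop>
        <prop oor:name="Format" oor:type="xs:string">
          <value>DICT_SPELL</value>
        </prop>
        <prop oor:name="Locales" oor:type="oor:string-list">
          <value>eo</value>
        </prop>
      </node>
    </node>
  </node>
</oor:component-data>
""".format(aff=AFF_NAME, dic=DIC_NAME)

# Identifier and version are placeholders, not the project's real values.
DESCRIPTION = """<?xml version="1.0" encoding="UTF-8"?>
<description xmlns="http://openoffice.org/extensions/description/2006">
  <identifier value="org.example.dict-eo"/>
  <version value="0.1"/>
</description>
"""


def build_oxt(aff_bytes: bytes, dic_bytes: bytes) -> bytes:
    """Return the bytes of a ZIP archive laid out as a dictionary extension."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as oxt:
        oxt.writestr("META-INF/manifest.xml", MANIFEST)
        oxt.writestr("description.xml", DESCRIPTION)
        oxt.writestr("dictionaries.xcu", DICTIONARIES_XCU)
        oxt.writestr(AFF_NAME, aff_bytes)
        oxt.writestr(DIC_NAME, dic_bytes)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder dictionary data: one affix-file header line and one word.
    data = build_oxt(b"SET UTF-8\n", b"1\nsaluton\n")
    with open("example-eo.oxt", "wb") as out:
        out.write(data)
```

The point of the sketch is that an OXT file is an ordinary ZIP archive: the Hunspell affix and word-list files are stored unchanged, and a small amount of XML metadata tells the office suite what they are.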