Learn English Vocabulary and Writing

Learn English Vocabulary and Writing: Use Software to Prepare for the SAT or GRE Exams

Manu Konchady
Mustru Publishing, Oakton, Virginia.

Learn English Vocabulary and Writing: Use Software to Prepare for the SAT or GRE Exams by Manu Konchady Mustru Publishing, 3112 Bradford Wood Court, Oakton, VA 22124.

All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic, or mechanical, including photocopying, recording or by any information storage and retrieval system without written permission from the author, except for the inclusion of brief quotations in a review.

Copyright © 2009 by Manu Konchady
First Edition, ISBN: 978-0-557-12557-9
Printed in the United States of America

The author has taken every precaution to verify the contents of the book, but assumes no responsibility for errors or omissions in the book and any damages resulting from the use of the information contained herein. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of services or trademarks should not be regarded as intent to infringe on property of others. The author recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products.

Contents

Preface

1. Introduction
  1.1. Computer Assisted Language Learning
  1.2. Quizzes
    1.2.1. Should you Guess an Answer?
  1.3. Software
    1.3.1. WordNet
    1.3.2. Text Sources
    1.3.3. Audio
    1.3.4. Emustru

2. Learning Vocabulary
  2.1. Why Learn Words?
  2.2. Which Words are Important?
    2.2.1. How many Words should you Learn?
    2.2.2. Do you know a word?
    2.2.3. Can you guess the meaning of a word?
    2.2.4. Five Ways to Grow your Vocabulary
  2.3. How to Learn with Online Quizzes
    2.3.1. Visual Thesaurus
    2.3.2. Free Rice
    2.3.3. Quizlet
    2.3.4. Emustru
  2.4. Why should you learn Spelling?
    2.4.1. Spelling Error Analysis
    2.4.2. Emustru Spelling Quiz
  2.5. Words, Meanings, and Relationships
  2.6. Word Games
    2.6.1. Emustru
  2.7. Web Sites to Learn Vocabulary and Spelling

3. Learning Sentence Construction
  3.1. Building Sentences
    3.1.1. Five tips to build sentences
    3.1.2. Punctuation
    3.1.3. Are long sentences necessary?
    3.1.4. Does the use of synonyms improve sentences?
    3.1.5. Is the sentence precise?
  3.2. Is it grammatically correct?
    3.2.1. How does a grammar checker work?
    3.2.2. E-rater Grammar Checker
  3.3. Emustru Sentence Quizzes
    3.3.1. Cloze Test
    3.3.2. Find the Error
    3.3.3. Correct the Sentence
  3.4. Web sites to learn sentence construction

4. Automatic Essay Scoring
  4.1. How does it Work?
    4.1.1. Traits and Features
    4.1.2. Creating an Essay Model for an AES
    4.1.3. Using a Model to Assign a Score
  4.2. Applying AES
    4.2.1. Is AES Valid?
    4.2.2. Essay Prompt
    4.2.3. Essay Length
  4.3. How do you write an essay for E-rater?
    4.3.1. Grammar
    4.3.2. Usage
    4.3.3. Mechanics
    4.3.4. Style
    4.3.5. Organization and Development
    4.3.6. Lexical Complexity
    4.3.7. Prompt-Specific Vocabulary Usage
    4.3.8. E-rater Writing Tips
  4.4. Emustru Essay Evaluator
  4.5. Web sites to learn Essay Writing

5. Other Topics
  5.1. Listening
  5.2. Speaking
  5.3. Comprehension
    5.3.1. Requirements
    5.3.2. Tips
  5.4. Web sites to practice Reading Comprehension

Appendix A. Installing Emustru
Appendix B. Parts of Speech
Appendix C. Word Lists
Index

Preface

Most books for exams like the SAT describe sample questions, methods to answer questions, and a few practice exams. These types of books are very helpful to learn about an exam, the format, the schedule, and the level of difficulty. However, practice exams have little value after the first or second attempt: the questions and answers are familiar and you can identify the answer from memory.

This book also emphasizes practice exams; however, the questions are customized to your skill level. The software included with this book tracks your performance on previous exams before creating a new custom quiz. Questions are dynamically generated when you are ready to take your exam. The use of dynamic quizzes means that you cannot rely on memory to answer questions. A question is repeated only if you missed it or if the software requires you to answer the same question correctly more than once.

On the downside, automatically generated questions are not as precise as manually generated questions. A manually compiled question is carefully produced; the description of the question and the set of answers are chosen based on some pattern and verified. The software that automatically generates questions attempts to mimic the same process.

An essay writing section is part of the current SAT and GRE exams. The Educational Testing Service (ETS), the developer of the SAT and GRE exams, uses machine and human graders to evaluate essays. An automated essay evaluator is included with the accompanying software. You can also learn how E-rater®, the essay evaluator from ETS, will score your essay.

Audience

This book and the accompanying software are for anyone planning to take a standardized test or simply interested in using software to learn English. If you plan to use the software, you will need some basic knowledge of a PC (on either a Windows or a Linux platform). The author will provide technical support to install and run the software.

Organization

The first chapter begins with a description of some of the software that you can use to learn a language. Most of the software explained in the book is open source (with the exception of E-rater) and can be downloaded from the Web.

The second chapter includes a collection of tools to learn spelling, vocabulary, and word relationships. Methods to improve your vocabulary and guess the meaning of unknown words are also mentioned.

The third chapter mentions a few tips to build sentences and explains how an automatic grammar checker works. Three different types of sentence quizzes are described. In the first quiz, you need to find the missing word(s) from a given set of words. In the second quiz, an error may or may not be present in a sentence; you have to spot the error or leave the sentence as is. The third quiz substitutes an underlined sentence fragment with a possible correction; here you have to identify the sentence fragment that is the most appropriate and grammatically correct.

The fourth chapter explains how automated essay evaluation works. The accompanying software includes an essay evaluator that you can use to evaluate your essays. Many tips to write an essay for the E-rater essay evaluator are mentioned. You can write and organize your essay such that E-rater will be more likely to assign a high score.

The final chapter includes some topics (listening, speaking, and comprehension) that are not covered in detail in this book, but are part of standardized tests. Finally, the appendices include an installation guide for the accompanying software, a brief guide to punctuation, and a collection of links to lists of SAT words, misspelled words, and words ordered by a frequency index.

Conventions

The following typographical conventions are used in the book.

Constant Width: Indicates file names, variable names, classes, objects, command line statements, and any other code fragment.
Constant Width Bold: Indicates a URL or email address.
Italics: Indicates proper names such as the names of persons, books, titles, or quoted sentence fragments.


Support

Visit http://emustru.sf.net to download the code used in this book. The sample code is written in PHP and Java. Please report bugs, errors, and questions to [email protected]. Bugs in the code will be corrected and posted in a new version of the sample code. Your feedback is valuable and will be incorporated into subsequent versions of the book. Please contact the author if you would like more information on topics that have not been covered or explained in sufficient detail. I have attempted to make the contents of the book comprehensible and correct. Any errors or omissions in the book are mine alone.

Acknowledgements

First, I would like to thank the developers of the open source tools, including Lucene (a search engine API), LingPipe (a collection of linguistic tools), WordNet (a thesaurus / dictionary), MySQL (a database), FreeTTS (a speech synthesizer), and several other tools. These open source tools have made it possible to develop the accompanying open source “Emustru” software to learn English and practice for standardized tests. The development of Emustru was partially funded by Sarai.net, India and Cetril, France. Emustru received the third place award (Education Category) in the free software competition held by the Trophées du Libre in June 2009. The list of roots, prefixes, and suffixes for words is included with the permission of Jessica DeForest. The list of common misspelled words includes the list from Wikipedia.


1. Introduction

Current language exams evaluate not only vocabulary, grammar, and writing skills, but also listening and speaking abilities. Exams like the SAT Reasoning Test and the Graduate Record Exam (GRE) do appear challenging at first. They require a fairly large vocabulary, knowledge of some grammar rules, and decent writing skills. Memorizing word lists and a list of grammar rules is tedious. Can a computer help you prepare for these types of exams? Yes. There are many programs on the Web to learn word lists and grammar, evaluate writing, and convert text to speech.

English is a moderately difficult language to learn for several reasons. First, the number of words in the English language is large and continues to grow. The Oxford English Dictionary contains about 170,000 words, while the computer-based WordNet dictionary / thesaurus contains roughly 150,000 words. The total number of English words exceeds one million if all the forms of a word are included. The same meaning can be expressed in many ways, making it harder for a student to understand the language. However, few exams test for more than 10,000 words.

Second, spelling and pronunciation can vary depending on the region. For instance, the American spelling of the standard measurement of length is meter while the British spelling of the same word is metre. Similarly, the British spelling of a legal permit is licence and the American spelling is license. There are many other spelling differences among the varieties of English used in other parts of the world, including Australia, India, and parts of Africa. The WordNet dictionary used in this book and the accompanying software includes both the British and American spellings of words.

The spelling of words is also not consistent with the phonemes (units of word sounds). Sometimes the letter i is seen before e (orient) and sometimes the letter e is before i (receive). Further, the spelling of a word cannot always be deduced from its pronunciation. For example, the words cash and cache have the same pronunciation. Similarly, the words pray and prey have identical sounds; one way to detect the correct word is to use the context.

The rules for modifying words are also not consistent. The plural of goose is geese, but the plural of loose is not leese. Similarly, the plural of mouse is mice, but the plural of house is houses. The past tense of the word ask is asked, while for the word speak it is spoke. Similarly, the past tense of fly is flew and that of teach is taught.

These inconsistencies and a host of other grammatical issues make English a fairly difficult language to learn. However, it is still important to learn the rules of grammar and spelling in addition to the list of exceptions. There are a number of good books [14, 15] to learn grammar and vocabulary. This book does not repeat the same material; instead, it shows how software can help you prepare for exams with dynamic quizzes that are customized to your skills.

The use of quizzes is a highly effective way to learn and simultaneously test a student. Through constant interaction and feedback, a quiz also requires a student to be more focused than in a lecture. However, quizzes do have limitations. Lengthy and detailed explanations are not suited for a quiz-like format. The question-answer style is ideal to evaluate knowledge of spellings, meanings, and word usage.

The purpose of this book is to explain how a computer can help you prepare for standardized English exams with the help of quizzes and other tools. Most of the software described in the book does not need an Internet connection. The idea of using a computer to make learning more interesting is not new. Computer based learning on a PC has been in existence since the 1980s. Current PCs have more than adequate memory and power to store large dictionaries, retrieve information, and run complex programs. Further, computers are more affordable than they were 30 years ago. The use of multimedia and games has made the learning process more pleasant and engaging for the student.

1.1. Computer Assisted Language Learning

Computer Assisted Language Learning (CALL) is a personalized approach to learning a language. The two primary features of CALL [1] are student-based or individualized learning and interactive learning. In a student-based lesson, the questions and material shown to each student are customized, depending on the strengths and weaknesses of the student detected in prior learning sessions. A teacher of a class of 30-40 students would find it too onerous to build a separate quiz for each student, while computer software can quite easily create personalized quizzes and track the performance of all students in a large class.


The second feature, interactive learning, has become more dominant with the emergence of multimedia and other methods of engaging a student in a lesson. The use of audio, video, images, and text in a lesson has made the learning experience more interesting than browsing a book. This book describes some multimedia CALL software, but is primarily based on text quizzes, analyses of text, and word games. A student interacts with the computer via a keyboard, a microphone, or a mouse. The computer generates feedback and questions that may be presented on a screen or through audio.

Some competitive exams such as the International English Language Testing System (IELTS), Test of English as a Foreign Language (TOEFL), and the Pearson Test of English (PTE) contain test questions that evaluate speaking and listening skills. You can learn pronunciation by listening to an audio transcript generated from a text file. Text to speech software “reads” a text file and highlights each word in step with its audio representation (see Chapter 5). You can also adjust the speed of the audio transcript to read the file at a slower or faster rate. The gender of the voice and accents may also be customized.

The earliest attempts in the 1960s and 1970s to use a computer to learn languages were based on the ideas of automating the presentation of material from a textbook and of using quizzes to test a student. The computer would analyze the student’s errors following an interactive quiz and suggest feedback. In addition, a student could learn at an individualized pace and the computer would never get impatient with the student. Critics of the drill or quiz method of learning argue that the emphasis on repetition and accuracy does not help a student learn creative ways of expressing meaning.


Despite the criticism that drills are ineffective for teaching communication, the drill method has been used with great success to teach vocabulary. The best seller “Word Power Made Easy” by Norman Lewis uses a large number of quizzes to keep the reader engaged. The interactive nature of a quiz and the desire to get as many correct answers as possible allow a student to learn with less boredom than through the memorization of words or rules.

A quiz has some limitations. You cannot learn grammar, sentence formation, parts of speech, discourse theory, and writing skills from a set of quizzes alone. However, a quiz can test your knowledge and is very useful before you attempt a competitive exam. You can identify your weaknesses and strengths based on your performance on a set of quizzes. A wrong answer in a quiz is usually accompanied by an explanation of the correct answer. This type of feedback is essential to learn and correct mistakes (see Figure 1.1).

Later, in the 1980s, CALL software used fewer quizzes and more tools to use and analyze language interactively. An emphasis was placed on communicating effectively in a language. Communication included using language that was grammatically correct, appropriate for the context, and included persuasive arguments. In Chapter 4, we will examine automated methods to evaluate essays.

1.2. Quizzes

Computer-based quizzes are common on the Web. A search for “vocabulary quizzes” on the Web returns over 50,000 hits on the Google search engine. Similarly, a search for “grammar quizzes” returned about the same number of hits. Some of the sites on the Web specialize in quizzes for specific exams such as the SAT.

Figure 1.1.: Feedback in a Quiz (diagram labels: Quiz, “What is ...” with choices A, B, C; Answer; Yes: Correct; No: Feedback, Explanation)

The quizzes on most sites are static, i.e., the same questions and answers are shown again if you visit the site a second time. A few sites like http://www.freerice.com adjust the difficulty of the questions based on the responses. So, the level of difficulty of questions is incrementally increased for a student who correctly answers simple questions.

The Graduate Record Exam (GRE) and Graduate Management Admission Test (GMAT) exams have used such computer-adaptive testing to tailor questions based on the student’s answers. A student who has correctly answered a sufficient number of questions at a lower level of difficulty is assumed to have shown competency at the skill level associated with the set of questions. A student is iteratively tested at higher levels of difficulty until some termination criterion is met. The primary benefit of such an exam is the time saved in testing a student. A computer-adaptive test can yield a reasonably accurate score that is as precise as an equivalent score from a longer set of questions in a static test.

The quizzes in this book are not computer-adaptive. Instead, dynamic quizzes are generated based on the performance of a student in past quizzes. The focus is on building vocabulary, grammar, and writing skills instead of evaluating your knowledge.
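As a rough illustration of how a dynamic quiz might be assembled from past performance, the Python sketch below mixes questions that were missed earlier, questions that were answered correctly, and new questions (the mix described for Emustru quizzes in Section 1.3.4). The function name, the three pools, and the share given to each pool are assumptions made for this example; they are not taken from the Emustru code.

```python
import random

def build_dynamic_quiz(missed, answered_correctly, unseen, size=10,
                       missed_share=0.5, review_share=0.2, seed=None):
    """Assemble a quiz from three pools of question ids (lists).

    The shares are illustrative assumptions: about half of the quiz
    revisits missed questions, a fifth reviews questions that were
    answered correctly, and the remainder is new material.
    """
    rng = random.Random(seed)
    n_missed = min(len(missed), round(size * missed_share))
    n_review = min(len(answered_correctly), round(size * review_share))
    n_new = min(len(unseen), size - n_missed - n_review)

    quiz = (rng.sample(missed, n_missed)
            + rng.sample(answered_correctly, n_review)
            + rng.sample(unseen, n_new))
    rng.shuffle(quiz)  # do not reveal which pool a question came from
    return quiz

# Hypothetical question ids for one student.
missed = ["m%d" % i for i in range(8)]
correct = ["c%d" % i for i in range(8)]
unseen = ["n%d" % i for i in range(8)]
print(build_dynamic_quiz(missed, correct, unseen, size=10, seed=1))
```

If a student has missed only a few questions, the quiz is simply filled with more new questions, so the quiz length stays the same as long as the pools are large enough.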

1.2.1. Should you Guess an Answer?

The common wisdom is that it pays to guess an answer if you can eliminate at least one of the answer choices. The analysis below applies only to multiple choice questions that include a penalty for a wrong answer. Consider 100 questions with five possible answers each. Assume that you cannot identify the correct answer in any of the 100 questions. Figure 1.2 shows the score you can expect, given a penalty for every wrong answer. The penalty on the x axis varies from 0.25 to 1.0. In other words, a wrong answer could reduce your score by a quarter point to a whole point (the current penalty for a wrong answer on the SAT exam is a quarter point). The y axis shows the score you can expect given the penalty and the number of answer choices. Each line represents the score based on the number of answers remaining after eliminating as many wrong choices as possible.

Figure 1.2.: Expected Score Based on Penalty and Number of Possible Answers (plot: x axis Penalty, ticks from 0 to 1.1; y axis Score, ticks from -50 to 50; one line each for 2, 3, and 4 remaining answer choices)

If you can only eliminate one answer out of five, then the line with 4 choices shows your expected score. When the penalty is 0.25, your score will be roughly 6 out of 100. So, for a quarter point penalty, there is no harm in guessing from four possible choices. However, as the penalty rises to 1.0, the expected score rapidly falls, and it is not worthwhile guessing if the penalty is higher than a third of a point, since the expected score becomes negative. Of course, the more answers that you can eliminate, the higher your expected score. When you have just two choices with a quarter point penalty, you can expect a score of roughly 37 out of 100. So, it does pay to guess and your reward grows based on the number of answers that you can positively eliminate.
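The numbers quoted above can be reproduced with a few lines of Python. The sketch assumes one point for a correct answer and a uniformly random guess among the remaining choices; the function names are made up for this example. With k choices left and a penalty p, the chance of a correct guess is 1/k, so the expected score over n questions is n × (1/k − p × (k − 1)/k).

```python
def expected_score(num_questions, choices_left, penalty):
    """Expected total score when every answer is a random guess.

    Assumes 1 point per correct answer and `penalty` points deducted
    per wrong answer, with the guess drawn uniformly from the
    `choices_left` options that could not be eliminated.
    """
    p_right = 1.0 / choices_left
    return num_questions * (p_right - (1.0 - p_right) * penalty)

def break_even_penalty(choices_left):
    """Penalty at which the expected score of a random guess is zero."""
    return 1.0 / (choices_left - 1)

print(expected_score(100, 4, 0.25))   # 6.25
print(expected_score(100, 2, 0.25))   # 37.5
print(break_even_penalty(4))          # one third of a point
```

The break-even penalty for four remaining choices is one third of a point, which matches the threshold mentioned in the text. By the same formula, a blind guess among all five choices with a quarter point penalty has an expected score of zero, which is why eliminating even one choice tips the balance in favor of guessing.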

1.3. Software

All the software described in this book works on a PC running the Windows or Linux operating system. This book describes open source software that can be downloaded, evaluated, and customized without subscription fees or license requirements. You can learn vocabulary, check grammar, evaluate writing, and correct spelling with the collection of software packages (see Appendix A) included with this book. If you are interested, you can tinker with the software, improve it, make suggestions, add documentation, and test the code for bugs.

There are many sites [2] on the Web to learn English vocabulary, grammar, writing, and reading. At the end of each chapter, a list of relevant Web sites that include practice tests is provided. The main skills a student of any new language would need to prepare for an exam include:

Listening: A student listens to an audio passage and answers questions to evaluate comprehension.
Writing: An essay prompt is provided and a student writes an essay of several hundred words in response.
Vocabulary: A large vocabulary is very useful in the speaking, writing, and comprehension sections.
Grammar: A student must understand the syntax of a language before writing sentences that are grammatically correct.
Reading: A student’s grasp of the contents of a given text / audio passage is evaluated with a series of questions.

The evaluation of listening tests the recognition of accents, pronunciation, vocabulary, and comprehension. Reading is similarly evaluated, except that a student must know the alphabet and spellings of words. Most students learn a vocabulary of several thousand words in a language before acquiring a level of knowledge sufficient to pass competitive exams. Learning vocabulary is a fairly routine task and a computer is well-suited to make this task interactive and more attractive to a student. Two popular sites on the Web to learn vocabulary are Quizlet and FreeRice (see Section 2.7).

Learning the grammar of a language is more challenging than learning new words. English has many rules and exceptions that can only be learnt through practice. A grammar checker examines the text of a document, one sentence at a time, and returns errors and suggestions for corrections. Most popular word processors include a grammar checker that identifies syntax errors and generates potential corrections. Unfortunately, some grammar checkers miss sentences that should be marked as incorrect. This is usually an intentional feature to ensure that any error that is flagged is very likely to be an actual error. There are fewer grammar checker sites on the Web than sites for building vocabulary and other types of word games.

There are even fewer sites on the Web to evaluate writing skills. Some sites return a manual review of a text passage. A manual review is of course more expensive than an automated evaluation, but it is also more likely to be accurate. Many of the current automated text evaluators are proprietary or subscription-based.


1.3.1. WordNet

WordNet [3] is a popular open source dictionary / thesaurus for English from the Cognitive Science Laboratory of Princeton University. A typical dictionary lists words in alphabetic order. In WordNet, words are assigned to synonym sets (synsets) and relationships are defined between synsets. For example, the word package is assigned to a synset that contains the words bundle, packet, and parcel. These words have the same meaning as the word package and belong to a common synset. A word can also belong to more than one synset. The word package is also used as a verb in the synset that contains the word box. Synsets are related to each other in a hierarchical relationship. The synset with the words collection and aggregation is a more general meaning of package, while the words sheaf and bale are more specific words. Relationships are also defined between individual words. For example, the word wild is the opposite of the word tame. Chapter 2 includes a more detailed description of WordNet and its use in Emustru.
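The structure described above can be pictured with a small toy model in Python. This is only a sketch built from the examples in the text: the synset names, the dictionary layout, and the helper functions are invented for illustration and do not reflect the actual WordNet file format or any WordNet API.

```python
# Toy data from the examples in the text; NOT the real WordNet database.
# Each synset lists its words and the name of its more general synset.
SYNSETS = {
    "package.noun": {"words": ["package", "bundle", "packet", "parcel"],
                     "hypernym": "collection.noun"},
    "package.verb": {"words": ["package", "box"], "hypernym": None},
    "collection.noun": {"words": ["collection", "aggregation"],
                        "hypernym": None},
    "sheaf.noun": {"words": ["sheaf"], "hypernym": "package.noun"},
    "bale.noun": {"words": ["bale"], "hypernym": "package.noun"},
}

# Antonymy is a relationship between individual words, not synsets.
ANTONYMS = {"wild": "tame", "tame": "wild"}

def synsets_of(word):
    """Names of all synsets that contain the word."""
    return sorted(name for name, s in SYNSETS.items() if word in s["words"])

def synonyms(word):
    """All other words that share at least one synset with the word."""
    found = set()
    for s in SYNSETS.values():
        if word in s["words"]:
            found.update(s["words"])
    found.discard(word)
    return sorted(found)

def more_general(name):
    """Words of the synset one level up in the hierarchy."""
    parent = SYNSETS[name]["hypernym"]
    return SYNSETS[parent]["words"] if parent else []

def more_specific(name):
    """Words of the synsets one level down in the hierarchy."""
    return sorted(w for s in SYNSETS.values()
                  if s["hypernym"] == name for w in s["words"])

print(synsets_of("package"))          # ['package.noun', 'package.verb']
print(more_general("package.noun"))   # ['collection', 'aggregation']
print(more_specific("package.noun"))  # ['bale', 'sheaf']
```

Keeping antonyms in a separate word-to-word table mirrors the distinction made in the text between relationships among synsets and relationships among individual words.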

1.3.2. Text Sources

There are many sources on the Web to collect high quality English text. One of the popular sources for English books is Project Gutenberg (PG) [6]. Many thousands of ebooks produced by a large number of volunteers have been made available online. PG includes the full text of about 20,000 public domain books, including classics like “The Adventures of Huckleberry Finn” by Mark Twain. The contents of these books have been proofread by volunteers.


Other sources include newspaper articles that are well written and proofread. These articles contain current terminology and in most cases are available without fee. The authors of editorial articles use persuasive writing to convince a reader that a point of view on a particular issue is correct or recommended. Popular authors like Paul Krugman also use blogs and articles that can be viewed online to explain current issues. Regular readers of the New York Times or the Washington Post can read articles with different styles and learn word usage from well-written and proofread text. The software explained in Chapters 2 and 3 uses example sentences from some of these text sources.

1.3.3. Audio

PG also includes a collection of audio books. Some of the books are recorded by a professional reader and other books are converted to speech by a text-to-speech converter. The audio books spoken by a human will sound more realistic than a similar automatically-generated book. Which book is better is a personal choice. The main advantage of audio books is that you can hear and read the same text simultaneously. This means that you can recognize words, accents, and pronunciation that may appear in a listening passage in an exam. In some text-to-speech software, you can control the speed of the audio output, the pitch, the type of voice (male / female), and other parameters.

Speech-to-text software converts spoken text into written text. Often, this software must be trained to recognize individual pronunciation, and the accuracy of the output depends on many factors including the sensitivity of the microphone to background noise, adaptability to accents, and the type of training model. Chapter 5 includes a section on text-to-speech and speech-to-text software.

1.3.4. Emustru

The public domain Emustru software was written to accompany this book to help you prepare for your exam. It can be downloaded from http://emustru.sf.net. The software runs on the Windows and Linux platforms and the installation details are included in Appendix A. A demo version of the software is available at the same site. Emustru includes features to learn some of the skills mentioned earlier.

Spelling

The spelling quiz selects words from a given word list, optionally ordered by rank, and generates an audio file to “say” each word. The open source speech synthesizer FreeTTS (http://freetts.sf.net) was used to generate audio files.

Vocabulary

A word is selected from a pre-defined or user-provided word list, and its meaning is extracted from the WordNet [3] dictionary. Some words have more than one meaning and just two of the most popular meanings are selected for a quiz. Several words that are unrelated to the given word are added to the list of answers. A student selects the meaning of a given word from a list of five options.
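The vocabulary question described above can be sketched in a few lines of Python. This is an illustration only, not the Emustru implementation: the tiny glossary stands in for WordNet, its definitions were written for this example, and the function name is hypothetical. The sketch also treats every other word in the glossary as "unrelated", whereas a real system would need to check that a distractor is not a synonym of the quiz word.

```python
import random

# Hypothetical mini-glossary standing in for the WordNet dictionary.
GLOSSARY = {
    "tame": "brought from wildness into a domesticated state",
    "parcel": "a wrapped object prepared for carrying or mailing",
    "bale": "a large bundle bound for storage or transport",
    "prey": "an animal hunted or caught for food",
    "cache": "a hidden store of provisions or valuables",
    "orient": "to determine one's position with reference to a landmark",
}

def make_vocabulary_question(word, glossary, num_options=5, seed=None):
    """Build a multiple choice question: one correct meaning plus
    meanings of other words as distractors, in random order."""
    rng = random.Random(seed)
    correct = glossary[word]
    other_meanings = [m for w, m in glossary.items() if w != word]
    options = rng.sample(other_meanings, num_options - 1) + [correct]
    rng.shuffle(options)
    return {"word": word, "options": options,
            "answer": options.index(correct)}

question = make_vocabulary_question("tame", GLOSSARY, seed=3)
print("What is the meaning of", question["word"] + "?")
for number, option in enumerate(question["options"], start=1):
    print(number, option)
```

The index of the correct option is stored with the question so that the response can be checked and recorded for later quizzes.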


Emustru includes several word games: guess a word in six tries, unscramble a jumbled word, and complete a partial word. Two lesser known games involve finding the most likely word before or after a given word. For example, the word strong is more likely to be seen before the word tea than the word powerful, even though both words have the same meaning. In another game, the student must identify the type of relationship (synonym, antonym, or hypernym) between two or more given words.

Sentence Analysis

The Cloze test (http://en.wikipedia.org/wiki/Cloze_test) is a test in which some words of a sentence are removed and the student must identify the missing words from a set of given words. This test evaluates vocabulary and knowledge of words in context. For example, the following sentence has two missing words and a set of five choices.

For his eighth grade project, Ebright tried to find the cause of a _____ disease that kills ______ all monarch caterpillars every few years.

• neighborhoods, crudeness
• viral, nearly
• dilemmas, container
• tongued, unfolding
• deceleration, maneuvered
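A Cloze question like the one above could be produced along the following lines. The Python sketch is a simplified illustration under stated assumptions: the function names are hypothetical, the distractor pairs are supplied by hand (here, the four wrong pairs from the example), and nothing in it is taken from the Emustru source.

```python
import random

def make_cloze(sentence, targets, blank="_____"):
    """Replace every word found in `targets` (a set of lowercase words)
    with a blank; return the new text and the removed words in order."""
    words = sentence.split()
    answers = []
    for i, token in enumerate(words):
        core = token.strip(".,;:!?")       # ignore attached punctuation
        if core and core.lower() in targets:
            answers.append(core)
            words[i] = token.replace(core, blank)
    return " ".join(words), answers

def make_choices(answers, distractors, seed=None):
    """Mix the correct answer tuple with distractor tuples."""
    rng = random.Random(seed)
    choices = [tuple(answers)] + [tuple(d) for d in distractors]
    rng.shuffle(choices)
    return choices

sentence = ("For his eighth grade project, Ebright tried to find the cause "
            "of a viral disease that kills nearly all monarch caterpillars "
            "every few years.")
text, answers = make_cloze(sentence, {"viral", "nearly"})
choices = make_choices(answers, [("neighborhoods", "crudeness"),
                                 ("dilemmas", "container"),
                                 ("tongued", "unfolding"),
                                 ("deceleration", "maneuvered")], seed=5)
print(text)
for choice in choices:
    print("•", ", ".join(choice))
```

In practice the target words would come from the word list that the student is studying, so each blank doubles as a vocabulary exercise in context.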


1.3. Software The words that are missing in the sentence are selected from a pre-defined or custom word list. A student learns the context in which words chosen from the word list are used in sentences. This test complements the earlier vocabulary test where a student learnt the meaning of words. Grammar Most word processors include a grammar checker along with a spell checker to help the writer create a document that has correct syntax and spelling. In general, a grammar checker limits the number of false positives, i.e. the number of flagged errors that are not valid. A writer is more likely to be annoyed by a grammar checker that identifies errors in correct sentences and may be willing to tolerate error sentences that are not detected. Emustru uses a statistical rule-based grammar checker to find errors. A large number of rules are constructed after observing part of speech (POS) tag and word patterns in a corpus that is known to contain sentences with valid syntax. These patterns are encoded in rules and stored in database tables. The grammar checker compares patterns extracted from a test sentence with patterns saved in tables. Any pattern that is rare or unusual is flagged as a potential error. The grammar checker in Emustru is included in the essay writing evaluation function (see Chapter 4). Essay Writing The essay evaluation function in Emustru assigns a score based on a number of extracted features from a short essay of about 300-400 words. Many of the current competitive exams such as


the SAT and IELTS include an essay writing question to test an examinee’s vocabulary, grammar, and writing skill. Although it is debatable whether writing an essay in half an hour or less can actually test an examinee’s creativity and writing skills, the essay writing question has become popular. Essay writing is usually the only free-form question in competitive exams that allows unstructured text answers. Most of the other types of questions are multiple choice questions that can be machine graded.

A human grader evaluates an essay and assigns a score, say from 1 (Poor) to 6 (Excellent), based on an overall impression of the essay. The human grader looks for grammatical mistakes, spelling errors, language usage, and several other features to compute an overall score. Two or more human graders may score the same essay to resolve errors that may arise during the grading process. When the scores from two graders for the same essay differ by more than one point, a third grader scores the essay. The Educational Testing Service (ETS) has replaced one of the two human graders with an automated essay evaluator, E-rater [4]. In more than 90% of the graded essays, the absolute difference between the human grader’s score and the E-rater score was within one point. The essay evaluator used in Emustru extracts features such as the number of spelling errors, the number of unique words, and the number of grammatical mistakes to generate an overall score. The list of over 20 features is described in Chapter 4.

Emustru Quizzes

Emustru uses some of the philosophies behind the CALL approach to learn a language. A student learns vocabulary through


dynamic quizzes that are generated based on prior performance. Sets of correct and incorrect responses per student are maintained in database tables. Emustru generates a custom quiz using some of the questions that were missed earlier, a set of new questions, and a set of questions that were answered correctly (see Figure 1.3).

[Figure 1.3.: A Dynamic Quiz Generated from a Database Table. The student's table of incorrect responses, correct responses, and unseen questions feeds a new quiz made up of missed questions (25%), correctly answered questions (25%), and new questions (50%).]
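The quiz composition shown in Figure 1.3 can be sketched in a few lines of Python. This is only an illustration and not the Emustru code: the function name build_quiz, the quiz size of 20, and the use of plain lists in place of database tables are all assumptions.

```python
import random

def build_quiz(missed, correct, unseen, size=20, seed=None):
    """Assemble a quiz of about 25% missed questions, 25% questions
    answered correctly before, and the rest new (unseen) questions.
    The three inputs are lists; the names are hypothetical."""
    rng = random.Random(seed)
    n_missed = min(len(missed), size // 4)
    n_correct = min(len(correct), size // 4)
    # New questions fill whatever space the other two groups leave.
    n_new = min(len(unseen), size - n_missed - n_correct)
    quiz = (rng.sample(missed, n_missed)
            + rng.sample(correct, n_correct)
            + rng.sample(unseen, n_new))
    rng.shuffle(quiz)  # mix the three groups so their origin is not obvious
    return quiz

if __name__ == "__main__":
    quiz = build_quiz(["ensue", "cajole"], ["apathy"],
                      ["stupefy", "malicious", "disparage"], size=4)
    print(quiz)
```

When a student has few missed or correctly answered questions, as in a first session, the sketch simply fills the remaining slots with new questions.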

Often a student loses interest in a static quiz after the first attempt. The same questions with the same answers are repeated and it becomes relatively easy to recollect the correct answers. Dynamic quizzes have several advantages over static quizzes:

• A student sees a different set of questions in each session.
• Questions that have been missed are repeated till a student has correctly answered such questions more than once.


• Questions that were correctly answered can be repeated as many times as necessary to verify that a student did not answer a question correctly by chance.
• A new set of questions can be chosen by rank in every dynamic quiz.

The interactive question-answer format is a simple and attractive way to keep a student’s attention during a session. Feedback is immediate and a student can verify answers through a button (see Figure 1.4). The sample vocabulary question in Figure 1.4 includes a test word, five possible answers, and five buttons.

[Figure 1.4.: A Sample Vocabulary Question]

One of the five options for the test word, ensues, is correct. Emustru picks the test word by rank or at random from a given word list. One button evaluates the current question and returns the next question. The evaluation indicates whether the given answer was correct. Another button shows the previous question and answers. A question


that was answered earlier cannot be modified. A separate button displays the answer for the current question. Once this button has been pressed, the question is assumed to be answered and the student must continue to the next or previous question. Another button is only active following an evaluation of the question. This button returns a dictionary entry or a group of sentences that use the test word to describe how the word is used in context and its meanings.

Why use Emustru?

A manually generated quiz will typically be superior to a similar automatically generated quiz. The questions and answers in a manual quiz are carefully selected and verified. A dynamic quiz attempts to reproduce this process using an algorithm. Test writers are known to create questions and answers in fairly standard patterns. Wrong answers are generated in a somewhat predictable manner. Emustru uses a few simple heuristics to automatically generate a quiz based on some observations from a manual quiz. Some of the advantages of Emustru and dynamic quizzes are listed below.

• Dynamic quizzes are well-suited to prepare for competitive exams such as the SAT. The list of words to prepare for these exams is fairly long; you can use dynamic quizzes to learn a small set of words at a time and gradually learn a large collection of words. Further, a quiz can be tweaked to repeat certain questions that you find difficult.
• Some of the sites that have developed CALL software are subscription-based or proprietary. Emustru is open


source software and the data sources can be customized to suit individual requirements.
• The Web interface of Emustru is intuitive and can be used without an Internet connection.
• The Emustru essay evaluator is one of the few open source alternatives to commercial software like Criterion [10], Intelligent Essay Assessor [13], and Intellimetric [12].
• The statistical grammar checker included with Emustru can be customized to find fewer or more errors in text.
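The statistical grammar checker described earlier in this chapter compares POS tag patterns from a test sentence with patterns collected from a corpus and flags the rare ones. The Python sketch below illustrates that idea under several assumptions: the patterns are tag trigrams, an in-memory Counter stands in for the database tables, sentences arrive already tagged, and the function names and the min_count threshold are hypothetical. It is not the Emustru implementation.

```python
from collections import Counter

def tag_trigrams(tags):
    """Return consecutive POS-tag triples, padded with boundary markers."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]

def train(tagged_corpus):
    """Count tag triples in a corpus of sentences with valid syntax."""
    counts = Counter()
    for tags in tagged_corpus:
        counts.update(tag_trigrams(tags))
    return counts

def flag_rare(tags, counts, min_count=1):
    """Return the tag triples seen fewer than min_count times in the corpus.
    Raising min_count flags more patterns; lowering it flags fewer."""
    return [t for t in tag_trigrams(tags) if counts[t] < min_count]

# A tiny hand-tagged corpus (Penn Treebank style tags, for illustration only).
corpus = [
    ["DT", "NN", "VBZ", "JJ"],
    ["DT", "JJ", "NN", "VBZ", "RB"],
    ["PRP", "VBZ", "DT", "NN"],
]
counts = train(corpus)
print(flag_rare(["DT", "VBZ", "NN"], counts))  # an unusual tag order
```

The min_count parameter plays the role of the customization mentioned in the last bullet: a stricter threshold reports more potential errors at the cost of more false positives.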


2. Learning Vocabulary

Learning words from a long list is a dull task. Many ways have been suggested to make this task more interesting, and one of the most popular is to take multiple short quizzes of 10-20 questions each. The popular book “Word Power Made Easy” by Norman Lewis contains many such quizzes. Some of the quizzes contain the familiar multiple choice questions where a student must select the correct meaning of a word. Other quizzes provide the meaning of the word and the starting letters of the related word. The student fills in the remaining letters of the word that represents the same meaning. A true or false quiz asks a question and the student must verify if the highlighted word in the question is appropriate or not. Finally, another type of quiz presents a set of words and a jumbled set of meanings, and the student matches each word with the correct meaning.

2.1. Why Learn Words?

Before you begin learning words, you may be wondering if it is simply a waste of time to study words that you believe you will not need or use in your daily activities. You may assume that these words are included in exams to make it difficult for students to receive high scores. Even if you are studying words to score well in a competitive exam, there are other benefits


you will gain with a large vocabulary. A larger vocabulary will not only help you score higher in an exam, it can also help you

• Explain your thoughts more clearly
• Write better articles, reports, and messages
• Speak more precisely and persuasively
• Understand more of what you read
• Get better grades in high school, college and graduate school

There is sufficient evidence to back the theory that some of the most successful people are also the ones with the largest vocabularies. Johnson O’Connor at the Stevens Institute of Technology conducted a study to estimate how well employees were matched with their positions in a large company. Among his findings was the discovery that a person’s vocabulary level was the single best aptitude to predict occupational success. Although this study was conducted in the 1920s, it is still valid today. Several commercial organizations sell lists of power words and explain the context in which such words can be used.

Johnson O’Connor made another important discovery. The possession of a large vocabulary was not innate and could be acquired by anyone willing to make the effort to learn new words. So, even if you are learning words for a competitive exam, you will find that a larger vocabulary will lead to success in other areas as well. Learning vocabulary is also not a very difficult task that needs a high IQ. Anyone can learn a few words at a time and


quickly build a large vocabulary. Although it is relatively easy to read and study new words, few people actually memorize words and meanings. A word becomes more relevant when it is used in context within a sentence.

2.2. Which Words are Important?

For practical reasons, no exam tests for every word in the English language. Test writers pick 1% or fewer of the words from a large word set. The size of the word set may vary from 1,000 to 10,000 words; this excludes the set of roughly 5,000 high frequency words that most students learn in high school. It is difficult to simplify learning by picking a small set of words that are most likely to be selected by a test writer. So, you will need to learn as many new words as possible before taking an exam. Fortunately, this does not mean learning words at random from a dictionary.

For example, the vocabulary words in the SAT exam are chosen from a pool of roughly 8,000 words that may appear in sentences and comprehension passages. There is no way to predict which words test writers will select, but they tend to favor words that have more than one meaning. For example, the word bat can mean a racket or club for hitting a ball in various games or the flying mammal. Similarly, a flight could mean a journey in an aircraft, the act of fleeing, or a set of stairs between two floors.

In general, a vocabulary question is more likely to test for the meaning of a content word such as a noun or adjective than a function word. Adjectives and adverbs are a little harder to grasp than nouns, since the meanings may be more abstract.


Function words such as prepositions, conjunctions, and articles are high frequency words that a student is expected to learn early in a language course.

2.2.1. How many Words should you Learn?

Till now we have not defined what a word means. A simple definition would be: “A word is the smallest unit of text with some meaning”. Words are the building blocks of sentences and it would not be possible to compose any meaningful text without the knowledge of words and the associated meanings. However, words have root and inflected forms. For example, two inflected forms of the root word jump are jumped and jumping. So, it is possible to know the meaning of more than one word if you know the meaning of the root word. This means that the total number of words you know, including all inflections, will be at least double the number of root forms alone.

Consider a million word corpus [5] and the number of unique words you would encounter as you read the entire corpus. Figure 2.1 is a plot of the total number of words vs. the number of unique words. Initially, in the first 200 thousand words, there is a steep rise in the number of unique words, and the rate of increase gradually slows after about 400 thousand words. The total number of unique words you would need to know to recognize a million words is a rather high 45K. However, roughly 6,500 of these are actually word combinations such as long-term, part-time, and anglo-saxon. The meanings of some of these words can be deciphered from the constituent words. But a much larger number of words share a common word form or root. If we collapse all words to a base word form, then the number of


[Figure 2.1.: Number of Unique Words in a Million Word Corpus. The plot shows the number of unique words (0 to 50K) against the total number of words (0 to 1000K), with one curve for all words and one for base words.]

unique words in a million words is less than 10,000. This means that we can recognize all 1 million words in the corpus if we know fewer than 10,000 base words, i.e. just 1% of the total number of words.
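A curve like the "All Words" line in Figure 2.1 can be computed by counting how many distinct words have been seen after each block of text. The Python sketch below shows one way to do this; the function name, the sampling step, and the simple tokenizing pattern (which keeps hyphenated combinations such as long-term as one word) are assumptions, and the sketch does not collapse words to their base forms.

```python
import re

def vocabulary_growth(text, step=5):
    """Return (total_words, unique_words) pairs sampled every `step` words."""
    # Lowercase the text and keep hyphenated combinations as single words.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    seen = set()
    growth = []
    for i, word in enumerate(words, start=1):
        seen.add(word)
        if i % step == 0 or i == len(words):
            growth.append((i, len(seen)))
    return growth

sample = "the cat sat on the mat and the dog sat on the log"
print(vocabulary_growth(sample, step=5))  # [(5, 4), (10, 7), (13, 8)]
```

Even in this tiny sample the number of unique words grows more slowly than the total, which is the same effect the figure shows at a much larger scale.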

2.2.2. Do you know a word?

Initially, we learn the vocabulary of a language by memorizing word lists and associated meanings or equivalent translations in another known language. This initial vocabulary collection grows when we come across unknown words in context. For example, if you did not know the meaning of the word intelligent in the sentence


Opponents generally argued that the ballot couldn’t give enough information about tax proposals for the voters to make an intelligent choice.

you could guess that the word means sensible or logical, since the early part of the sentence mentions a lack of information leading to possible poor choices by an uninformed person. A test writer may similarly create a sentence that will have some hints to extract the meaning of a low frequency word that you may not know. Most current tests evaluate vocabulary not from questions with single words, but instead use the context of a sentence and a list of meanings.

Note that a word can have more than one meaning. In the WordNet dictionary, roughly 17% of the 150K words have more than one meaning. Although a large percentage of the words in the dictionary have just one meaning, they include rare words such as allelomorphic and intumescent, words that you are unlikely to come across in an exam. The words that do have more than one meaning, such as bank, appear more frequently in text and are also more likely to appear in an exam. The meanings in WordNet are fine-grained. For example, the word bank has ten different meanings in a noun sense and another eight different meanings in a verb sense. However, you do not need to know all 18 meanings of the word bank. Many of the meanings or senses are subtle differences in the usage of the word and it should be sufficient to know just the primary meanings for an exam.

Knowing a word is more complex than simply memorizing all of its meanings. We need to know the usage of the word in context, i.e. in a sentence. For example, the two sentences below illustrate usage of the word intelligence in different contexts


He had neither good looks nor intelligence.

Mexican crimelord Beno Gildemontes has stolen classified intelligence data.

In the first sentence, intelligence is used as a noun and in the second sentence, the same word modifies the noun data, as an adjective would. A word can be associated with more than one part of speech. For example, the word brake can be used as a noun or verb.

The car’s brakes squealed when the driver attempted to avoid hitting the chicken on the road.

We had to brake suddenly when the chicken crossed the road.

The frequency of the occurrence of the word as a noun or verb depends on the common usage. A word may be seen more often with one particular meaning than another. For example, the word pedestrian is more commonly associated with a “person walking on the street” than the less popular meaning of common or uninteresting. Therefore, you will need to know most of the meanings of words, if not all of the meanings. A very rare meaning of a word is unlikely to appear in an exam, unless the exam is intentionally difficult. In an exam, you are likely to find a sentence with the test word missing, but sufficient context to decipher the meaning of the word. For example, both of the sentences

It’s a pedestrian, flat drama that screams out ’amateur’ in almost every frame.

To raise the dancer out of his personal, pedestrian self, Mr. Nikolais has experimented with relating him to a larger, environmental orbit.


use the word pedestrian with the second meaning of the word. Imagine that the word was missing from the sentences. You can extract the meaning of the word from the surrounding words without a lot of difficulty. Even if you have no clue what a word means, you can use the remaining words in the sentence to limit the number of answer choices and make a calculated guess.

Roots

Analyzing the letter sequences of an unknown word is another way to guess the word meaning. Consider the word primeval that is made up of a prefix (prim), root (ev), and a suffix (al). The prefix prim is associated with the word first, the root ev with an age or era, and the suffix al with a reference or pertaining to the meaning of the prefix and root. We can conclude that the word primeval means something that existed in the earliest stages of life, by combining the meanings of the prefix, root, and suffix (see Figure 2.2).

[Figure 2.2.: Building a Word with a Prefix, Root, and Suffix. The figure lists the prefix, root, and suffix components of the words primeval, naval, emotional, psychological, and desegregation.]

One of the reasons for the large number of English words is that many of the words were adopted from other languages such as Greek, Latin, French, Spanish, Hindi, and German. Some words such as rendezvous and guru are identical to their foreign counterparts. Other words such as aquatic and chromatic contain parts from non-English words. The same word can appear as a prefix as well as a root. The word polygraph uses the prefix poly, while the word monopoly uses poly as a root. It is not necessary that every word should have a prefix, root, and suffix. The two words primeval and desegregation in Figure 2.2 contain all three components. The word naval has just a prefix and a suffix while the word emotional has a root and a suffix. The study of the origin of words and their components is called etymology.

You would assume that if you knew all the roots, prefixes, and suffixes of words, then you could find the meaning of any given word. Unfortunately, this is not true in all cases. There are exceptions, such as the word audacity. Possible roots for this word are aud (to hear), cit (to start/call), and dac (to teach). None of these root word meanings are associated with the actual meaning of audacity (courage or boldness). Some words such as shallow, campaign, and house cannot be split into smaller letter sequences.

Although the meanings of all English words cannot be decoded from their roots, knowing the components of words does help build your vocabulary. A whole family of words can be traced to a root. For example, the words dictator, diction, predict, and verdict share the same root dict, which means to say or tell. While the root of a word will not make the meaning obvious, it will help you guess the meaning of a word. Therefore, studying the roots of words along with the associated words is


a useful exercise to build your vocabulary. The numbers of prefixes, suffixes, and roots are far smaller than the total number of words. It is possible to memorize all, or at least a majority, of the roots of words (see Appendix C).
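Splitting a word into a prefix, root, and suffix can be sketched as a lookup against small tables. In the Python example below, the entries prim, ev, al, and dict come from the text above; the entries pre and ion, the table layout, and the function name split_word are assumptions added for illustration and do not reproduce the lists in Appendix C.

```python
# Tiny illustrative tables; a real list would be far longer.
PREFIXES = {"prim": "first", "pre": "before"}            # "pre" is an added example
ROOTS = {"ev": "age or era", "dict": "to say or tell"}
SUFFIXES = {"al": "pertaining to", "ion": "act or state"}  # "ion" is an added example

def split_word(word):
    """Return every (prefix, root, suffix) split whose root is in ROOTS.
    The prefix and the suffix may each be empty."""
    results = []
    for prefix in [""] + list(PREFIXES):
        if not word.startswith(prefix):
            continue
        for suffix in [""] + list(SUFFIXES):
            if not word.endswith(suffix):
                continue
            root = word[len(prefix):len(word) - len(suffix)]
            if root in ROOTS:
                results.append((prefix, root, suffix))
    return results

for word in ["primeval", "predict", "diction", "audacity"]:
    print(word, split_word(word))
```

A word such as audacity yields no split with these tables, which matches the caution above that not every word can be decoded from its parts.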

2.2.3. Can you guess the meaning of a word?

It is almost impossible to learn all the words (over 1 million) in the English language and you are bound to come across words whose meaning is not obvious. You can try several different methods to extract the meaning of an unknown word. The first and possibly most valuable method is to use the context, i.e. the meaning of the sentence in which the word appears. This method is accurate as long as you know the meaning of the remaining words in the sentence. In general, if you know at least 95% of the words in a sentence, you can find the meaning of the unknown word using the context. For example, in the sentence:

The moon had sunk below the black crest of the mountains and the land, seen through eyes that had grown accustomed to the absence of light, looked primeval, as if no man had ever trespassed before.

the last part of the sentence hints at the meaning of the word primeval. The use of context works because the words used in a well-written sentence are carefully selected such that the meaning of the sentence coincides with the combined meanings of the individual words. If we remove a single word from a sentence with more than 10 words, we can still guess the meaning of the entire sentence. This type of test is common in language exams. This is unfortunately a cyclical situation. Your vocabulary must be reasonably good to understand the context of


a sentence. At the same time, you need to know the sentence context to build your vocabulary. Initially, you can build your vocabulary by studying word lists and then you can gradually learn more words from books, articles, and newspapers.

The second method is to guess the part of speech. In almost all cases, a vocabulary question will test for the meaning of an adjective, adverb, noun, or verb. Conjunctions, prepositions, and other types of words are unlikely to appear in a test, since they are used so frequently to build sentences. Nouns are typically easier to identify than adjectives and adverbs that may represent an abstract meaning.

The third method is to split the word into its prefix, root, and suffix. In section 2.2.2, we saw that a word could be divided into its components. However, not every word will have a prefix, root, and a suffix. Still, if you can identify the root of a word and its meaning appears to coincide with the overall meaning of the sentence, then your guess at the word meaning is likely to be correct.

Finally, if the sentence in which the word appears does not provide enough context, neighboring sentences may give additional context that can help identify the meaning. However, the further away words are located from the unknown word, the weaker the link between such words. So, a sentence in the beginning of a paragraph is less likely to give strong hints for an unknown word located in the last sentence of a paragraph.

2.2.4. Five Ways to Grow your Vocabulary

1. Read more books and articles: There is a lot of written matter on the Web and you need to read documents from sites that publish well-written proofread articles.


Sites of well known newspapers such as the Washington Post and the New York Times publish articles that contain many of the words that you can expect in an exam like the SAT or GRE. In most cases, you can download these articles without paying a subscription fee. Examine each sentence and look for key vocabulary words, usage, and context. Many of the articles discuss current affairs and may be of some interest to you. Books are another large source of sentence examples. Project Gutenberg [6] contains a large collection of e-books that you can read or search. The Emustru software contains a large number of sentences from a collection of Reuters articles, books from Project Gutenberg and other sources.

2. Learn the roots, prefixes, and suffixes of words: Appendix C contains a list of the common roots, prefixes, and suffixes of English words. Memorizing most of the roots can make a difference when you encounter an unknown word. Roots can appear in the beginning or the middle of a word. For example, the root ali appears in the beginning of the word alias and the root cit appears in the middle of the word incite. If you are able to break up an unknown word into a prefix, root, and a suffix, you are more likely to guess the correct meaning of a word. Further, this approach means that you can identify the meaning of a much larger number of words. Just 10% of all possible prefix, root, and suffix combinations in Appendix C create over 100,000 words. Therefore, you can substantially increase your vocabulary by studying the lists of prefixes, roots, and suffixes.


3. Play word games: There are many types of word games that you can play to increase your vocabulary without studying long word lists. Word jumbles are entertaining to some. The letters of a word are scrambled and you must arrange the letters to form a known word. Typically, only one combination of letters will form a legitimate word. You can use your knowledge of roots, prefixes, and suffixes to “build” a word from the set of letters. Other games include hangman and word completion (you fill in the letters that have been omitted from a word). Three games from Emustru, “Guess the Following Word”, “Word Relationships”, and “Spell Check”, use quizzes to test for words found in phrases, word relationships, and spelling mistakes.

4. Take vocabulary tests: Quizzes are a quick way to evaluate your vocabulary knowledge. Some quizzes (see Sections 2.3.1 and 2.3.2) test for the meaning of words in isolation. A word is shown without context and you have to select the correct meaning from a list of words. Other quizzes show a sentence with the test word missing. The set of answers contains the test word along with a collection of misleading responses. This type of question does appear in the Critical Reading section of the SAT exam.

5. Use a dictionary: If you come across a word whose meaning is not obvious, it is easy to look it up online or in a dictionary. The WordNet dictionary is one of the most popular online dictionaries; other Web sites include http://dictionary.com and http://wordsmyth.net. Many of these sites include word games that test your spelling skills and word building knowledge.
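The word jumble game from item 3 is simple to sketch in Python. The example below scrambles a word and then checks which words in a list can be formed from the scrambled letters, which also shows whether a jumble has a single solution. The function names and the short word list are assumptions for illustration, not part of Emustru.

```python
import random

def jumble(word, rng):
    """Return the letters of `word` in a scrambled order."""
    if len(set(word)) < 2:
        return word  # a word with one distinct letter cannot be scrambled
    letters = list(word)
    while True:
        rng.shuffle(letters)
        scrambled = "".join(letters)
        if scrambled != word:
            return scrambled

def solutions(scrambled, word_list):
    """Return every word in word_list that uses exactly the same letters."""
    key = sorted(scrambled)
    return [w for w in word_list if sorted(w) == key]

WORDS = ["cajole", "apathy", "stupefy", "listen", "silent"]
rng = random.Random()
puzzle = jumble("cajole", rng)
print(puzzle, solutions(puzzle, WORDS))
```

With this word list, the letters of cajole form only one word, while the letters of listen also form silent, so a game would either avoid such words or accept both answers.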


2.3. How to Learn with Online Quizzes

A quiz with multiple choice questions is an appropriate tool to learn vocabulary, since vocabulary is mainly concerned with associating a word and its meanings. Knowing the meaning of a word is the first step in building your vocabulary. It is followed by reading sentences that illustrate word usage. Finally, you can claim to know a word when you use the word in your writing.

The easiest way to learn with quizzes is to complete one or two a day, over a period of several months. It is difficult to take a large number of quizzes in a single day without becoming fatigued. Building vocabulary is a slow process and it is not advisable to prepare for a language exam the night before with a word list and some online tools. Instead, it is much easier to learn a few words at a time from a quiz and then read books or news articles to view word usage in context.

For example, the word schadenfreude (a German word meaning satisfaction or pleasure at someone else’s misfortune) was seen in several news articles at the start of the financial crisis in 2008. The word schadenfreude was used a record 43 times in the New York Times in 2008, following a handful of appearances during the 1980s and 1990s. Test writers use words from a standard word list, but also pick the occasional new word that comes into vogue. The Web sites mentioned below are just a sample of the sites that you can visit to improve your vocabulary.


2.3.1. Visual Thesaurus

The Visual Thesaurus (http://www.visualthesaurus.com) is an online thesaurus and dictionary of over 150,000 words (from WordNet) that you can explore and visualize using an interactive map. One of the pages on the site includes an adaptable spelling quiz. The quiz begins with simple 3-5 letter words and then quickly ramps up to longer and more difficult words, if you answer the initial questions correctly. An audio recording of the word is played and some of the meanings and word relationships are also shown. If you miss a word, it is repeated later in the quiz and your level does not increase. The quiz level remains steady when you answer about half the questions at that level correctly. A standardized score is computed based on your performance, with a minimum of 200 and a maximum of 800 (similar to the scores for the SAT exam, see Figure 2.3). The score initially climbs steeply when the first set of 10 questions are all answered correctly. A few questions are missed in the second set of 10 questions and the score stabilizes near 800 in Figure 2.3.

The audio recording of each of the 150,000 words was manually generated with the help of a group of four opera singers. Opera singers were chosen for their strong vocal training and stamina to record a large number of words. As you would expect, the quality of the audio from Visual Thesaurus is superior to similar results from automated text to speech software such as FreeTTS. It is a tedious and somewhat expensive affair to record audio files for each of the 150,000 words, but fortunately, the set of words in the WordNet dictionary does not change frequently. A few words are added to and some words are even deleted from the dictionary in each new release. Words that are obsolete or used rarely in current text are removed, while


[Figure 2.3.: Dynamic Score in an Adaptable Quiz. The plot shows the score (200 to 900) against the number of questions answered (0 to 20).]

newer words that have been coined to explain technology or some other topic are added to the dictionary. The Visual Thesaurus collects a large number of statistics from visitors to the site who attempt the spelling quiz. These statistics reveal which words are the hardest to spell and distinguish the good spellers from the average spellers. The scores from 200 to 800 are computed using an algorithm that deducts a fraction for a misspelled word and adds to your score for a correctly spelled word. The raw score is scaled to the range 200 to 800.
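The scoring scheme described above (add for a correct spelling, deduct a fraction for a mistake, then scale to the range 200 to 800) can be illustrated with a short Python sketch. The actual formula used by the Visual Thesaurus is not given here, so the penalty of 0.25, the linear scaling, and the function name scaled_score are all assumptions.

```python
def scaled_score(answers, penalty=0.25, low=200, high=800):
    """Scale a raw quiz score to the range low..high.
    `answers` is a list of booleans, True for a correctly spelled word.
    The penalty value and the linear scaling are assumed, not documented."""
    if not answers:
        return low
    # Raw score: +1 for each correct word, minus a fraction for each miss.
    raw = sum(1.0 if ok else -penalty for ok in answers)
    worst = -penalty * len(answers)   # every word misspelled
    best = float(len(answers))        # every word spelled correctly
    fraction = (raw - worst) / (best - worst)
    return round(low + fraction * (high - low))

print(scaled_score([True] * 8 + [False] * 2))  # 680
```

With these assumed values, a perfect quiz scores 800, a quiz with every word misspelled scores 200, and eight correct answers out of ten score 680.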


2.3.2. Free Rice

The Free Rice (http://www.freerice.com) site is a non-profit Web site run by the United Nations World Food Program. Visitors take an adaptable quiz similar to the quiz in the Visual Thesaurus. The level of difficulty of the first few questions is low and quickly increases, if you answer the initial questions correctly. Here, the meaning of words is tested and not the spelling, even though an audio recording of the word is also included. Four choices are shown and you must select the correct meaning of a word. For every correct answer, you automatically donate 10 grains of rice through the UN World Food Program.

If you miss a question, the level of the following question is set to the next lower level. However, if you answer three consecutive questions correctly at a level x, then the level of the next question is set to x + 1. This means that after answering 15-20 questions, you will reach a range of levels at which you will be challenged. Questions will include some fraction of words that you do know and a number of words that are new to you. Theoretically, questions from this range of levels are appropriate, since you will not be put off by an extremely difficult quiz or a very easy quiz.

The two main goals of the site are to provide a free source for anyone to learn and to also distribute free rice to people who cannot afford to buy food. Advertisers on the site are the primary sponsors who make it possible to learn and donate simultaneously. Like the Visual Thesaurus quiz, missed questions are repeated after some period. Free Rice uses a much smaller set (12,000) of words than Visual Thesaurus and scores range from 20 to 60. Anyone scoring above 50 has an excellent vocabulary. Even though Free Rice uses a smaller set of words,


it is still possible to estimate the vocabulary skill of a person with reasonable accuracy, in a single quiz of 20-30 questions. Although Free Rice and Visual Thesaurus are both adaptable quizzes that can measure vocabulary skill and are useful to learn new words, you cannot specify the set of words that you need to learn. Visual Thesaurus does include some personalization functions that are only available to paid subscribers. In many cases, the set of words that appear frequently in exams is known, and your results in such exams will be better if you prepare with questions that include just the words from a known set.
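The level rule described for Free Rice (drop one level after a miss, rise one level after three consecutive correct answers) can be written as a small Python function. This sketch is not the site's code: the function name, the level limits of 1 and 60, and the choice to reset the streak after a promotion are assumptions.

```python
def next_level(level, streak, correct, min_level=1, max_level=60):
    """Return (new_level, new_streak) after one answer.
    `streak` counts consecutive correct answers at the current level."""
    if not correct:
        # A miss drops the next question to the next lower level.
        return max(min_level, level - 1), 0
    streak += 1
    if streak == 3:
        # Three correct answers in a row raise the level by one.
        return min(max_level, level + 1), 0
    return level, streak

level, streak = 1, 0
for answer in [True, True, True, True, True, True, False, True]:
    level, streak = next_level(level, streak, answer)
print(level, streak)  # 2 1
```

Under this rule a student climbs quickly while answering correctly and settles around the levels where roughly one question in four is missed.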

2.3.3. Quizlet

Quizlet™ is a more general Web site than Free Rice and Visual Thesaurus; you can use it to learn lists of words in any language, or terms from any topic, and their associated meanings. You can create your own list of terms and meanings and upload the file to Quizlet. The collection can optionally be saved in the public domain so that you can share it with others. The flashcard model is used in this type of quiz. You can imagine a sample flashcard set of five words:

apathy: n. lack of care or indifference
cajole: v. to urge with gentle appeal
disparage: v. to reduce in esteem or rank
malicious: a. resulting from malice; harmful
stupefy: v. to amaze or astonish

Quizlet uses five different modes to help the student learn this list of five words. In the first mode, familiarize, the words and meanings are shown in a flashcard-like user interface. After you are reasonably familiar with the list of words, you can test yourself in the learn mode. In this mode, the meaning of one of the five words, selected at random, is shown and you have to guess the associated word. Quizlet keeps track of the number of correct responses and periodically re-tests you with the questions that you missed. The third mode, test, generates questions and answers in a dynamic quiz. Three types of questions are created: multiple choice, true or false, and free text. A multiple choice question has 4-5 answers, of which one is correct. A true or false question shows a meaning and a word, and you must verify that the meaning is appropriate for the given word. Finally, in a free text question, you need to enter the associated word, given a particular meaning. The fourth mode, scatter, shows the list of all words and meanings scattered in a window. The aim of this game is to make the entire window blank. A pair (a word and its meaning) disappears from the window when either the word or the meaning is dragged and dropped over its partner. In the final mode, race, you answer questions as they appear on the screen.

2.3.4. Emustru

The user interfaces in the three products (Visual Thesaurus, Free Rice, and Quizlet) are clean and easy to use. One problem with Visual Thesaurus and Free Rice is that you have little control over the contents of the quizzes. You can choose a subject; however, the questions for the subject are pre-determined and ranked by difficulty. A second problem is that if your vocabulary is reasonably good, you will be tested on more esoteric words such as necrose, micturate, and ambuscade that are unlikely to appear in an exam. Quizlet is fully customizable and you can import a collection of ten or even several thousand words and meanings. The type of data that you can import is not limited to words and meanings. You can also import world countries / capitals and subject specific terminology / meanings.

Emustru includes quizzes that are more focused on English vocabulary for a particular exam or syllabus. For example, roughly 8,500 words that often appear in the SAT exam are included. Similarly, about 25,000 words from the Brown corpus [5] are also included. You can pick the word list that you would like to learn, the level of difficulty, the order of the questions, and the number of questions (see Figure 2.4).

Figure 2.4.: Emustru Quiz Options

Words that appear in quizzes can be selected at random or by rank. The SAT word list includes a rank for each word based on the popularity of the word. A word that appears often in SAT questions is given a higher rank than another word that appears infrequently. For example, words like abrupt and centrifugal have a lower rank than exonerate and prosaic.

You can learn words from the list in order or at random. A random order of words makes the quiz more unpredictable and may be of interest if you would like to cover a broad range of words. Regardless of the order, Emustru mixes up the quiz with a few questions that you missed earlier, a few questions that you answered correctly, and a set of new questions (see Figure 2.5).

Figure 2.5.: A Custom Quiz for a Student
[Diagram: a quiz with the words a, d, and x is assembled from two sources. The Student's Statistics Table lists word d with status Missed and word a with status Correct. The Word Collection lists word x with rank 10 and word y with rank 5.]

Up to the first 25% of the questions may use words that were missed earlier. Similarly, a maximum of another 25% of the questions may include words that appeared in questions that were correctly answered. Finally, the remaining questions will use new words. The number of times that a correctly answered word should reappear is set in a parameter in the config.php file. If the parameter is set to 0, then a word whose meaning was correctly identified will not appear again in a quiz.
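The mix of missed, correctly answered, and new words can be illustrated with a short Python sketch. This is not the Emustru code (which is configured through config.php); the function name build_quiz, the use of plain word lists, and the random choice of review words are assumptions made for the example.

```python
import random


def build_quiz(missed, correct, new_words, num_questions, rng=None):
    """Assemble a quiz: up to 25% missed words, up to 25% words that
    were answered correctly, and new words for the remainder.

    new_words is assumed to be in the order chosen by the student
    (by rank or shuffled beforehand).
    """
    rng = rng or random.Random()
    quota = num_questions // 4  # 25% of the questions

    picked_missed = rng.sample(missed, min(quota, len(missed)))
    picked_correct = rng.sample(correct, min(quota, len(correct)))

    # Whatever is left over is filled with words not seen before.
    remaining = num_questions - len(picked_missed) - len(picked_correct)
    picked_new = new_words[:remaining]

    return picked_missed + picked_correct + picked_new
```

With 20 questions, a student who has missed only two words so far would get those two words, five words that were answered correctly, and 13 new words. If there are not enough new words, the sketch simply returns a shorter quiz.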

2.4. Why should you learn Spelling?

Although spelling is probably one of the most important aspects of learning a language, it is often neglected. The results of bad spelling are seen in poor scores in exams and a low opinion of the writer. There are few excuses for spelling mistakes on a computer, since a spell check function is included with almost all writing software. However, even the best spell check tools do not address grammar errors or the use of a wrong but correctly spelled word in a sentence. For example, the sentence

It was a dark knight when I went too the castle.

uses the wrong words knight and too, instead of night and to. These types of errors are difficult to spot with a spell checker. A forgiving reader may ignore a few such errors, but others may form a negative opinion that is difficult to alter. Even if the written matter is interesting, the initial opinion formed based on spelling errors may dominate. This is especially important if you have to write an essay for an exam.

2.4.1. Spelling Error Analysis

What kinds of spelling errors are common in English words? Wikipedia contains a list (see Appendix C) of over 2700 misspellings. Most (over 80%) of the misspellings are due to single or double letter errors (see Table 2.1). The edit distance measure is a standard method to measure the separation between two words. For example, the edit distance between the words break and breaks is one. The second word breaks is formed by the addition of a single letter s to the original word break.

Table 2.1.: Number of Misspelled Words based on the Number of Error Letters

No. of Error Letters    No. of Words    Percentage
1                       1146            42
2                       1077            40
≥3                      474             18

Figure 2.6 shows some of the legitimate words that can be formed at edit distances of 1, 2, and 3 from the original word break. Two words exist at an edit distance of one, 10 words at an edit distance of two, and 59 words at an edit distance of three (not all neighboring words are shown in the figure). The number of possible words increases rapidly at higher edit distances, since you can generate a much larger number of potential words. Over 80% of the spelling errors are within an edit distance of two (see Table 2.1). Further, most of the errors occur in the middle and not at the beginning or end of a word.

There are two possible types of errors for an edit distance of one: an extra letter was added or a letter was missed. When the edit distance is two, there are three possible errors: two extra letters were added, two letters were missed, or one or two letters were transposed. Table 2.2 contains a list of the error types and sample words. Although you cannot learn spelling by simply analyzing the different types of errors, you can guess if a word is spelt correctly, if you know the most common errors. The sample of 2700 misspelled words from Wikipedia is not very large, yet large enough to draw some conclusions. Figure 2.7 shows some of the types of errors and their frequencies.
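The edit distance used in this section can be computed with a short dynamic programming routine. The Python sketch below assumes that only two operations are counted, adding a letter and removing a letter, so that a replaced or transposed letter costs two edits. This assumption is inferred from the error types listed above for edit distances of one and two. It is not necessarily the exact measure used to build the tables, and the function name is hypothetical.

```python
def edit_distance(a, b):
    """Number of single-letter insertions and deletions needed to
    turn word a into word b (a replaced letter counts as two edits)."""
    # prev[j] holds the distance between the first i-1 letters of a
    # and the first j letters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Matching letters cost nothing.
                curr.append(prev[j - 1])
            else:
                # Either delete a letter from a or insert a letter of b.
                curr.append(1 + min(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

Under this assumption, edit_distance("break", "breaks") is 1, while the transposed letters in the pair achieve and acheive give a distance of 2.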

Figure 2.6.: Words at edit distances of 1, 2, and 3 from the word break.
[Diagram: the word break surrounded by neighboring words, including weak, area, back, bake, trek, beaks, creak, remark, wreak, read, real, breaks, freak, peak, leak, beak, breakup, daybreak, balk, bleak, brake, bread, breach, bureau, and breath.]

The error caused by adding an extra letter is almost twice as frequent as the error due to a missing letter. The letters e, r, and i are the most common causes of a 1-Letter error, with the letter e being added or missing in the highest number of 1-Letter misspelled words (for example, a missing terminal e in committe or an extra e in heroe). Over 90% of the 2-Letter errors are due to the transposition of letters. The vowels a, e, and i are the letters most commonly used wrongly in a misspelled word. For example, the letter a in the misspelled word extant replaces the correct letter e. Similarly, the letter sequence ie in the misspelled word wierd replaces the correct letter sequence ei, and vice versa: the letter sequence ei in acheive should be corrected to the letter sequence ie.

Table 2.2.: List of Spelling Error Types and Example Words

Error Type                              Correct         Wrong
One letter: extra l                     colony          collony
One letter: missing b                   abbreviation    abreviation
Two letters: extra a and l              evidently       evidentally
Two letters: missing f and l            officially      oficialy
One letter: replaced a with u           abundant        abundunt
Two letters: replaced ie with ei        achieve         acheive

Error analysis is helpful to the extent that you are careful when spelling words that use these letters and sequences.

2.4.2. Emustru Spelling Quiz

Unfortunately, there is no easy way to learn spelling without reading and testing your knowledge with quizzes. Emustru includes a quiz to evaluate your spelling skills and learn new words. The words that appear in the quiz are selected by rank or at random from a given word list (see Figure 2.8). The play button in Figure 2.8 is a link to an audio file. If the test word is mundane, then the audio clip will state “Spell mundane” in a male voice and at a slow speech rate. The FreeTTS speech synthesizer software [16] is used to generate the audio file. In some cases, the quality of the audio is good enough to hear a word very clearly, but in other cases, the pronunciation is difficult to decipher. A few hints are included at the bottom of the screen (not shown in Figure 2.8). The audio is generated in the WAV file format, which most browsers can play with a plug-in for audio files.

Figure 2.7.: Spelling Error Types and Frequency of Errors by Character Sequence

All Errors
Type              %
1-Letter          41
2-Letters         40
Others            19

1-Letter Errors
Type              %
Extra Letter      63
Missing Letter    37

Extra Letter            Missing Letter
Letter    %             Letter    %
e         19            e         13
i         9             r         12
r         8             i         11
l         8             s         11

2-Letter Errors
Type              %
Extra Letters     6
Missing Letters   2
Transposition     92

Transposition
Correct    Wrong    %
e          a        6.8
a          e        6.2
i          a        4.8
e          i        4.8
i          e        4.0
ie         ei       4.0
o          e        2.9
ei         ie       2.6

The only advantage of generating an audio file for each question is that you can use any given word in a word list. This makes it simple to add new words or create a new word list. However, there are several disadvantages. One, in most cases, the quality of manually generated audio files is higher than that of equivalent automatically generated files. The pronunciation and the voices of opera singers used in Visual Thesaurus are difficult to replicate with software. Two, there is minimal computational overhead when audio files are pre-generated and stored; the appropriate audio file is simply returned to the browser, depending on the question word. Questions with automatically generated audio files will take a little longer to generate, since the software must create a new file for every question. Three, file maintenance is minimal for pre-generated files, while dynamically generated files must be periodically deleted.

Figure 2.8.: A Spelling Question in Emustru

In Emustru, the flexibility of creating audio for any word was chosen over the pre-generation of audio files. Although the synthetically generated speech sounds unnatural, the purpose of the audio is limited to transmitting the vocalization of a single word or short phrase. Perfectly modeling the pitch and pronunciation of a human voice is still challenging. The quality of text-to-speech software has improved steadily and in the near future, it will be possible to generate high quality audio that anyone can recognize.

Emustru also includes a spellcheck quiz (see Figure 2.9). Two words are shown: one of the words is misspelled and you need to select the correct word. The misspelled words are selected from a list of the common spelling errors. In Figure 2.9, the first occurrence of the letter a is replaced with the letter e in the misspelled version of the word. This is a fairly common error, occurring in 6.2% of all transposition errors (see Figure 2.7).

Figure 2.9.: Test your Spelling Knowledge

2.5. Words, Meanings, and Relationships

Learning the spelling of a word is the first step in understanding a word. The next step is to associate one or more meanings with the word. It is not uncommon for a word to have more than one meaning. About 17% of the words in the WordNet dictionary have two or more meanings. Consider the word pedestrian, a word with two meanings: a noun describing a person walking on a street, or an adjective meaning unimaginative (see Figure 2.10). Words and meanings have a many-to-many relationship, i.e. a word can have many meanings and a meaning can be shared by many words (synonyms). It is important to know the popular and lesser known meanings of a word (see Section 2.2.2). Test writers are more likely to use a word in the context of a less popular meaning to test your breadth of vocabulary. The Emustru vocabulary quiz uses the top two (most frequently used) meanings of a word in a quiz (see Figure 2.11).

Figure 2.10.: Relationships between Words and Meanings
[Diagram: the words walker, pedestrian, prosaic, and commonplace are linked to three meanings, represented by the word groups (walker, pedestrian), (pedestrian, prosaic), and (prosaic, commonplace).]

Figure 2.11.: Two Meanings of the Same Word in Emustru

Since two questions with the same word but different meanings may be confusing in a single quiz, Emustru repeats the same word in another quiz. The period between the appearances of the same word in a quiz can be configured in the config.php file. Each question contains five possible answers, of which only one is correct. The remaining four incorrect answers are selected carefully such that there is no overlap with other words that have the same meaning. For example, the incorrect answers for a question with the test word prosaic must exclude words in both meanings of prosaic. Sometimes, a hyponym or a more generalized meaning of the test word will appear in one of the answers to make the correct answer less obvious. The remaining three answers are selected at random and can be easily eliminated.
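The rule for choosing incorrect answers can be sketched with the words from Figure 2.10. The Python code below is an illustration only: the MEANINGS table, its keys, and the function names are hypothetical, and Emustru itself reads the words and meanings from WordNet.

```python
import random

# Hypothetical meaning table based on Figure 2.10: each meaning is
# shared by a set of words (synonyms).
MEANINGS = {
    "m1": {"walker", "pedestrian"},
    "m2": {"pedestrian", "prosaic"},
    "m3": {"prosaic", "commonplace"},
}


def excluded_words(test_word, meanings):
    """All words that share at least one meaning with the test word."""
    banned = set()
    for words in meanings.values():
        if test_word in words:
            banned |= words
    return banned


def pick_distractors(test_word, meanings, vocabulary, count=4, rng=None):
    """Pick incorrect answers that do not overlap with any meaning
    of the test word."""
    rng = rng or random.Random()
    banned = excluded_words(test_word, meanings)
    candidates = sorted(w for w in vocabulary if w not in banned)
    # random.sample raises ValueError if there are too few candidates.
    return rng.sample(candidates, count)
```

For the test word prosaic, the excluded words are pedestrian, prosaic, and commonplace. The word walker is still a valid incorrect answer, since it does not share a meaning with prosaic.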

2.6. Word Games

You can improve your spelling and vocabulary skills by playing word games such as Hangman and solving anagrams. Quizlet uses several games to match words and meanings that reinforce the relationship between a word and its meanings. Some games involve finding as many words as possible from a collection of 8-10 letters. Others involve extracting words from a grid of letters. Emustru has five word games that you can play based on your word list, the WordNet dictionary, or a corpus.

2.6.1. Emustru

Hangman is a fairly well known game in which you find a word within n chances. You pick letters from a screen-based keyboard: if the letters appear in the unknown word, they are shown in their letter positions (see Figure 2.12). In general, vowels and a few consonants such as r, s, t, and n are the most frequent letters in words. You are allowed to make up to six incorrect letter guesses. In some Hangman games, you may be given more chances to guess the letters and even the meaning of the word may be shown in a hint.

Figure 2.12.: Hangman Word Game

The partial word game is similar, except that a few letters of the word are shown (see Figure 2.13). The letters that are shown are two or more consecutive letters from the beginning, middle, or end of the word. You have to complete the remainder of the word. The meaning of the word is shown as a hint.

Figure 2.13.: Partial Word and Unscramble Questions for the word affable

The right hand side of Figure 2.13 has the equivalent unscramble question for the same word. The letters of the word are jumbled and you need to enter the letters of the word in the correct order. A hint is included at the bottom of the screen (not shown in the Figure).
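The rules of the Hangman game described earlier in this section are simple enough to sketch in Python. The function names are hypothetical, and the sketch assumes that the game is lost as soon as the sixth incorrect letter is picked; a game that allows a seventh guess would only change the comparison.

```python
def mask_word(word, guessed):
    """Show guessed letters in their positions and hide the rest."""
    return " ".join(ch if ch in guessed else "_" for ch in word)


def play_hangman(word, guesses, max_misses=6):
    """Replay a sequence of letter guesses and return (status, mask).

    Status is "won", "lost", or "playing" if the guesses ran out
    before the game was decided.
    """
    guessed = set()
    misses = 0
    for letter in guesses:
        if letter in guessed:
            continue  # a repeated guess is ignored
        guessed.add(letter)
        if letter not in word:
            misses += 1
        if all(ch in guessed for ch in word):
            return "won", mask_word(word, guessed)
        if misses >= max_misses:
            return "lost", mask_word(word, guessed)
    return "playing", mask_word(word, guessed)
```

For the word affable, the guesses a and x leave the game in progress with the partially revealed word a _ _ a _ _ _ and one incorrect guess used.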

Two-word phrases such as “strong tea” and “absolutely necessary” are often seen in the same order in written text, and such phrases become part of the language. You cannot replace the word strong in the phrase “strong tea” with a synonym such as powerful to create an equivalent phrase. The phrase “powerful tea” has the same connotation, but is awkward, since it is rarely seen. Similarly, you can replace the word absolutely in the phrase “absolutely necessary” with a synonym like perfectly. However, the phrase “perfectly necessary” is rarely observed. The phrase game in Emustru uses roughly 5700 popular two-word phrases from the Brown corpus. The first or second word of the phrase is shown in a question and you must guess the most likely following or preceding word respectively (see Figure 2.14).

Figure 2.14.: Guess the Preceding Word

In Figure 2.14, the second word, bank, of the two-word phrase “central bank” is shown with a list of possible preceding words. Roughly half of the questions in the quiz will show the second word of a phrase and ask for the preceding word; the remainder will show the first word and ask for the following word. The purpose of this quiz is to become familiar with common phrases from a large corpus of text and to use these phrases in your own writing in the proper context.
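Popular two-word phrases can be found by counting adjacent word pairs in a body of text. The Python sketch below shows the idea on a single made-up sentence. It is not the procedure used to build the Emustru phrase list from the Brown corpus, the function names are hypothetical, and punctuation and sentence boundaries are ignored to keep the example short.

```python
from collections import Counter


def count_phrases(text):
    """Count every pair of adjacent words in the text."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def likely_preceding(phrases, second_word, top=3):
    """Most frequent words seen immediately before second_word."""
    matches = Counter({first: n for (first, second), n in phrases.items()
                       if second == second_word})
    return [word for word, _ in matches.most_common(top)]


def likely_following(phrases, first_word, top=3):
    """Most frequent words seen immediately after first_word."""
    matches = Counter({second: n for (first, second), n in phrases.items()
                       if first == first_word})
    return [word for word, _ in matches.most_common(top)]
```

Given a text in which “central bank” appears twice and “river bank” once, likely_preceding(phrases, "bank") places central ahead of river, which is the answer that the quiz question in Figure 2.14 expects.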

The last game is based on some of the word relationships defined in the WordNet dictionary / thesaurus. Two sets of words that are related by one of four relationships are shown in a question (see Figure 2.15). The four relationship types are hyponyms, hypernyms, synonyms, and antonyms.

Figure 2.15.: A Word Relationship Question

A word x may be related to more than one word in a single relationship. For example, the word affable has the following synonyms: amiable, cordial, and genial. The antonym relationship is defined between a single word x and another word y. This game is surprisingly difficult, since it requires you to think of word meanings in terms of relationships and is more abstract than the previous games.

2.7. Web Sites to Learn Vocabulary and Spelling

These are some of the sites on the Web to learn vocabulary and spelling. There are many more, but you can learn a lot from these sites. Visual Thesaurus and Espindle are commercial sites with limited functions for non-paying visitors.

1. http://www.timesspellingbee.co.uk/: A site to practice your spelling skills and compete with others. The audio is clear and the interface is simple. You can also guess words that are partially shown in a sentence.

2. http://www.visualthesaurus.com/: The Visual Thesaurus is a graphic dictionary / thesaurus from Thinkmap. The spelling bee includes the audio and the WordNet dictionary meanings of the word.

3. http://www.freerice.com/: Although every test word in Free Rice includes the audio version, your vocabulary alone is evaluated. You need to guess the meaning of a word from four choices.

4. http://www.espindle.org: A site to improve your spelling with test words from your own word list or the list compiled by a group of volunteers. Every test word includes a sample sentence.

5. http://www.quizlet.com: You can build your own lists of words and meanings in any language in Quizlet. Test yourself with your list or a combination of lists using a variety of quiz types and formats.

6. http://emustru.sf.net: The author’s software from SourceForge.net that you can download and install on your PC. Includes six word games, dynamic vocabulary quizzes, and customizable word lists.

7. http://esl.about.com: A large collection of articles and quizzes to learn pronunciation, vocabulary, and grammar.

8. http://vocabsushi.com: Learn vocabulary using a large collection of sentences from newswire and other sources.

3. Learning Sentence Construction

Writing a short paragraph seems more difficult than having a conversation. There are several reasons why we perceive writing to be harder than speaking. A written sentence is generally more formal than a spoken sentence and takes more time to compose, edit, and review. The art of building great sentences is complex and cannot be explained in a single chapter. However, this chapter will use quizzes to identify grammatical errors and find missing words in sentences extracted from newswire articles and classic literature. Each question uses a sentence from a large collection of 35,000 sentences from the Brown corpus and other sources. The sentences cover a range of genres from religion to press articles. As you take more quizzes, you will come across a large number of examples of sentence usage and styles. The style of your sentences will depend on the reader. If your reader is a close friend, then your sentences may be informal. On the other hand, an essay for an exam or a class should be well organized, clear, and precise. This chapter contains some tips to build better sentences for essays.

3.1. Building Sentences

A sentence is a lot more than a group of words arranged in some order. It expresses meaning or thoughts, conjures images, relates to other sentences in a paragraph, and is not longer than necessary. Before you compose your sentence, you need to identify the subject and the action or operation affecting the subject. The simplest sentences are made up of just a subject and a verb, but you can add phrases and clauses to create a more elaborate sentence that precisely expresses your thought.

You will find it easier to compose sentences if you have a rough draft or outline of the sentences before you start writing the final version of your passage. An outline forces you to organize your thoughts and order your sentences. With the outline as a guide, you can build your sentences one at a time, each focused on a single thought. When you are thinking of the whole passage, it is easy to get distracted by a number of ideas and to not completely express your intended meaning. The outline is where you should spend time making sure that all the subjects and thoughts are covered in your passage. You could imagine the outline to be a summary or a concise global view of your passage and the individual sentence as a detailed and local view of an individual thought. As a writer, you need to choose the right words for each sentence, and to order and state them appropriately.

3.1.1. Five tips to build sentences

The purpose of these tips is to help you write sentences for essays that may be machine graded.

1. Every sentence should have a purpose: an introduction to a topic, a conclusion, a thesis statement or topic description, an argument for or against an issue, or a supporting argument. Any sentence that does not fit into one of these categories is probably unnecessary or irrelevant to the essay topic.

2. A sentence pattern that is repeated over and over again will lose its value. The simple sentence pattern with a subject, verb, and object is too common. You can make sentences longer and more interesting by combining clauses, using punctuation, making comparisons, and collapsing similar subjects into a series (see “The Art of Styling Sentences” [34]).

3. Discourse words such as however, although, or firstly compare, contrast, order or elaborate subjects. If any one of these words is detected in the right position – in the beginning of an essay or at the start of a paragraph – then the evaluator will assign the sentence one of the five categories mentioned in the first tip. A sentence that cannot be classified into any of the five categories will be considered irrelevant and not useful in the essay.

4. A sentence pattern should be based on the thought you are conveying to the reader. If two short sentences are related, you could use a semi-colon or colon to combine the two sentences into a single sentence. An explanation can be appended to a general sentence with a colon.

5. If you have the time, you should revise your sentences. On a computer, it is quite simple to make corrections before you submit your essay. When you first build a sentence, you are usually more pre-occupied with writing all the thoughts and issues you would like to express than with the style and precision of the sentence. A revision will improve the quality of the sentences and may help you correct spelling or grammatical errors.

Example Sentences:

An introduction to an essay on the impact of world events on the U.S. economy: As the war with Iraq winds down, worries about the dark threat of terrorism on American soil, and the interminable war in Afghanistan all make for exceptionally nervous markets.

The conclusion of an essay on the Cassini-Huygens mission to Saturn: Saturn’s numerous moons and magnificent rings have still so much to tell and to share along with Titan whose mystery was revealed by the Cassini-Huygens mission.

A combined sentence: However, insecurity in the country continued as numerous rebel groups emerged to challenge nepotism and tribalism; the government responded ruthlessly arresting, torturing and forcing many into exile.

A contrast sentence: Despite the financial rewards, many college students shunned jobs in trading securities.

3.1.2. Punctuation

Although punctuation does not directly add to the content of your essay or even add to the word count, it is extremely important in a machine graded essay. A missed period at the end of a sentence means that the sentence extractor in a machine graded essay will collapse two sentences into a single sentence. This may not appear to be harmful, but it will most likely lead to a grammatically incorrect run-on sentence. Secondly, the machine may incorrectly classify the combined sentence as an introduction instead of an introduction and a main point; a missing main point in a paragraph will be noted. Finally, a machine may not detect the use of discourse words such as despite or firstly that usually appear in the beginning of a sentence. The position of words in a sentence is an indicator of their use; the machine will not correctly tag such words when they are found in other locations of a sentence.

Another avoidable error is starting a sentence with a lower case letter. This error is easily noticed by both human and machine graders. Omitting other punctuation marks like the apostrophe can sometimes be humorous. For example, the missing apostrophe at the end of the word Residents in the sentence “Residents refuse to go into bins.” implies that residents are not cooperating and will not enter bins. While the period has a single purpose (to end a sentence), the apostrophe is used to show possession (Jim’s), create a contraction (it’s for it is), omit numbers or letters (’69), and create plurals of words or letters (do’s). A missed comma may not cost you much, but can change the meaning of some sentences. For example, the sentence “Call me Al.” has a different meaning from “Call me, Al.”

The purpose of adding punctuation marks is to make it easy for the grader to understand your essay. The use of proper punctuation will show that you have taken the time to write your essay from a reader’s point of view. A human grader will clearly appreciate the use of punctuation and a machine grader will precisely identify sentences, clauses, and words.

3.1.3. Are long sentences necessary?

A machine may assign a low score to an essay with many short sentences. The average sentence length is an indicator of the essay style and should be in the range of 15-20 words. An essay with a low average sentence length may appear choppy to a human grader. On the other hand, very long sentences are not necessarily better than short sentences. A sentence that is too long may be crammed with too much information for a single sentence. If you cannot remember the early parts of a sentence by the time you reach the end of the sentence, then your sentence may be too long.

Example Sentences:

A long sentence from “Alice in Wonderland”: "Lastly, she pictured to herself how this same little sister of hers would, in the after-time, be herself a grown woman; and how she would keep, through all her riper years, the simple and loving heart of her childhood: and how she would gather about her other little children, and make their eyes bright and eager with many a strange tale, perhaps even with the dream of Wonderland of long-ago: and how she would feel with all their simple sorrows, and find a pleasure in all their simple joys, remembering her own child-life, and the happy summer days."

Two short sentences combined into a longer sentence: The Atlas moth is the world’s largest moth; its wingspan is about a foot.
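You can check the average sentence length of your own essay with a few lines of Python. The sketch below is a simplified illustration and not the sentence extractor of any particular grading system: it assumes that a sentence ends with a period, question mark, or exclamation mark followed by a space, so abbreviations such as U.S. will be split incorrectly. The function names are hypothetical.

```python
import re


def split_sentences(text):
    """Split text after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def average_sentence_length(text):
    """Average number of words per sentence (0.0 for empty text)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)
```

The sketch also shows why a missed period matters. The text “The Atlas moth is the world's largest moth. Its wingspan is about a foot.” is split into two sentences of 8 and 6 words. Without the first period, the same words are treated as a single run-on sentence of 14 words.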

3.1.4. Does the use of synonyms improve sentences?

A machine grader does count the number of times a word was repeated in an essay. Even though a human grader will not similarly count words, a word that has been excessively repeated will be noticed. Function words (conjunctions, articles, prepositions, and pronouns) are excluded, since such words are repeated in almost all forms of text. An adjective, adverb, or noun that was used every 25-35 words will be flagged. In other words, any such word used ten or more times in an essay of about 300 words will attract the attention of a human grader and will also be marked by a machine. In a short essay, it should be possible to limit the number of times a word is repeated to a maximum of five.

About 23% of adjectives, 16% of adverbs, 17% of nouns, and 45% of verbs have synonyms (in WordNet). A noun such as goat or year is precise and has no synonyms. If you intentionally coin a synonym for these types of words, your essay may appear awkward. However, you can use synonyms for some verbs or adjectives to make your essay more readable and less likely to be penalized by a machine grader for word repetition.
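A rough check for repeated words can be written in Python. This sketch makes several assumptions: the list of function words is a small hypothetical sample, a word is flagged when it is used on average once every 30 words or more often (the middle of the 25-35 range), and every word that is not a function word is counted, since separating adjectives, adverbs, and nouns from verbs would need a part of speech tagger.

```python
import re
from collections import Counter

# A small, hypothetical sample of function words to ignore.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "to", "for",
    "with", "at", "by", "from", "he", "she", "it", "they", "we", "you", "i",
}


def repeated_words(essay, words_per_use=30):
    """Return {word: count} for content words that are used at least
    once every words_per_use words on average."""
    words = re.findall(r"[a-z']+", essay.lower())
    counts = Counter(w for w in words if w not in FUNCTION_WORDS)
    total = len(words)
    return {w: n for w, n in counts.items() if n * words_per_use >= total}
```

In an essay of 300 words, a word that appears ten times is flagged by this check, while a word that appears nine times is not. Very short passages will flag almost every word, so the check is only meaningful for essay-length text.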

3.1.5. Is the sentence precise?

A precise sentence conveys an accurate meaning or image to the reader. For example, “The state of Georgia produces over 1 billion pounds of peanuts a year.” is more precise than “Georgia is a large producer of peanuts.” It is better to express quantity or time with numbers, which are easier to compare and visualize than general terms such as high or low. At the same time, too much detail can overwhelm the reader. A sentence packed with information is likely to be long and difficult to read. If the length of a sentence is over 20 words, you should consider breaking up the sentence into smaller chunks that will be easier to read.

A precise sentence usually conveys the intended meaning without making any assumptions. For example, sentences containing phrases like “We all know that”, “It clearly demonstrates”, and “It is obvious” assume the reader has some background knowledge and also appear arrogant. An introduction sentence explaining the background of an argument will make the reader’s task easier and will also be considered a valid discourse sentence in a machine graded essay.

3.2. Is it grammatically correct?

Word processors like Microsoft Word™ (MS Word) and Writer from OpenOffice.org™ include a spell checker and a grammar checker. While spell checkers do have high precision, grammar checkers are unfortunately not as precise. For instance, a fairly obvious grammatical error in the following sentence

My farther is fixing the computer.

is not flagged by the grammar checker in MS Word. The MS Word grammar checker is designed to find specific types of errors that can be automatically detected and are highly likely to be actual errors. A grammar checker may be tuned to aggressively locate all possible errors, but there is an accompanying penalty. Such a grammar checker will classify some valid sentences as errors, which is an annoyance. If many such bogus errors are detected, the user of the grammar checker may decide that it is not worth the effort to use the software and bypass the grammar checker altogether. Therefore, a grammar checker needs to balance two goals: flagging only actual errors and finding all the errors in the text. In general, the first goal is favored over the second. This means that when the grammar checker detects an error, it is very likely to be a real error. However, a grammar checker also skips many errors, such as the one shown in the earlier sentence. So, even though a grammar checker does not flag any errors in a piece of text, it would be unwise to conclude that the text is free of grammatical errors.

Punctuation

Grammar check and spell check are functions that most of us expect in a word processor. Unfortunately, punctuation check, an important part of writing, is absent. Although punctuation may not be perceived to be as important as parts of speech like nouns and verbs, the use of punctuation to make text clear and unambiguous does make the reader’s job easy. Appendix B contains a brief description of punctuation characters.
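The balance that a grammar checker must strike between flagging only real errors and finding every error can be expressed with two numbers, usually called precision and recall. The Python sketch below uses made-up sentence numbers to compare a cautious checker with an aggressive one; the term recall, the function name, and the data are assumptions added for this illustration.

```python
def precision_recall(flagged, actual):
    """Precision: fraction of flagged sentences that are real errors.
    Recall: fraction of real errors that were flagged."""
    flagged, actual = set(flagged), set(actual)
    hits = flagged & actual
    precision = len(hits) / len(flagged) if flagged else 1.0
    recall = len(hits) / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical essay in which sentences 1 to 10 contain real errors.
actual_errors = set(range(1, 11))

# A cautious checker flags three sentences, all of them real errors.
cautious = {1, 2, 3}

# An aggressive checker flags sixteen sentences, eight of them real.
aggressive = set(range(1, 9)) | set(range(11, 19))
```

The cautious checker has a precision of 1.0 but finds only 30% of the errors, while the aggressive checker finds 80% of the errors but half of its warnings are bogus. A checker tuned like the cautious one is less annoying to use, which is why a clean report does not mean that the text is free of errors.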


3.2.1. How does a grammar checker work?

Text passed to a grammar checker should first be filtered to remove headers, tables, and other extraneous text. Most grammar checkers detect errors one sentence at a time and cannot accurately parse sentences with additional text embedded in them. A filtered chunk of text is split into sentences and passed to a grammar checker. First, each of the words in the sentence is assigned a part of speech (POS). Then, the grammar checker will either attempt to build a parse tree to verify that the sentence is valid or use a set of rules to check for syntax violations (see Figure 3.1).

Figure 3.1.: Parse Tree-based and Rule-based Grammar Checkers
[Figure: the sentence “My father is fixing the computer.” is checked in two ways. Rule-based check: Apply Rules, with the result No Errors. Parse Tree-based check: Generate Tree, with the result (S (NP My father) (VP is (VP fixing (NP the computer))) .)]

Although the two methods to check the grammar of a sentence perform the same task, the procedure used in each method is quite different. In the parse tree-based method [18], the words of a sentence are grouped into phrases that are further divided if necessary, until every leaf of the tree consists of a word or a set of words. The rule-based method looks for errors in a set of patterns extracted from the sentence. Each pattern is compared against a large number of rules to verify whether the pattern is likely in a legal sentence. Any patterns that appear to be very rare or unseen in valid sentences are flagged as errors.

One of the benefits of the rule-based method is that the results of a grammar check show the tokens and patterns that appear to violate a language rule. For an invalid sentence, the parse tree-based method will usually fail to build a tree and will flag the sentence as invalid, but will not indicate the exact reason or a possible correction. The results of a rule violation are more descriptive and can even provide suggestions to correct the error.

Manual Rules

LanguageTool [19] is a manual rule-based grammar checker for several languages including English, Polish, and German. The rules are manually created and stored in a large XML file. Consider a rule to detect a typo in the sentence “There exits a glimmer of hope.” Notice that a spell checker would not flag this error, since exits is a legitimate word. The rule to spot this specific error would be:

Id: “THERE_EXITS”
Pattern: There exits
Message: Possible typo. Did you mean exists?
Incorrect: There exits a distinct possibility.
Correct: There exists a distinct possibility.

When this rule is fired, the message and examples of incorrect / correct sentences will be shown. The message and correction are quite clear and precise. More general rules use patterns with part of speech tags. LanguageTool uses many hundreds of such rules to find grammatical errors in a sentence. Although LanguageTool is a very precise grammar checker, it has some drawbacks. One, the manual maintenance of several hundred grammar rules is quite tedious, although it has become a little simpler to collaboratively manage large rule sets with the use of Web-based tools. Two, the number of rules needed to cover a majority of the grammatical errors is much larger than the set of manual rules. Therefore, LanguageTool is likely to miss many errors whose patterns are not identified in the set of rules. Finally, each language requires a separate set of manually generated rules.

Automatic Rules

Grammar checkers based on automatically generated rule sets have been shown [20] to be accurate enough for applications such as essay evaluation. The automated grammatical error detection system called ALEK is part of a suite of tools being developed by ETS to provide students learning writing with diagnostic feedback. A student writes an essay that is automatically evaluated and returned with a list of errors and suggestions. Among the types of errors detected are spelling and grammatical errors.

The ALEK grammar checker is built from a large training corpus of approximately 30 million words. Corpora such as CLAWS and the Brown corpus characterize language usage that has been proofread and is presumed to be correct. The text from these corpora is viewed as positive evidence that is used to build a statistical language model. The correctness of a sentence is verified by comparing the frequencies of chunks of text from the test sentence with similar or equivalent chunks in the generated language model.

Consider the erroneous sentence “My father fixing the computer.” Each token of the sentence is assigned a POS tag. The tag sequences extracted from this sentence and their likelihoods are shown in Table 3.1. The START and END tags are added to the beginning and the end of the sentence respectively.

Table 3.1.: Tag Sequences for an Erroneous Sentence

Token      Tag Sequence                        Likelihood   Error
My         START - Personal Pronoun             0.33        No
father     Personal Pronoun - Common Noun       1.93        No
fixing     Common Noun - Present Verb          -1.11        Yes
the        Present Verb - Article               0.71        No
computer   Article - Common Noun                1.90        No
.          Common Noun - .                      1.32        No
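One way to obtain scores that behave like the likelihoods in Table 3.1 (positive for common tag pairs, negative for rare ones) is pointwise mutual information over tag bigrams. The sketch below is only an illustration under that assumption; the tag abbreviations, the toy corpus, the smoothing constant alpha, and the function name make_bigram_scorer are all hypothetical and are not taken from ALEK or Emustru.

```python
import math
from collections import Counter

def make_bigram_scorer(tagged_sentences, alpha=0.1):
    """Build a scorer for tag bigrams from lists of POS tags (one list per sentence)."""
    unigrams, bigrams = Counter(), Counter()
    for tags in tagged_sentences:
        padded = ["START"] + tags + ["END"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def score(first, second):
        # Crude smoothing (alpha) keeps unseen pairs from producing log(0).
        # Both tags are assumed to occur at least once in the corpus.
        p_pair = (bigrams[(first, second)] + alpha) / (n_bi + alpha)
        p_first = unigrams[first] / n_uni
        p_second = unigrams[second] / n_uni
        return math.log(p_pair / (p_first * p_second))

    return score

# Toy "verified" corpus: NN = noun, VBZ = present verb, VBG = -ing verb form.
corpus = [
    ["PRP$", "NN", "VBZ", "VBG", "DT", "NN", "."],
    ["DT", "NN", "VBZ", "JJ", "."],
    ["PRP$", "NN", "VBD", "DT", "NN", "."],
    ["DT", "NN", "VBZ", "VBG", "."],
]
score = make_bigram_scorer(corpus)
print(round(score("NN", "VBZ"), 2), round(score("NN", "VBG"), 2))
```

In this toy corpus a noun followed directly by an -ing verb form never occurs, so its score is negative, while a noun followed by a present verb scores positive. A real checker would need a far larger corpus and a threshold chosen on held-out text.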

The likelihood of a tag sequence is larger when it is seen often in verified text samples. Notice that the likelihood of the present participle of a verb following a common noun is negative, i.e. the sequence “father fixing” is rare. However, the likelihood of the past participle of a verb following a common noun is positive, i.e. the sequence “father fixed” is not uncommon. Still, the sentence “The man fixing the computer was flummoxed.” is a valid sentence. There will be cases where a legitimate sentence is flagged as an error.

The grammar checker in Emustru uses individual tags and tag sequences similar to the ones shown in Table 3.1 to detect errors. A POS tag y that was assigned to a word x in fewer than 5% of all cases in a sample text is noted in a rule for x. Any sentence that contains the word x tagged with y is considered a potential error by the checker. The errors detected with this type of rule are pairs of words that are used incorrectly, such as affect and effect or then and than. For example, the probability of finding the word affect used as a noun was less than 3% in the Brown corpus. The rule for the word affect will detect the erroneous use of the word in the sentence below.

We submit that this is a most desirable affect of the laws and one of its principal aims.

The grammar checker returns the description “The word affect is not usually used as a noun, singular, common” and the suggestion “Refer to affect, did you mean effect”. There are other pairs of such words that are often mixed up, such as bare / bear, accept / except, and loose / lose.
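The rare-tag rule described above can be sketched in a few lines. This is a minimal illustration, not the Emustru implementation: the function names, the min_count cut-off, and the toy counts for affect and effect are assumptions made for the example, and only the 5% threshold comes from the text.

```python
from collections import Counter, defaultdict

def build_rare_tag_rules(tagged_words, threshold=0.05, min_count=20):
    """Map each word to the set of tags it received in fewer than `threshold` of cases."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    rules = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        if total < min_count:  # assumed cut-off: too few samples to trust
            continue
        rare = {tag for tag, c in tag_counts.items() if c / total < threshold}
        if rare:
            rules[word] = rare
    return rules

def check(tagged_sentence, rules):
    """Return the (word, tag) pairs that match a rare-tag rule."""
    return [(w, t) for w, t in tagged_sentence if t in rules.get(w.lower(), ())]

# Toy sample: "affect" is a noun in 3% of cases, "effect" a verb in 10%.
sample = ([("affect", "VB")] * 97 + [("affect", "NN")] * 3
          + [("effect", "NN")] * 90 + [("effect", "VB")] * 10)
rules = build_rare_tag_rules(sample)
flagged = check([("desirable", "JJ"), ("affect", "NN")], rules)
print(rules, flagged)
```

Only affect tagged as a noun falls under the 5% threshold here, so the sample sentence fragment is flagged while a verb use of effect would pass.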

3.2.2. E-rater Grammar Checker

E-rater (see Section 4.3), the essay evaluator developed by ETS Technologies, includes a grammar checker that checks for specific errors. Following are some of the grammar errors detected by E-rater.

Sentence Structure

A run-on sentence and a sentence fragment are two types of sentence errors detected by E-rater. A run-on sentence consists of two consecutive independent sentences not separated by a sentence separator character, such as a period. A sentence fragment is missing a subject or verb and may have words in the wrong order.

Run-on sentence: Once upon a time there was a man his name was Abraham.
Corrected sentence: Once upon a time there was a man named Abraham.
Sentence fragment: The project you submitted is incomplete. Which is why you will have to resubmit it.
Corrected sentence: Since the project you submitted is incomplete, you will have to re-submit it.

A run-on sentence is fairly easy to detect: if you proofread the sample sentence, you will notice that the word sequence “man his” does not sound right and should be separated by a punctuation character. The second sentence (Which is ...) in the sample sentence fragment is not a complete sentence. The meaning of a sentence fragment is usually not obvious, unless the context of the previous or next sentence is used. A reader assumes that the missing subject or predicate in a sentence fragment is stated implicitly in a neighboring sentence. The subject is missing in the sample sentence fragment.

Subject-verb Agreement

The subject and verb of every sentence should agree in number, i.e. a plural subject is associated with a plural verb and a singular subject with a singular verb. For example, the subjects and the verbs in the sentences below agree in number:

The boxes are open.
The knives are dull.
One of the boxes is open.
All, except one of the knives is dull.

Notice that the modifiers in the last two sentences change the number of the subject. Even though the same words, boxes and knives, are used in all four sentences, the plural form of the verb is used in the first two sentences and the singular form in the last two. You will need to look at the complete subject, including the modifier, to verify that a subject is truly singular or plural.

Since the use of bigrams (2-word sequences) alone will not detect these types of errors, E-rater uses filters to verify that a bigram is not part of a phrase. For example, the preposition of precedes the word boxes in the third sample sentence. The modifier of boxes, one of the, is considered before applying a bigram-based rule. The filter detects that the subject is not plural, but singular, and does not flag the use of the singular verb, is, in the third sentence. Similarly, in the fourth sentence, the word knives is modified into its singular form.

E-rater detects fewer than half of the subject-verb agreement errors in an essay [29]. Of the errors that are detected, over 90% are genuine errors. In other words, E-rater will only flag a sentence when it is very confident that an error is present.

Verb/Noun forms

Related to subject-verb agreement errors, noun and verb form errors are the use of a singular form in place of a plural form or vice versa. Verb form errors also include the wrong tense and the use of the wrong modal, auxiliary, or infinitive verb (see the Writer’s Handbook for English Language Learners [32] for more details). The following two sentences contain verb formation errors.

Incorrect: Their parents are expect good grades.
Incorrect: Someone else could published a better book.
Correct: Their parents are expecting good grades.
Correct: Someone else could have published a better book.

In the first incorrect sentence, the wrong form of the verb, expect, is used. In the second incorrect sentence, the auxiliary verb have is missing. Verbs are important parts of a sentence, and questions testing your knowledge of proper verb usage are almost certain to appear in the SAT or GRE exams. In addition to the Writer’s Handbook, there are other books to learn more about verb and noun forms [14, 15].

Other

E-rater detects many other types of errors including the use of the wrong word, a missing word, typographical errors, and part of speech errors (see Section 4.3 for details).

3.3. Emustru Sentence Quizzes

With Emustru, you can learn grammar and vocabulary with two types of quizzes. The first type of quiz tests your skill in identifying a single missing word in a sentence, given a set of words. Removing words from a sentence to measure a student’s comprehension of the sentence and vocabulary knowledge is also called a Cloze test [17]. The order of removal of words can be mechanical, such as the deletion of every 5th word of a sentence, or selective. In a selective deletion, words for removal are chosen from a list or based on some other criteria.
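The two deletion strategies can be illustrated with a short sketch. The function names, the blank marker, and the simple whitespace tokenization are assumptions made for this example and do not describe how Emustru itself tokenizes sentences.

```python
def mechanical_cloze(sentence, n=5, blank="________"):
    """Replace every n-th word of the sentence with a blank."""
    words = sentence.split()
    removed = []
    for i in range(n - 1, len(words), n):
        removed.append(words[i])
        words[i] = blank
    return " ".join(words), removed

def selective_cloze(sentence, word_list, blank="________"):
    """Blank out the first word of the sentence that appears in word_list."""
    words = sentence.split()
    targets = {w.lower() for w in word_list}
    for i, word in enumerate(words):
        core = word.strip(".,;:!?")  # ignore punctuation attached to the word
        if core.lower() in targets:
            words[i] = word.replace(core, blank)
            return " ".join(words), core
    return sentence, None

question, answers = mechanical_cloze(
    "The quick brown fox jumps over the lazy dog near the river bank")
selected, answer = selective_cloze(
    "He may have been the seminal thinker.", ["seminal"])
print(question, answers)
print(selected, answer)
```

The mechanical version is blind to what it removes, so it may delete an article or a preposition; the selective version only removes words that the student is expected to have studied.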

3.3.1. Cloze Test

A sentence is first selected from a large collection and then one or two words are selectively deleted. A sentence is never used more than once in a quiz, and fewer than twice for any particular student. The use of different sentences not only makes a quiz more interesting, but also prevents students from answering a question by memorizing the sequence of words in a sentence.

Long sentences are not desirable for beginner students, since such sentences can be difficult to comprehend. A student can set a difficulty level to limit the length of a sentence. At the easy level, the maximum length of a sentence is set to 20 words, and at the medium and difficult levels the maximum sentence length is 30 and 40 words respectively. The word(s) removed from a selected sentence must also appear in the word list that you have selected. Assuming you have studied a word earlier in a spelling or vocabulary quiz, you can now study the usage of the word.

A sentence containing one of the following words is given higher priority than other sentences: not, but, although, however, despite, no, none, never, merely, always, often, gradually, sometimes, because, since, like, therefore, and so. These words are also known as discourse markers and are used to present information in a formal manner. Discourse markers help develop ideas and establish relationships between them. Consider the following sentences:

Gordon A. Lonsdale, 37, a mystery man presumed to be Russian, although he carries a Canadian passport.

Despite efforts by Washington last week to play down the significance of the meeting, it clearly was going to be one of the crucial encounters of the cold war.

Therefore, if the target can significantly change its location in something less than 30 minutes, the probability of having destroyed it is drastically lowered.

The words although, despite, and therefore establish a relationship between the two parts of each of the three sentences. The discourse markers regarding, as far as, and as for may change the subject in the fragment that follows the marker. Similarly, markers like however and despite are used to present two contrasting ideas, and words like since and therefore introduce a statement that should logically follow from a given statement.

Single Word Sentence Completion

Single word sentence completion questions are popular in language exams. You use the visible words of a sentence to guess the most likely word that was deleted, based on the context and meaning of the sentence. On occasion, more than one word may be removed. A sentence question with two words missing may be harder to answer than a sentence with a single word missing.

Figure 3.2.: A Single Missing Word from a Sentence

Jean Bodin, writing in the sixteenth century, may have been the ________ thinker, but it was the vastly influential John Austin who set out the main lines of the concept as now understood.
- cruelest
- disposable
- canyons
- fighters
- seminal

In Figure 3.2, the sentence presents two contrasting thoughts. The first part of the sentence describes a writer from the sixteenth century, while the second part of the sentence is about an influential person describing a concept. The correct word seminal is the most logical word to complete the sentence that compares two influential writers from different times. The other answer words are automatically selected by a function that first looks for a hyponym and then selects three other words that are neither synonyms of seminal nor close to the correct word. We define close in terms of the number of character operations needed to transform one word into another. These restrictions are necessary to avoid words that may be inflections of the correct word.
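The notion of closeness used to reject answer choices can be measured with the Levenshtein edit distance, the number of single-character insertions, deletions, and substitutions needed to turn one word into another. The sketch below covers only that filter. The minimum distance of 3, the candidate list, and the function names are assumptions for illustration, and the hyponym lookup (which would need a resource such as WordNet) is left out.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]

def pick_distractors(answer, candidates, synonyms=(), min_distance=3, count=3):
    """Keep candidates that are neither synonyms of nor close to the answer."""
    banned = {answer.lower(), *(s.lower() for s in synonyms)}
    chosen = []
    for word in candidates:
        if word.lower() in banned:
            continue
        if edit_distance(answer.lower(), word.lower()) < min_distance:
            continue  # too similar, possibly an inflection or near-spelling
        chosen.append(word)
        if len(chosen) == count:
            break
    return chosen

choices = pick_distractors(
    "seminal",
    ["seminar", "influential", "cruelest", "disposable", "canyons", "fighters"],
    synonyms=["influential"],
)
print(choices)
```

Here seminar is rejected because it is one substitution away from seminal, and influential is rejected as a synonym.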

Double Word Sentence Completion

Two word omissions in sentences are not necessarily harder than single word omissions. However, an automatic question generator must carefully select the omitted words from a sentence (see Figure 3.3). In most cases, the two omitted words will be separated by one or more words.

Figure 3.3.: Two Missing Words from a Sentence

Hand in hand with the _________ program is the industry’s self originated and directed _________ program.
- legislative, safety
- risklessness, sacrilegious
- statically, rewardingly
- optimistically, corporations
- substances, pigmentation

The same strategy adopted for single missing words can be used here. The most likely first word is identified, followed by a check for the correctness of the second word. Sometimes, the first word may be difficult to identify in the given choices, and eliminating the wrong answers for the second word can lead to the correct answer. A misleading answer will have a potentially correct first word, but an incorrect second word. So, you will need to verify that both words are appropriate for the sentence.

3.3.2. Find the Error

These types of sentence questions present a sentence with fragments underlined that may be correct or incorrect. A set of four words is underlined in a sentence and one of these four words may be incorrect in the context of the sentence. You need to identify the incorrect word (see Figure 3.4). A few of the questions may contain zero errors, i.e. the sentence is correct as-is and does not need any modification. This is usually the 5th choice (No Error E).

Figure 3.4.: Spot the Error in a Sentence

Handing the (A) money (B) over, Russ wiped his hands on his pant-legs as if riding (C) himself of something (D) unclean. No Error (E)
- A
- B
- C
- D
- E

In Figure 3.4, four words have been underlined and marked with the letters A through D. One of these words may be incorrect in the sentence. The last choice E is selected when the sentence appears to be error-free. In this example, the correct word ridding in choice C has been automatically replaced with the word riding. You are not required to supply the correction in this type of question; you only need to select the word that is incorrect. Sometimes more than one word may be underlined in a single choice. For example, the two word phrases himself of or as if may be possible answer choices.

Emustru attempts to duplicate the human process used to generate such questions. First, a valid sentence is selected from a collection of sentences. In 20% of the generated sentences, no changes are made to the original sentence. In the remaining 80% of the sentences, a word is randomly selected and replaced with another word that shares a common stem. For example, the two words riding and ridding share the same stem rid. Similarly, the word accepting may be replaced with its past tense, accepted, since both words share the same root accept.
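The generation step can be sketched as follows. This is an illustration under stated assumptions rather than the Emustru code: the VARIANTS table is a hypothetical stand-in for a stemmer plus dictionary lookup, and the function name and the keep_rate parameter (the 20% of sentences left unchanged) are invented for the example.

```python
import random

# Hypothetical table of replacement words that share a stem with the key.
VARIANTS = {
    "ridding": ["riding"],
    "accepting": ["accepted"],
}

def make_error_question(sentence, variants=VARIANTS, keep_rate=0.2, rng=random):
    """Return (question sentence, original word), or (sentence, None) for No Error."""
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in variants]
    if not candidates or rng.random() < keep_rate:
        return sentence, None  # unchanged sentence: "No Error" is correct
    i = rng.choice(candidates)
    original = words[i]
    words[i] = rng.choice(variants[original.lower()])
    return " ".join(words), original

source = "Russ wiped his hands as if ridding himself of something unclean."
always_changed = make_error_question(source, keep_rate=0.0)
never_changed = make_error_question(source, keep_rate=1.0)
print(always_changed)
print(never_changed)
```

Setting keep_rate to 0.0 or 1.0 makes the outcome deterministic, which is convenient for testing; with the default of 0.2 roughly one question in five keeps the original sentence.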

3.3.3. Correct the Sentence

Another type of question contains a fragment of 5-10 words that is underlined (see Figure 3.5). The sentence in the question may be correct as-is. The first answer repeats the underlined fragment and is the correct choice if you believe that the original sentence in the question is error-free.

Figure 3.5.: Select the Best Fragment to Complete the Sentence

The taking of depositions, he suggesting, should be placed under a special court examiner empowered to compel responsive and relevant answers to exclude immaterial testimony.
- of depositions, he suggesting, should be placed under a special
- of deposition, he suggested, should be placed under a special
- of depositions, he suggested, should be place under a special
- of depositions, he suggested, should be placed under a special
- of depositions, he suggested, should being placed under a special

The remaining four answer choices contain sentence fragments that are potential corrections for the given sentence. The underlined fragment has been selected at random from a sliding window of 12 words. The correct fragment for the sentence is the fourth choice in Figure 3.5. A few of the words in the remaining incorrect choices have been altered from singular to plural or vice versa.

3.4. Web sites to learn sentence construction

1. http://www.sentencemaster.ca: A collection of games to practice sentence formation, from the elementary to high school levels.
2. http://www.nonstopenglish.com: Games and quizzes to learn vocabulary, grammar, and sentence formation.
3. http://www.tolearnenglish.com: Worksheets and tests to learn grammar and vocabulary.
4. http://www.custom-essays.org: Tips to write essays and many sample essays.
5. http://emustru.sf.net: Download quizzes to test sentence completion and grammar. Identify the error in a sentence and correct sentences.


4. Automatic Essay Scoring

The SAT and GRE exams include an essay question: an essay prompt is given and you are asked to argue for/against a proposition or describe an event/procedure. In the interest of saving time and money, the Educational Testing Service (ETS), the organization responsible for these exams, has replaced one of the two human graders per essay with an Automated Essay Scoring (AES) grader [24]. This chapter is not a tutorial on writing; there are many excellent books on essay writing and building sentences ([34, 35]). Here, the discussion is about automated methods of evaluating essays and how you should write an essay, if you know that the essay will be graded by a machine.

Why AES?

Prior to automated essay evaluation, all essays were manually scored, adding to the high cost of evaluation. Consider the SAT exam that a million or more students may take in any given year (in 2006, the exam was taken 1.4 million times [23]). Each of the million essays, of roughly 100-300 words, must be read and evaluated. A human grader spends 3-4 minutes per essay to compute an overall score. In large scale exams like the SAT, the use of an automated system can significantly reduce the cost of evaluation and the time to complete the evaluation. Finally, a human grader is susceptible to fatigue or may be biased if a topic is open-ended and possibly controversial. An automated system runs the same algorithm to compute a score for all essays and is free of such bias.

The time to correct an essay is also a burden for a class teacher. Imagine a class of 30 students or more, who submit 2-3 writing assignments per week. A teacher will need to correct about one hundred essays per week, a time consuming and dull task. The use of AES can reduce this burden to some extent through an initial machine evaluation to identify some of the obvious errors in an essay. It is also difficult for a teacher to identify common errors in a collection of 30 or more essays. An AES can easily collect, maintain, and summarize global information for a large collection of essays.

Does AES work?

A common criticism of AES is that a machine never really understands the contents of an essay and instead assigns scores based on a set of features. It is true that an evaluation algorithm does not actually comprehend an essay. Yet, it has been shown that a small number of carefully selected features are sufficient for an algorithm to compute a score that is very close to the score that a human would have assigned to the same essay.

Consider the E-rater™ [24, 25] essay evaluator from ETS Technologies, which has been used to evaluate over a million essays. Every essay is scored in the range of 1 to 6, where 1 is the lowest and 6 the highest score. In 97% of the essays, the absolute score difference between a human grader and E-rater was less than 2. An absolute score difference of more than 1 was resolved by a second human grader.

Even though human and AES graders strongly correlate, it is possible to generate a poor essay that would score high with an AES grader. A human grader assigns a grade to an essay based on concepts such as organization, discourse, and structure. An AES only computes values for features that represent these concepts, so an artificial essay can be constructed to score well with an AES even though it is actually a poor essay. However, the effort to create such an essay is not minimal and would require a trained writer to generate text such that most of the features used in an AES are fully represented in the essay.

It is unlikely that AES will identify the next great writer, given the limitations of the technology. But automatic evaluation of an essay can simplify a teacher’s job in a classroom. A student can submit an essay to a machine, make corrections based on the feedback from the AES, and then submit a second and possibly improved version of the essay to the teacher.
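The agreement figures quoted above for human and machine graders (a score difference of less than 2, and a second reader when the difference is more than 1) can be computed from paired scores as in the sketch below. The scores are made-up values on the 1 to 6 scale and the function name is an assumption for the example.

```python
def agreement_report(human_scores, machine_scores):
    """Return the fraction of essays with |difference| < 2 and the indexes needing a second reader."""
    diffs = [abs(h - m) for h, m in zip(human_scores, machine_scores)]
    within_one = sum(d < 2 for d in diffs) / len(diffs)
    needs_second_reader = [i for i, d in enumerate(diffs) if d > 1]
    return within_one, needs_second_reader

# Hypothetical scores for ten essays.
human = [4, 3, 5, 2, 6, 4, 1, 3, 5, 4]
machine = [4, 4, 5, 2, 4, 3, 1, 3, 5, 5]
rate, flagged = agreement_report(human, machine)
print(rate, flagged)
```

In this made-up sample only the fifth essay (index 4) differs by more than one point and would be sent to a second human grader.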

4.1. How does it Work?

First, consider how a human grader evaluates an essay. A human grader reads the entire essay, forms an opinion on the quality of the essay, and assigns a score. This score is also called a holistic score, based on the grader’s overall impression of the essay. A holistic score is a single value in a range (1-6) computed from the grader’s evaluation of a set of essay characteristics. While reading the essay, a grader looks for certain traits that characterize a good essay. The presence of such traits or features in an essay motivates the grader to assign a higher score than to essays in which these traits are absent. Common traits include content, creativity, mechanics, style, and organization.

An AES scans the contents of an essay and searches for the presence of such traits. The difficulties in building a precise AES lie in identifying features that accurately represent these traits. Page and Petersen [9] use the terms trins and proxes to describe traits and features respectively. Trins represent characteristics that a human grader evaluates in an essay such as style, organization, and content. On the SAT, the characteristics that should be present in a high scoring essay include a well-stated and developed point of view, critical thinking, examples, supporting evidence, coherent arguments, strong vocabulary, and grammatically correct sentences.

A proxe or approximation is a variable that is automatically extracted and roughly estimates a trin or characteristic. Some of the roughly 30 proxes used in Project Essay Grade (PEG [9]) include the average sentence length, the number of paragraphs, the total number of words, and the average word length. A proxe may represent part of one or more trins and a trin may use multiple proxes. In other words, there is a many-to-many relationship between trins and proxes.
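The four proxes named above are simple to extract from raw text. The sketch below is an illustration only, not the PEG implementation: the function name, the regular expressions for sentences and words, and the convention that a blank line separates paragraphs are all assumptions.

```python
import re

def extract_proxes(essay):
    """Compute four simple proxes from the raw text of an essay."""
    paragraphs = [p for p in essay.split("\n\n") if p.strip()]
    # Assumed sentence boundary: whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", essay.strip()) if s]
    words = re.findall(r"[A-Za-z']+", essay)
    return {
        "paragraphs": len(paragraphs),
        "total_words": len(words),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
    }

proxes = extract_proxes("Dogs bark. Cats nap all day.\n\nBirds sing.")
print(proxes)
```

Each value is only a rough stand-in for a trin; for instance, average word length says something about vocabulary but nothing about whether the words are used correctly.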

4.1.1. Traits and Features

An AES is more likely to be accurate when a proxe closely represents a trait. How do we find proxes or variables that define traits? The best way is to ask human graders what they look for in an essay to evaluate a particular trait. For example, the total number of words, the number of unique words, and the presence of domain specific words are proxes to measure the content of an essay. A human grader may not actually count the number of occurrences of words, but will make judgments from an estimate of the length of the essay, the presence of specific words, and the use of vocabulary. An AES can make very precise counts of words, word frequencies, sentences, word types, and other parameters. Table 4.1 shows a list of traits and associated features that can be measured in an essay.

Table 4.1.: Traits and Associated Features

Trait                Features
Grammar              Measures of grammatical errors
Usage                Misuse of articles, wrong word forms, preposition errors, and faulty comparisons
Mechanics            Spelling mistakes and missing punctuation marks
Style                Use of passive voice, inappropriate sentence lengths, and faulty conjunction usage
Organization         Presence of an introduction, content paragraphs, and a conclusion
Development          The average length of a discourse element
Lexical Complexity   Average word length and number of medium-long words
Vocabulary Usage     Presence of prompt-specific terms

The types of grammatical errors identified for the grammar trait can include missing punctuation, run-on sentences, subject-verb agreement, ill-formed verbs, pronoun errors, and forms of garbled sentences. The measures of the usage trait are also mostly grammatical and include the misuse of articles, wrong word forms, confused words, preposition errors, and faulty comparisons. The sentence - “We don’t have many information on the subject”, does not use the proper article.

The wrong word form of danger is used in the sentence “Until recently, the Hudson river contained danger levels of pollutants”. The incorrect word, effect, is used in the sentence “Lack of sleep effects the quality of work.” instead of the word affect. The sentence “They arrived to the town” contains a preposition error. A faulty comparison compares two nouns that are not alike. For example, the sentence “The weather in Germany is colder than Gabon” makes an illogical comparison between weather and a country.

The mechanics of an essay include spelling mistakes, the wrong case of a letter in a word, missing punctuation marks, and incorrectly fused or compound words. The measures to evaluate the style of an essay look for the use of passive voice, repetition of words, and sentences that are either too long or too short.

An essay for a typical prompt in an exam is expected to have an introduction and a conclusion. Between these two discourse elements, an essay should also contain main points, supporting material, and a thesis. The absence of these discourse elements in an essay will potentially lead to a lower score. A discourse element such as a main point without any supporting material is weak and possibly not fully developed. A completely developed main point will have one or more sentences to support the argument.

The measures for lexical complexity evaluate the usage of words. A large number of words that are more than five or six characters long may indicate a strong vocabulary. Finally, in a group of essays generated for a specific prompt, we would expect to see a similar set of content words in essays with high scores. These words are prompt specific and may represent common terminology used to discuss the essay prompt.

Traits such as content and organization can be reasonably approximated using a set of variables. But other traits such as creativity are hard to define in the form of an algorithm that can be coded in an AES. It is difficult because there is no model that an AES can use to precisely evaluate the degree of creativity in an essay. Such traits, based on features that cannot be estimated a priori, are difficult to approximate and are a potential source of errors in an AES.

Despite these deficiencies in an AES, scores computed automatically have a strong correlation with the scores of a human grader. This is possibly because the traits that an AES can evaluate with some accuracy are sufficient to generate a precise score. The precision of an AES is not proportional to the number of features. In other words, a few good features are sufficient to accurately categorize an essay [25]. Further, an essay with a strong trait like creativity is usually accompanied by high values in other traits such as organization and vocabulary.

4.1.2. Creating an Essay Model for an AES

We first need an essay model; the purpose of an essay model is to evaluate and score an unseen essay. The essay model is generated from a set of pre-scored (training) essays. These essays have been graded earlier, and an accurate model would include a reasonable number of example essays for each of the six categories. Each training essay is converted to a vector of features extracted from the text of the essay (see Figure 4.1). For example, one of the values in the vector represents the total number of words in the essay. Similarly, other values in the vector would represent the number of unique words, the average sentence length, and the number of spelling errors.


Figure 4.1.: Building an Essay Model with Training Essays
[Flow diagram: training essays for categories 1 to 6 (raw text) -> extract word, sentence, and global features -> vectors -> generate and save a model -> essay model]

For every vector, we know the associated category, since the training essays have been scored earlier. The model is a logistic regression classifier [8] created from the set of training vectors and associated categories. A logistic classifier assigns a weight to each of the features in the vector, such that the weighted vectors fit the categories of the training essays. In other words, a feature x that has high values for a particular category y alone will have a higher weight in the model for y. If an unseen essay contains a high value for feature x, it is more likely to be assigned to category y. For example, the model for category 6 will assign a high weight to the total number of words in an essay. So, a long essay is more likely to be assigned to category 6 than to other categories.


4.1.3. Using a Model to Assign a Score

An unseen essay is first converted to a vector of values in the same manner as the vectors created for the set of training essays. The vector is a list of feature values, but excludes any category. The logistic classifier built earlier accepts a vector and returns the closest category based on the trained model (see Figure 4.2).

Figure 4.2.: Assigning a Category to an Essay
[Flow diagram: test essay (raw text) -> extract word, sentence, and global features -> vector -> read the essay model and assign the closest category -> a category from 1-6]

The model automatically builds a machine representation of the characteristics of good, bad, and average writing from the training essays. Therefore, the precision of the model depends on the quality of the training essays; each training essay should be assigned to the correct category. A bad essay and an excellent essay should be assigned to categories 1 and 6 respectively.
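The scoring step can be sketched in a few lines of Python. The sketch below assumes a multinomial logistic model with one weight per feature per category and picks the category with the highest probability. Every number in WEIGHTS is invented for illustration; real weights would be learned from the pre-scored training essays, and the three scaled features are likewise an assumption.

```python
import math

# Hypothetical per-category weights for three features:
# [total words / 100, unique words / 100, spelling errors].
# Real weights would be learned from pre-scored training essays.
WEIGHTS = {
    1: [-1.0, -0.5, 0.8],
    2: [-0.6, -0.3, 0.5],
    3: [-0.2, 0.0, 0.2],
    4: [0.2, 0.2, -0.1],
    5: [0.6, 0.4, -0.4],
    6: [1.0, 0.6, -0.8],
}

def category_probabilities(vector):
    """Softmax over the weighted sum of the features for each category."""
    scores = {c: sum(w * x for w, x in zip(ws, vector)) for c, ws in WEIGHTS.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exps.values())
    return {c: e / norm for c, e in exps.items()}

def assign_category(vector):
    """Return the category (1-6) with the highest probability."""
    probs = category_probabilities(vector)
    return max(probs, key=probs.get)

long_clean_essay = [4.5, 2.0, 0]  # 450 words, 200 unique words, no spelling errors
short_weak_essay = [0.8, 0.5, 6]  # 80 words, 50 unique words, six spelling errors
print(assign_category(long_clean_essay), assign_category(short_weak_essay))  # 6 1
```

With these made-up weights, the long error-free essay lands in category 6 and the short error-ridden essay in category 1, which mirrors the way a high weight on essay length pushes long essays toward the top category.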

4.2. Applying AES

The use of AES to teach and evaluate writing has become popular in schools and universities; some of the commercial AES products include E-rater™ [24], Intellimetric™ [27], and Intelligent Essay Assessor™ [13]. These products have been successfully used with a large number of essays on many different topics. However, AES is not perfect and a human grader can give better feedback than a machine. Still, its use in grading essays and teaching writing continues to grow.

4.2.1. Is AES Valid?

In several evaluations, AES products have shown a high correlation with human graders, and the use of AES in competitive exams is accepted despite its weaknesses. Even though critics may claim that an AES does not understand an essay in the same way a human can appreciate an essay, there is no denying that the final outcome (score) of an AES is valid in most cases.

A secondary issue is whether a student can write a bad faith essay to fool the AES into assigning a high score. A human grader would quickly detect a bogus essay and assign a low score. But an AES can be deceived by an essay that scores well in the features needed for a high score. For example, consider an essay that is reasonably long, uses a large vocabulary, has no grammatical errors, and is largely coherent. Such an essay would receive a high score, even if the facts and examples mentioned were completely wrong. The AES has no background knowledge to detect such errors.

The Intelligent Essay Assessor (IEA) claims to overcome this problem with a collection of pre-scored essays on a particular topic. An unscored essay would receive a high score if it appears to be close to a group of essays that were assigned high scores prior to the evaluation. The assumption is that high scoring essays for a particular topic will look more similar to another unseen but well-written essay than to a poor and irrelevant essay. IEA uses a matrix factorization algorithm to simultaneously consider all words in an essay. This makes it difficult to write a bad faith essay, since there are no known features that can be artificially manipulated to generate a high score. Instead, IEA relies on the content words alone.

Why does AES work: When the possible scores for an essay are in the range 1-6, even a very primitive AES will be close to the human score almost half the time. Consider a human grader’s score of x for an essay: a random number between 1 and 6 will be within ±1 of x about 45% of the time. If the extreme scores of 1 and 6 are ignored, then the random score will be within ±1 of x in half of all cases. This means that an AES has to make an intelligent guess of the score in a fairly narrow range to be correct.
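The two percentages in the preceding paragraph can be checked by enumerating every pair of human score and random score. The sketch below assumes that human scores are spread evenly over the range considered and that the random score is drawn evenly from 1 to 6; the function name within_one is made up for this example.

```python
from fractions import Fraction

def within_one(human_scores, machine_scores=range(1, 7)):
    """Fraction of (human, random) score pairs that differ by at most one point."""
    human_scores = list(human_scores)
    machine_scores = list(machine_scores)
    hits = sum(1 for h in human_scores for m in machine_scores if abs(h - m) <= 1)
    return Fraction(hits, len(human_scores) * len(machine_scores))

print(within_one(range(1, 7)))  # 4/9, about 44%, for human scores 1-6
print(within_one(range(2, 6)))  # 1/2 when the human scores 1 and 6 are ignored
```

A human score of 1 or 6 has only two neighbouring random scores within one point, while every other score has three, which is why dropping the extremes raises the fraction to exactly one half.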

4.2.2. Essay Prompt

An essay prompt describes the main topic or issue that the student’s essay should discuss. A few sample prompts are listed below.

• How do you feel about people using cell phones in public? Should cell phones be banned in public places? Why or why not?
• What is your favorite time of the year? Why? What do you like about that period?
• You have passed a driving test. Your friend who does not have a driver’s license would like to know the procedure. Explain how you passed the driving test.

The first prompt is an argumentative prompt. There is no right or wrong answer, and an argument can be made either to support or to oppose a ban on cell phones in public places. Essays based on argumentative prompts are a little harder to compose than essays for other prompts. An argumentative essay must first make a thesis statement and then present a well-developed list of supporting arguments. The second prompt is a descriptive prompt. An essay for a descriptive prompt creates the background for an event or period and elaborates on the topic. A good essay for such a prompt lists the facts justifying the opinion of the writer in a logical order. The third prompt is an expository prompt. This type of essay explains a procedure step-by-step, in order, from start to finish.

Not all prompts are equally difficult. A prompt on a complex topic may be harder to write about than a prompt on a simple topic. For example, a descriptive prompt is a more appropriate assignment for a fourth grade class than an argumentative prompt. The AES does not distinguish between an easy and a difficult prompt and treats essays for all prompts in the same manner. An AES such as the IEA that relies on content words uses a separate model for each prompt. In other AES products, it would not be appropriate to use a model trained with essays from students of the fourth grade to evaluate essays from students of the twelfth grade. Similarly, it would not be appropriate to grade the Gettysburg address using a model generated from student essays.


The E-rater 2.0 [11] can use a model that is not based on any particular prompt to grade essays. This is very convenient, since it is not necessary to create separate models based on each topic. A single model can capture the necessary information to score any essay. One argument against the use of a single model for all prompts is that content specific words are not given any additional importance in the model. The use of content specific words in an essay is an indicator that the student has understood the prompt and the essay is relevant.

4.2.3. Essay Length

Products like Intellimetric use several hundred features, in contrast to the much smaller number of features in E-rater. The number of features appears to play less of a role in the quality of the grading results of an AES. On the other hand, one particular feature, the essay length, is the most dominant: the score of an essay is very closely related to its length (number of words). It would seem that a student could easily fool the AES into assigning a high score simply by generating a long bogus essay with a large number of words. The simplest method to generate a long essay is to repeat a sentence until the essay is sufficiently long. However, such an essay would have very low values for other features such as the number of different words and the sentence length standard deviation. Further, a student would not risk submitting such an essay if there was a possibility that a human grader may score the essay. Yet, as long as there is a possibility that a bad faith essay may be scored incorrectly by an AES, it is unlikely that human graders can be completely replaced.
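A small Python sketch shows why padding an essay with one repeated sentence is easy to spot. It computes the total number of words, the number of different words, and the standard deviation of the sentence lengths; the function name and the sample sentence are assumptions for this example.

```python
import re
import statistics

def length_and_variety(text):
    """Return (total words, unique words, std. deviation of sentence lengths)."""
    sentences = [s for s in re.split(r"[.?!]+", text) if s.strip()]
    lengths = [len(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    words = re.findall(r"[a-z']+", text.lower())
    return len(words), len(set(words)), statistics.pstdev(lengths)

# One sentence repeated 50 times: long, but with almost no variety.
padded = "Cell phones are very useful. " * 50
print(length_and_variety(padded))  # (250, 5, 0.0)
```

The padded essay reaches 250 words, yet it has only five different words and a sentence length deviation of zero, so the length feature alone would not carry it to a high score.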


The benefits of a long essay for a competitive exam diminish beyond three main points. Increasing the number of examples from one to three will improve an essay more significantly than increasing it from three to five. The development or elaboration of an example is also a feature used to compute the essay score. A fully developed example with an introduction, thesis, and strong supporting material will contribute to the final score.

4.3. How do you write an essay for E-rater?

At first, it may appear illogical to write your essay based on evaluation criteria selected by a machine. However, if your essay’s AES score is high, it is very likely that a human grader will also assign a high score within ±1 of the AES score. This assumes that you plan on writing a good faith essay; it is possible to write a poor essay to which the AES algorithm will assign a high score. However, the risks of writing such an essay are high, since a human grader cannot be deceived and will appropriately score the essay in a lower category. The discrepancy between the human and machine grades will be resolved by another grader, who is more than likely to concur with the first human grader. Therefore, even if you could deceive the AES algorithm, you will still receive a low score for a bad faith essay.

The E-rater AES [25] used by ETS measures several traits including grammar, usage, mechanics, style, organization, lexical complexity, and prompt-specific vocabulary. E-rater version 2 uses a small set of 10 features that are closely related to essay writing traits. By contrast, other commercial AESs [27, 13] use a larger number of features, ranging from 50 to several hundred. No relationship has been established between the quality of essay evaluation and the number of features; a few good features are sufficient for reasonably precise scores.

The length of an essay was one of the strongest features to automatically compute a score, but has been de-emphasized in E-rater. Instead, an implicit computation of essay length is used (see section 4.3.5). One of the criticisms of AES was the excessive importance given to the essay length feature, and therefore, ETS specifically mentions that E-rater does not use the essay length feature.

The first four of the 10 features in E-rater are the number of grammar, usage, mechanics, and style errors in an essay. E-rater modifies the raw counts of the number of errors in each of the four categories to create a more uniform distribution of errors. Consider a 300-word essay with two grammar errors. The raw count of two errors is incremented to three and divided by the length of the essay, 300. The log of the result, log(0.01), is used in the final computation of the score. The raw counts are incremented by one to avoid taking the log of zero when an essay has no errors. The number of errors is scaled by the length of the essay to avoid penalizing long essays. So, two errors in a short essay will be penalized more heavily than two errors in a longer essay. A log transformation of the scaled raw count brings together values that are scattered and separates values that are very close.
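The transformation of the raw error counts can be written as a one-line function. The sketch below follows the worked example in the text (two errors in a 300-word essay); the function name is made up, and the use of the natural logarithm is an assumption, since the text does not state the base of the log.

```python
import math

def scaled_error_feature(raw_errors, essay_length):
    """Return log((raw_errors + 1) / essay_length).

    The natural log is an assumption; the text does not state the base.
    """
    return math.log((raw_errors + 1) / essay_length)

print(scaled_error_feature(2, 300))  # log(3 / 300) = log(0.01)
print(scaled_error_feature(2, 100))  # same errors in a shorter essay: larger value
print(scaled_error_feature(0, 300))  # zero errors still gives a finite value
```

The second call shows the effect of scaling by length: the same two errors produce a larger (less negative) value in a 100-word essay than in a 300-word essay.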

4.3.1. Grammar

The types of grammar errors detected include run-on sentences, garbled sentences, subject-verb agreement, ill-formed verbs, pronoun errors, possessive errors, and missing or wrong words. The grammar checker in E-rater checks a sequence of two words (a bigram) at a time in a sentence for correctness. In a correct sequence, the parts of speech of a bigram should have been seen earlier in grammatically error-free sentences (see Section 3.2.1 for details). Any bigram that is rarely seen in the language is marked as an error. For example, an adjective following a noun is rare and is classified as a grammatical error. Similarly, a plural verb rarely follows a singular noun. This works when such rules are strictly followed in sentences. However, consider the following sentences:

My colleagues at the [company presume] I am working.
My [company presumes] that I am working.

Both the plural and singular forms of the verb presume follow the noun company. A statistical grammar checker may flag the first sentence, since a singular collective noun is followed by a plural verb. E-rater uses filters to allow such sequences, even though the automatically generated rules indicate that the sequence is rare. The grammar rules applied to evaluate a sentence depend on the frequency of observed bigrams in the corpus. E-rater’s grammar checker was trained on a corpus of about 30 million words from newswire text. Not all grammatical errors will be detected, but you need to make sure that your sentences do not contain any of the grammar errors that E-rater can detect (see Section 3.2.2).
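The idea of flagging rare part-of-speech bigrams can be sketched with a toy example. The three hand-tagged sentences, the tag names, and the threshold min_count below are all assumptions made for illustration; a real checker would count tag bigrams over millions of words and would also apply filters for legitimate exceptions.

```python
from collections import Counter

# Tiny hand-tagged "corpus" (an assumption for this sketch); a real checker
# would count tag bigrams over millions of words of edited text.
TAGGED_CORPUS = [
    [("the", "DET"), ("old", "ADJ"), ("river", "NOUN"), ("flows", "VERB")],
    [("a", "DET"), ("clean", "ADJ"), ("river", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("town", "NOUN"), ("sleeps", "VERB")],
]

def count_tag_bigrams(corpus):
    """Count adjacent part-of-speech tag pairs in the tagged sentences."""
    counts = Counter()
    for sentence in corpus:
        tags = [tag for _, tag in sentence]
        counts.update(zip(tags, tags[1:]))
    return counts

def rare_bigrams(tagged_sentence, counts, min_count=1):
    """Return the word pairs whose tag bigram was seen fewer than min_count times."""
    flagged = []
    for (w1, t1), (w2, t2) in zip(tagged_sentence, tagged_sentence[1:]):
        if counts[(t1, t2)] < min_count:
            flagged.append((w1, w2))
    return flagged

counts = count_tag_bigrams(TAGGED_CORPUS)
sentence = [("the", "DET"), ("river", "NOUN"), ("old", "ADJ"), ("flows", "VERB")]
print(rare_bigrams(sentence, counts))  # [('river', 'old'), ('old', 'flows')]
```

The adjective that follows the noun is flagged because the noun-adjective tag pair never occurs in the toy corpus, which matches the example of a rare sequence given above.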


4.3.2. Usage

Usage errors are common mistakes such as the wrong or missing article, confused words, the wrong form of a word, a faulty comparison, a preposition error, or a non-standard verb form.

Wrong or Missing Article

Incorrect: I am going to airport.
Correct: I am going to the airport. (a known or previously mentioned airport with a definite article)
Correct: I am going to an airport. (an unspecified airport with an indefinite article)
Correct: I am going to a bazaar. (an unspecified bazaar with an indefinite article)

Since some languages do not have articles, this is a common error. One of the articles, the word the, happens to be the most frequently used word in the English language. You will need an article when you refer to a noun. Typically, the first time you refer to a noun, you would use an indefinite article, and a definite article in subsequent references. The indefinite article, an, is used when the next word starts with a vowel sound. This includes some words that start with a silent h, such as hour; other words starting with h, such as house, use a.

Confused Words

Table 4.2 contains a list of common words that are often confused. The word effect is sometimes used as a verb: “The way to effect change is to be part of it.” The word counsel can also be a noun: “Seek legal counsel before you take him to court.” (see Appendix C to download a longer list of confused words and example sentences).

Table 4.2.: Sample List of Confused Words

Word                    Example Sentence
advice (noun)           Your advice is sound and I will study hard.
advise (verb)           I would advise you to reconsider.
accept (verb)           Please accept this token of gratitude.
except (preposition)    All of you have done well, except for one.
affect (verb)           Inflation is affected by monetary policies.
effect (noun)           The effects of inflation are devastating.
council (noun)          The City Council passed a resolution.
counsel (verb)          He counsels college-bound students.
personal (adjective)    Look out for your personal belongings in an airport.
personnel (noun)        The personnel department is located below.
there (adverb)          The plates are over there.
their (pronoun)         Their bags have been packed.
they’re                 They are busy with the collection.

Wrong Form of a Word: An intended meaning in a sentence will not be conveyed unless you use the correct form of a word. For example, the sentence – “Phillip was an elegant speaker.” uses the wrong word elegant instead of the word eloquent.


Faulty Comparison: The meaning of the following two sentences, “I like science fiction movies more than horror.” and “The weather in Russia is colder than Vietnam.”, may be correctly interpreted by a reader. However, both sentences use faulty comparisons: in the first sentence, movies is compared with an adjective, horror, and in the second sentence, weather is compared with a country, Vietnam. The word movies should be added to the end of the first sentence, and the last word in the second sentence, Vietnam, should be changed to Vietnam’s weather.

Preposition Error: A few prepositions, in (21.4%), to (20.8%), and of (16.6%), account for a majority of the preposition errors (see Appendix C). The sentence - “He went to outside.” wrongly adds the preposition to. The most common (17%) preposition error was using the words to and of when no preposition was necessary. The remaining errors misused the prepositions in, at, and for. E-rater detects roughly one out of five such preposition errors with an accuracy rate of over 80%.

Non-standard Verb Form: Words such as gotta, gonna, or wanna used in spoken language are flagged as errors in written text. You will have to use the expanded versions of these words (got to, going to, and want to).
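Non-standard verb forms are the simplest of these usage errors to check by hand or by program. The sketch below looks up each word in a small table built from the three examples in the text; the table name EXPANSIONS and the function name are assumptions, and this is not a description of how E-rater itself detects such words.

```python
import re

# Replacements taken from the text; the dictionary name is hypothetical.
EXPANSIONS = {"gotta": "got to", "gonna": "going to", "wanna": "want to"}

def flag_nonstandard(text):
    """Return (word, suggested expansion) pairs for each non-standard form found."""
    words = re.findall(r"[a-z']+", text.lower())
    return [(w, EXPANSIONS[w]) for w in words if w in EXPANSIONS]

print(flag_nonstandard("I wanna travel, but I gotta study first."))
# [('wanna', 'want to'), ('gotta', 'got to')]
```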

4.3.3. Mechanics

Mechanics errors are mostly word form errors: a misspelled word, a missing punctuation mark, or a missing capital letter in a word. Although these types of errors may seem petty, a missing punctuation mark can alter the meaning of a sentence, making the reader pause to confirm the writer’s intended meaning. The human evaluator of an essay with several such errors can quickly become frustrated. E-rater keeps track of the number of mechanics errors and will adjust the final score appropriately. Since many of these errors are usually careless omissions, it pays to proofread your essay for minor errors before submission.

Spelling: Spelling errors are relatively easy to detect: all words, excluding proper nouns, not found in a dictionary are considered potential errors. However, a spell checker may wrongly flag inflected words, fused words, or word fragments that are not found in the dictionary. Appendix C contains a list of 2500+ words that are frequently misspelled. The words that are misspelled most often contain a letter that sounds like another letter or do not follow a particular rule. For example, the word absence is misspelled as absense and the word despite as dispite. Rules for letter sequences are not consistent. The letter i sometimes precedes e (believe) and sometimes follows e (receive). A letter (g) that appears in a word (campaign) is not pronounced. A few words (commando, immediately, and recommend) double a letter (mm).

Letter Errors

• The first word of every sentence should begin with a capital letter.
• Every proper noun should begin with a capital letter.
• A sentence containing a question should end with a question mark.


Punctuation Errors

• Every sentence should end with a punctuation mark (a sentence separator character: ?, ., or !).
• Although the apostrophe is a tiny punctuation character that is easily overlooked, a missing apostrophe alters the meaning of a sentence. The sentence - “The audience last night did not respond with either applause or boos to mention of Hughes remark.” is missing an apostrophe after the word Hughes. Without the apostrophe, the sentence implies that the “Hughes remark” is a type of remark.
• Notice that if the last comma in the following sentence is dropped, the sentence has a strange meaning - “The Mayor apparently received the Bronx leader’s assent to dropping Controller Lawrence E. Gerosa, who lives in the Bronx, from this year’s ticket”.

Word Errors: E-rater flags words that do not appear to be proper nouns and are not found in the dictionary. These words are potentially spelling errors and can be easily avoided with a little care. The hyphenated word is the first type of spelling error. The rules for using hyphens are not explicit; the term “anti-virus program” uses a hyphen, while “antiviral agent” does not (in the WordNet dictionary). Some terms such as “battery acid” can be spelt with or without a hyphen. The hyphen is introduced when a new word is formed with two or more words. For example, the words cell-phone and e-mail were first hyphenated, but have since become accepted single words: cellphone and email.


About 95% of the hyphenated words in the WordNet dictionary are nouns and adjectives. More adjectives (13%) are hyphenated than nouns (3%). Some hyphenated adjectives are noun phrases such as “slippery-eel” that can be used with or without a hyphen depending on the meaning. A “slippery eel” salesman sells slippery eels, while a “slippery-eel” salesman slips away with your money. Hyphenated words are found more often before a noun than after. When in doubt, it is preferable to drop the hyphen, since E-rater will not flag a non-hyphenated word if the component words are spelt correctly and are not word fragments.

Fused words that are not found in the dictionary are almost certain spelling errors that will be detected. Words such as lifehack, podjack, or listism may or may not be found in the dictionary. A fused word can also be unintentionally formed when a space is accidentally omitted. The only way to detect such errors is to proofread your essay and verify that there is a space at every word boundary.

Compound words, such as newspaper, pigpen, and eyebrow, are made up of two or more words without a space. These words are very likely to be found in the dictionary and E-rater will not flag such words. You cannot split such words, since in most cases the combined meaning of the individual words is not the same as the meaning of the compound word.

A sequence of duplicate words (such as “the the” or “of of”) is found more often in computer-generated documents than in hand written work. The same word is repeated consecutively in a sentence when a single word was intended. If you type your essay on a word processor, a grammar checker will detect these errors, but in an exam you may need to manually proofread your essay.
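Duplicate words are easy to find with a regular expression when you practice on a computer. The sketch below is only one possible check, written for this example; the function name is an assumption and the pattern says nothing about how E-rater or any word processor finds such errors.

```python
import re

def duplicate_words(text):
    """Return each word that is immediately repeated, such as "the the"."""
    # \b(\w+)\s+\1\b matches a word, whitespace, and then the same word again.
    pattern = re.compile(r"\b(\w+)\s+\1\b", flags=re.IGNORECASE)
    return [match.group(1) for match in pattern.finditer(text)]

print(duplicate_words("He sat on the the edge of of the bed."))  # ['the', 'of']
```

The trailing word boundary in the pattern keeps a phrase like “the theory” from being reported as a duplicate.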


4.3.4. Style

Writing style is subjective: a good or bad style depends on the reader’s likes and dislikes. Since the essay you write will be graded by E-rater, you will need to create an essay that satisfies E-rater’s view of good style. E-rater collects statistics and searches for patterns to evaluate style.

Repetitive Words: The use of the same word in a sentence, when alternate words that convey the same meaning could have been used, is considered poor style. For example, in the sentences below, the name Jacob is repeated, though substitute words would have retained the meaning of the sentences.

Jacob plays football for his school. Jacob also studies hard. How does Jacob do it? Does Jacob get tired of it all?

You can use a pronoun to avoid repeating a proper noun. Similarly, the use of synonyms is another way to limit the number of repeated words.

Jacob has nice techniques. He is a nice kid. He also wears nice clothes.

The word nice is repeated three times in a short chunk of text and is noticeable to a reader. Function words like and, of, and the are automatically excluded since these words occur often in all styles of text. A flagged repetitive word is usually an adjective or a noun. E-rater uses seven features to decide if a word x has been used repetitively or not.

• The total number of occurrences of the word x in the essay.
• The relative frequency of x in the essay (the number of times x is seen in, say, 100 words).
• The average relative frequency of x in a paragraph (the sum of the number of occurrences of x in, say, 25 words in all paragraphs divided by the number of paragraphs).
• The highest relative frequency of x in the essay.
• The number of characters in x.
• Whether x is a pronoun (yes or no).
• The average word distance between successive occurrences of x in the essay.

The easiest way to avoid repeating a word is to use its synonym, choose an inflected word, use a pronoun, or use words like former and latter. If this is not possible, then it may be preferable to simply repeat a word instead of using an artificial replacement of the word. For example, you will expect to see the word whale repeated in an essay about whales.

Inappropriate Words or Phrases: Written language is more formal than spoken language and sentences like

No way they can win now. How come it didn’t work? Check it out for yourself.

will not be viewed positively by a human grader. You can re-word these sentences in a more formal manner:


There is no way they can win now. Why didn’t it work? You can verify it for yourself.

A few words such as awesome, cool, and dude are overused in spoken language. There are alternative words (inspiring, impressive, stunning) to make your essay appear more formal.

Passive Sentences: Passive sentences are usually longer and tend to be less interesting than sentences in active voice. For example, the active version of the sentence below is preferred over the passive version.

Passive: The National Anthem will be sung by Jacob.
Active: Jacob will sing the National Anthem.

In a passive sentence, the subject (National Anthem) is the receiver of an action (sung); in an active sentence, the subject (Jacob) performs the action (sing). In general, you should limit passive sentences to 5% or fewer of the sentences in your essay. However, there are occasions when a passive sentence is appropriate. The doer of an action is not always the most important entity of a sentence. For example, in the following sentence,

Club members are requested to complete the survey.

the group or individual asking the club members to complete a survey is not mentioned. Instead, more importance is given to the group (club members) that receives the action (request to fill out a survey). Sometimes, the action is mentioned without the doer.

The anti-virus software in the computer has been updated.

The person responsible for updating the anti-virus software is not important enough in the sentence and is therefore omitted. Passive voice is also found in papers where facts are stated without specifically mentioning the doer of the action. For example, an informative essay with a supporting argument may claim an observation without a doer.

The collected statistics were inconclusive.

The doer is implicitly assumed to be a scientist or organization mentioned earlier. Though such sentences are legitimate in an essay, you should try to keep the number of passive sentences to a minimum.

Sentence Lengths: Good writing contains sentences of a variety of lengths. Too many short sentences make the writing look choppy. You can join short sentences to make a longer sentence (see Section 3.1.3). A variety of short and long sentences makes the writing more interesting. A short sentence after several long sentences does create a dramatic effect. The reader may pause and reflect on the earlier longer sentences. However, E-rater merely collects sentence length statistics that will be used to compute the final score.

Sentences Starting with Coordinating Conjunctions: Coordinating conjunctions - for, and, nor, but, or, yet, and so (easily remembered as FANBOYS) - combine two simple sentences, each with a subject and a verb, into a single longer sentence. The two sentences

It was the right color and it was cheap.
The children denied any responsibility, but the glass was cracked.

use the conjunctions and and but to combine related sentences. Since the purpose of a coordinating conjunction is to “coordinate” two sentences, such conjunctions usually appear in the middle of a sentence. A sentence that begins with a coordinating conjunction may appear to be incomplete and is not a positive indicator of good style.

And the woman wore a black dress.

Since E-rater will track the number of sentences that begin with a coordinating conjunction, you should minimize the number of such sentences.
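Two of the style measures in this section lend themselves to a short Python sketch: counting sentences that start with a coordinating conjunction, and computing a few of the repetition measures for a single word. The function names and dictionary keys are made up for this example, only four of the seven repetition features are computed, and nothing here should be read as E-rater's actual implementation.

```python
import re

FANBOYS = {"for", "and", "nor", "but", "or", "yet", "so"}

def split_sentences(text):
    """Split on ., ?, or ! and drop empty pieces."""
    return [s.strip() for s in re.split(r"[.?!]+", text) if s.strip()]

def conjunction_starts(text):
    """Count sentences whose first word is a coordinating conjunction."""
    count = 0
    for s in split_sentences(text):
        words = re.findall(r"[a-z']+", s.lower())
        if words and words[0] in FANBOYS:
            count += 1
    return count

def repetition_features(word, text):
    """Four of the repetition measures for one word (names are illustrative)."""
    words = re.findall(r"[a-z']+", text.lower())
    positions = [i for i, w in enumerate(words) if w == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return {
        "occurrences": len(positions),
        "per_100_words": 100 * len(positions) / len(words) if words else 0.0,
        "characters": len(word),
        "average_gap": sum(gaps) / len(gaps) if gaps else None,
    }

sample = "Jacob has nice techniques. He is a nice kid. And he also wears nice clothes."
print(conjunction_starts(sample))           # 1
print(repetition_features("nice", sample))
```

A plain word match like this cannot tell the conjunction for from the preposition in a sentence that begins with “For example”, so a real checker would need part-of-speech information as well.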

4.3.5. Organization and Development

E-rater compares your essay with a standard five-paragraph essay (introduction, conclusion, and three body paragraphs). Your essay will be considered less organized if it differs significantly from the standard. There is no penalty for writing more than three body paragraphs; however, you do not receive any extra credit for a fourth, fifth, or sixth body paragraph.

Discourse Classifier: Every sentence in your essay is classified into one of six discourse labels - introduction, conclusion, thesis, main point, supporting point, and other. The discourse classifier does not classify one sentence at a time (see Figure 4.3).

Figure 4.3.: Automatic Discourse Classification
[Flow diagram: ordered set of sentences S1, S2, ..., Sn -> discourse marker classifier (extract features and assign labels) -> ordered set of labels L1, L2, ..., Ln]

The sequence of n discourse labels for n sentences with the highest probability is selected. The probability of a label Li for sentence i depends on the two previous labels, L(i-1) and L(i-2). Special labels start and end are added to the beginning and end of the label sequence respectively. The first sentence is an exception and depends on the start label alone.

The core of your essay should be the body paragraphs with the main and supporting points. Each main point should be well-developed, i.e., you should have 3-4 “supporting point” sentences per body paragraph. The average length of a body paragraph should be roughly 70-80 words. Together with the shorter introduction and conclusion paragraphs, the total length of the essay may exceed 300 words. It is feasible to write an essay of 300+ words within a time limit of about half an hour. Although essay length is excluded from the list of E-rater features, it is implicitly used to compute the degree of development of the main points.


How does it work? The discourse classifier extracts discourse-relevant features from a sentence; the set of features is combined in a vector and passed to the classifier. The output of the classifier is one of the six discourse labels. The types of features extracted include:

• Sentence position within an essay
• Sentence position within a paragraph
• The paragraph number in which the sentence occurs
• Cue words and terms (see Table 4.4)
• Syntactic structures such as subordinate, complement, dependent, and infinitive structures

These features are extracted from a set of training essays marked with discourse labels, to build a classifier. The classifier detects patterns such as:

• A sentence that begins with a cue word like first or second is more likely to be a main point. On the other hand, the same cue word in the middle of a sentence is not as strong an indicator of a main point.
• Most thesis statements occur in a single paragraph.
• When the number of main points is more than two, the conclusion is very likely (>90%) to follow the body paragraphs. However, if fewer than three main points were found in an essay, there is less than a 50% chance of finding a conclusion paragraph following the body paragraphs [31].


Table 4.4.: Sample Cue Words for Discourse Labels

Discourse Label          Cue Words
Introduction / Thesis    I feel / think / believe, In my opinion / view
Conclusion               In conclusion / brief / short / summary, Briefly, To sum up
Other                    By the way, now, incidentally
Main Point               First(ly), second(ly), third(ly), first of all, finally
Supporting Points        With reference to, however, it is true, while, on the other hand, anyway, at least, in the same way, although, despite, likewise, by the same token, however, further, in addition, for instance, on the whole, because

Table 4.4 contains a sample set of discourse cue words for a discourse label. In many cases, the discourse cue word occurs near the beginning of a sentence. It is not necessary to use the words from Table 4.4 in your essay, but when these words are found in the right position and paragraph in your essay, your sentences are more likely to be correctly classified by discourse label.
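The idea that a cue word at the start of a sentence hints at a discourse label can be sketched in a few lines of Python. This is only an illustration: the dictionary holds a subset of the cue words from Table 4.4, the function name leading_cue is hypothetical, and E-rater's real classifier combines many more features than this simple lookup.

```python
import re

# A subset of the cue words in Table 4.4. The lookup below is an illustrative
# assumption, not E-rater's actual classifier.
CUE_WORDS = {
    "Introduction / Thesis": ["i feel", "i think", "i believe",
                              "in my opinion", "in my view"],
    "Conclusion": ["in conclusion", "in brief", "in short", "in summary",
                   "briefly", "to sum up"],
    "Main Point": ["first of all", "firstly", "first", "secondly", "second",
                   "thirdly", "third", "finally"],
    "Supporting Points": ["for instance", "in addition", "on the other hand",
                          "however", "although", "because"],
}

def leading_cue(sentence):
    """Return (label, cue) if the sentence starts with a known cue, else None."""
    text = sentence.strip().lower()
    for label, cues in CUE_WORDS.items():
        for cue in cues:
            # Require a word boundary after the cue so that "first"
            # does not match the start of "firsthand".
            if re.match(re.escape(cue) + r"\b", text):
                return label, cue
    return None

print(leading_cue("First, schools should fund libraries."))
print(leading_cue("Schools should first fund libraries."))
```

The second call returns None because the cue word is not at the start of the sentence, which mirrors the observation that a cue word in the middle of a sentence is a weaker indicator.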

4.3.6. Lexical Complexity

The two features in this category measure vocabulary skill. The first feature is the average word length of the essay. For example, the average word lengths in two sample essays that were scored 2 and 6 were 4.7 and 4.9 respectively. Essays that score high tend to have higher average word lengths. Although it may be difficult to think of longer words as you write your essay, you can try to avoid using a large number of


short words. The use of inflected words and words from the SAT list (see Appendix C) in your essay can make the average word length closer to the average word length of a high-scoring essay.

The second feature of lexical complexity is based on the standard frequency index (SFI) [35]. Every unique word is assigned an SFI value; words that appear frequently in text have a higher SFI than words that are seen rarely (see Table 4.5). Unfortunately, the SFI value of a word does not distinguish between different meanings of the same word. For example, there is no distinction between the noun meaning and verb meaning of sound. If you do happen to use the less popular meaning of a word, you will not gain any additional benefit, since the SFI value includes all meanings.

Table 4.5.: Twenty Words and SFI Values from Brown Corpus

Word x           Frequency x   SFI x   Word y           Frequency y   SFI y
the              69971         148     of               36412         145
and              28853         144     that             10594         140
it               8760          139     when             2331          133
second           373           125     difference       148           120
heated           16            110     tolerated        6             105
underestimate    4             103     turf             3             100
predictably      2             97      preside          2             95
audition         3             93      urban-fringe     1             90
addressees       1             90      acidulous        1             88
interpretative   1             87      indecipherable   1             86

Function words like the and of have the highest SFI. Content words like underestimate and acidulous are seen less often and have a correspondingly lower SFI.

The College Board created a corpus [35] of roughly 14 million words from a large sample of reading material that a high school or first year college student would have read. The corpus contained text from American and British novels, poetry, drama, essays, biographies, and various types of current periodicals. The samples used in the corpus were assumed to be a reasonably good representation of the vocabulary that a high school student or college freshman would know. Every unique word in the College Board corpus was assigned an SFI value.

Even though two words (acidulous and indecipherable) have the same frequency, their SFI values are different. The SFI value of a word also depends on the size of the category in which the word appears. The word acidulous appears in the “Popular Lore” category (96K words) and the word indecipherable appears in the “Fiction: Romance” category (58K words). The word appearing in the smaller category is assigned a lower SFI value. The number of categories in which the word appears also plays a role in the SFI value: a word which appears in many categories will be assigned a higher SFI value than a word appearing in a few categories.

Figure 4.4 is a plot of the top 1000 words from the Brown Corpus, sorted in descending order of SFI, against the frequency of each word in the corpus. The most frequent words in Figure 4.4 have the highest SFI values; however, the frequency of words drops more sharply than the corresponding SFI values (the x-axis uses a log scale). Ideally, you would like to use words with low SFI values in your essay. Your essay should also include a relatively high ratio of content words to function words (also called lexical density). Roughly 40% of the words in your essay should be


Figure 4.4.: Top 1000 Words from the Brown Corpus sorted by SFI vs. Word Frequency
[Plot. x-axis: Words (1 to 1000, log scale). y-axes: SFI (0 to 140) and Frequency (0 to 70K).]

content words. The number of function words in spoken text is usually higher than in written text. In the Brown corpus of 1 million words, about 57% of the words were content words, 42% were function words, and the remainder were punctuation marks or other characters such as parentheses. It can be difficult to consciously think of low SFI words; instead, you can avoid using common words with high SFI values.
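The average word length and the share of content words discussed in this section can be estimated with a short script. The sketch below is an assumption-laden illustration: the FUNCTION_WORDS set is a tiny hypothetical list (a real system would use a much larger list or a part-of-speech tagger), and the function names are not taken from E-rater or Emustru.

```python
import re

# A tiny, hypothetical list of function words, for illustration only.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "and", "or", "but", "in", "on", "at", "to",
    "for", "with", "by", "is", "are", "was", "were", "be", "it", "that",
    "this", "as", "from", "he", "she", "they", "we", "you", "i", "not",
}

def tokenize(text):
    """Split text into lowercase words, keeping internal apostrophes and hyphens."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def average_word_length(words):
    """Mean number of characters per word (0.0 for an empty list)."""
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)

def lexical_density(words):
    """Fraction of words that are not in the function-word list."""
    if not words:
        return 0.0
    content = [w for w in words if w not in FUNCTION_WORDS]
    return len(content) / len(words)

words = tokenize("The cat sat on the mat.")
print(average_word_length(words), lexical_density(words))
```

Running the script on a draft essay gives a rough sense of whether short function words dominate the text.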

4.3.7. Prompt-Specific Vocabulary Usage

The two features in this category compare the vocabulary of the given essay with sets of training essays. A group of training essays is manually pre-scored for each of the six score points. The two features are:

• The similarity between the given essay and training essays at score point 6 (the highest score point).
• The score point (a number from 1-6) of the training essays with the maximum similarity to the given essay.

The similarity computation is based on the cosine similarity measure [36]. Every essay must first be converted to a vector before computing a similarity. A vector consists of a set of unique words, each word with an associated weight. The weight of a word reflects its importance in the vector. In general, a word that occurs more often in an essay will be assigned a higher weight. However, the weight also depends on the frequency across score categories.

Consider a word far in a given essay. First, the number of occurrences ($F_{far}$) in the essay is computed. Next, the number of occurrences ($MaxF$) of the most frequent word in the essay is computed. Finally, the number of training essays ($N$) across all score categories and the number of training essays ($N_{far}$) containing the word far are computed. The weight of the word far in the given essay is:

$$W_{far} = \frac{F_{far}}{MaxF} \times \log\left(\frac{N}{N_{far}}\right)$$

Say the word far occurs four times and the word the has a maximum frequency of 15 in the essay. Assume a total of 30 training essays, of which six contain the word far. $W_{far}$ is:

$$W_{far} = \frac{4}{15} \times \log\left(\frac{30}{6}\right) = 0.186$$

The numeric values in these examples correspond to a base-10 logarithm. Consider a word immense that was found eight times in the same essay and in just two of the 30 training essays. The weight $W_{immense}$ is:

$$W_{immense} = \frac{8}{15} \times \log\left(\frac{30}{2}\right) = 0.627$$

The weight of immense is several times higher than the weight of far for the same essay since it was used more frequently and found in fewer training essays.

E-rater constructs a vector of weights from n such words found in the given essay. Similarly, a vector is constructed from the training essays for a particular score category. Consider score category 4: all the training essays that were scored 4 are combined into a single essay and weights for the words are computed as before. This is repeated for each of the remaining five categories. After the computation of weights is complete, we have six weight vectors, one for each score category. If your essay contains words that were seen often in high scoring training essays alone, then the similarity of your essay with a higher score category will be greater than with other score categories. The use of prompt-specific vocabulary in your essay implies a higher value for the two features in this category, which in turn is more likely to increase your overall score. Therefore, it makes sense to choose the most familiar prompt, if you are given a choice of prompts.
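The weight formula and the cosine similarity measure can be tried out with a small Python sketch. The function names and the dictionary-based vector representation are assumptions made for illustration; the base-10 logarithm is chosen because it reproduces the worked values 0.186 and 0.627 given above.

```python
import math
from collections import Counter

def word_weight(count, max_count, n_essays, n_essays_with_word):
    """Weight = (F / MaxF) * log10(N / N_word), as in the worked examples."""
    return (count / max_count) * math.log10(n_essays / n_essays_with_word)

def weight_vector(words, doc_freq, n_essays):
    """Build {word: weight} for one non-empty essay given as a list of words.

    doc_freq maps a word to the number of training essays containing it.
    Words missing from doc_freq are skipped (an assumption of this sketch).
    """
    counts = Counter(words)
    max_count = max(counts.values())
    return {
        word: word_weight(count, max_count, n_essays, doc_freq[word])
        for word, count in counts.items()
        if word in doc_freq
    }

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

print(round(word_weight(4, 15, 30, 6), 3))   # the word "far"
print(round(word_weight(8, 15, 30, 2), 3))   # the word "immense"
```

To mimic the two features, one would build one weight vector per score category from the combined training essays, compute the cosine similarity of the given essay with each, and report the similarity with category 6 and the category with the highest similarity.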

4.3.8. E-rater Writing Tips

Extracting Sentences: Before any type of evaluation, E-rater first extracts sentences from your essay. You need to write your essay such that sentence boundaries are detected precisely. In other words, a legitimate sentence in your essay should not be split in two and, conversely, two sentences should not be combined into a single sentence. You can make it easy for the sentence extractor to find sentences by ending every sentence with a sentence separator character (., ?, or !) followed by either a space and a word that starts with an uppercase letter, or a new line character; a new line character following a sentence separator character indicates a new paragraph. A sentence that ends with an abbreviation may not be detected. For example, the sentence extractor may use the last period in the following sentence to build the abbreviation and miss the sentence separator.

    The responsibility to distribute a vaccine for the swine flu lies with the W.H.O.

It is preferable to rewrite the sentence such that the abbreviation is not the last word. The correct use of sentence separator characters is a simple but important part of writing for an AES. If a sentence is not correctly detected, you are likely to be penalized for a grammatical error such as a run-on sentence. Therefore, end all sentences in an essay such that the extractor will certainly identify a sentence boundary.

Fully Developed Arguments: It is more important to fully develop an argument than to use the best possible argument. Even a weak argument can be fully developed; explain the pros and cons of the argument and present a balanced view. E-rater evaluates the length of the supporting argument for every main point.

Formal Writing: Informal language, slang expressions, or jargon (that could be unfamiliar to an English teacher) in your essay may be penalized. However, you should use terms that are somewhat uncommon to display “lexical sophistication”. The use of a variety of different words in your essay instead of a few words that are repeated will indicate a broad “range of expression”.

Picking a Topic: This is an obvious one; if given a choice of topics, pick a familiar topic. E-rater compares specific words from your essay with a high scoring essay on the same topic. So, if you are familiar with the jargon or terminology of a topic x, then your essay is more likely to have some overlap with pre-scored essays on x. The degree of overlap with pre-scored essays is an E-rater feature and contributes to your overall score.

Opening and Closing: The first and last sentences of a paragraph can leave a positive impression on the reader. These sentences summarize the ideas or views expressed in the paragraph and therefore may be longer than the average sentence. The last word of a sentence is also important. A reader will briefly pause at the end of a sentence and may keep the last word in mind before moving on to the next sentence. Although E-rater will not be impressed by such sentences in the same way as a human would be, these sentences will be more likely to be correctly classified by discourse label. The absence of a proper introduction, conclusion, or main point will be penalized.

Miscellaneous: These tips are just guidelines to fit the requirements of E-rater. If your essay is based on these tips, you will make it easy for E-rater to find the features necessary to compute a score. Although none of these tips mention content, it is important to plan ahead and build good supporting arguments in the form of examples. E-rater cannot distinguish between the quality of examples, since that would require a large body of background knowledge. A human grader can recognize and appreciate a good example, while E-rater will merely identify an example. You can also make it easy for the E-rater discourse classifier to precisely classify sentences by discourse label, if you use common discourse markers at appropriate positions in your essay.
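The sentence boundary rule described under Extracting Sentences can be sketched with a regular expression. This is a naive illustration of the rule as stated in the text, not E-rater's actual extractor; the name split_sentences is hypothetical.

```python
import re

# A boundary is a separator (., ?, or !) followed either by spaces and an
# uppercase letter, or by one or more new line characters.
BOUNDARY = re.compile(r"(?<=[.?!])(?: +(?=[A-Z])|\n+)")

def split_sentences(text):
    """Split text at the boundaries defined above and drop empty pieces."""
    return [s.strip() for s in BOUNDARY.split(text) if s.strip()]

essay = "I like tea. do you? Yes!\nNew paragraph."
for sentence in split_sentences(essay):
    print(sentence)
```

In the sample, "I like tea. do you?" stays together because the word after the first period is not capitalized. A rule this simple would also split after an abbreviation such as "Dr." when a capitalized name follows, so a practical extractor has to treat abbreviations specially, which is why a sentence that ends with an abbreviation is risky.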

4.4. Emustru Essay Evaluator

The Emustru essay evaluator is based on the design explained in Section 4.1. You can enter an essay in a textbox of a Web-based form and submit the essay for evaluation. Click on the “Essay Evaluation” option of the upper menu bar from the home page to submit your essay. Emustru extracts a number of features and computes a score based on the values of each feature. The list of features used in Emustru includes:

• Total number of words
• Total number of characters
• Number of unique words
• Fourth root of the number of words
• Number of spelling errors
• Number of grammatical errors
• Number of paragraphs
• Number of sentences
• Average word length
• Number of unique words per 100 words
• Average sentence length in words
• Number of words with more than 5 letters
• Number of words with more than 6 letters
• Number of words with more than 7 letters
• Number of words with more than 8 letters
• Number of passive voice sentences
• Standard deviation of word lengths
• Standard deviation of sentence lengths
• Number of discourse markers
• Average coherence between sentences and the entire text
• Average coherence between paragraphs and the constituent sentences
• Average coherence between consecutive sentences

The collection of 22 features is extracted in a vector and passed to a classifier model. The classifier returns the closest category for the passed vector. The evaluator returns a tabbed screen of results shown in Figure 4.5.
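Several of the surface features in the list above are simple counts that can be computed directly from the essay text. The following sketch shows one possible way to compute a subset of them for a non-empty essay. It is not Emustru's code: the function name, the tokenizing rules, and the choice of the population standard deviation are all assumptions made for illustration.

```python
import re
import statistics

def surface_features(essay):
    """Compute a few of the listed surface features for a non-empty essay."""
    paragraphs = [p for p in essay.split("\n") if p.strip()]
    # Naive sentence split: whitespace that follows ., ?, or !
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", essay.strip()) if s]
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", essay)
    lengths = [len(w) for w in words]
    unique = {w.lower() for w in words}
    return {
        "total_words": len(words),
        "unique_words": len(unique),
        "fourth_root_of_words": len(words) ** 0.25,
        "paragraphs": len(paragraphs),
        "sentences": len(sentences),
        "average_word_length": sum(lengths) / len(words),
        "unique_words_per_100": 100 * len(unique) / len(words),
        "average_sentence_length": len(words) / len(sentences),
        "words_longer_than_5": sum(1 for n in lengths if n > 5),
        "stdev_word_length": statistics.pstdev(lengths),
    }

sample = "The cat sat. The dog barked loudly.\nBirds sing."
for name, value in surface_features(sample).items():
    print(name, value)
```

Features such as spelling errors, grammatical errors, passive voice, and coherence need additional resources (a dictionary, a grammar checker, a similarity measure) and are left out of this sketch.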


Figure 4.5.: Evaluation of an Essay in Emustru

The first tab is a summary screen showing some of the scores of the essay compared to a high scoring essay called Brown. A few features including the score, the number of grammatical errors, the number of spelling errors, and vocabulary are shown in the summary. All values have been scaled to a 0-100 range. The next tab contains a list of grammatical errors shown per sentence. The first sentence in Figure 4.6 contains a grammatical error. The statistical grammar checker detected an adjective (proud) following a noun (country); such an occurrence is very rare in English. The next sentence did not contain any grammatical error that could be detected.


Figure 4.6.: The First Two Sentences of an Essay in the Grammar Tab

The spelling tab shows the list of sentences in the essay along with any spelling errors in each of the sentences. For every spelling error, a potential suggestion is also shown. The vocabulary tab shows a few of the word statistics such as the number of words, average word length, number of unique words, and the standard deviation of the word length. The organization tab shows the coherence between individual sentences, between sentences and their parent paragraph, and between sentences and the essay text as a whole. Other statistics include counts of the use of passive voice and discourse markers. Note that a higher count of passive voice sentences may lead to a lower score, while a higher count of discourse markers is usually associated with a high scoring essay.

The final tab contains a list of all the features in the essay compared to an ideal high scoring essay (Figure 4.7). The score assigned to the essay (5 in this case) is shown compared to an ideal essay. The number of grammatical errors and the category are also shown in Figure 4.7. The remaining 19 features are not shown in the figure. Any value that is not


Figure 4.7.: Three Attributes from the Full Evaluation of an Essay.

reasonably close to the ideal value is shown highlighted in the results. For example, the number of grammatical errors has been highlighted in Figure 4.7 since it is double the number of grammatical errors found in an ideal essay. Note that the score you receive from the Emustru essay evaluator may differ from the score computed by another AES. The Emustru evaluator has been trained on a small set of 100 training essays.

Content: A machine grader will not judge the accuracy of facts in your essay, but a human grader may take issue with open-ended statements that cannot be supported or seem to be dishonest. Your sentences will be more acceptable if you use phrases like “It is possible that ...” rather than “It is true that ...”. A human grader may think of a scenario where the statement is false, which will make your essay less credible. Consider the following sentence:

    The volcanic eruption of Mount Kilauea was the reason the Solar Power Plant at Hawaii fell short of its goal.

This sentence implies that the volcanic eruption was the sole cause of the failure, and an astute reader may reason that there are other possibilities to explain the failure. If you qualify your sentences, they are less likely to be viewed with suspicion.

    A possible reason why the Solar Power Plant at Hawaii fell short of its goal is the volcanic eruption of Mount Kilauea.

4.5. Web sites to learn Essay Writing

1. http://criterion.ets.org: The Criterion Online Essay Evaluation Service (from the Educational Testing Service). Criterion uses E-rater to score essays.
2. http://www.knowledge-technologies.com: Pearson Knowledge Technologies’ Intelligent Essay Assessor™.
3. http://www.vantagelearning.com: Vantage Learning’s IntelliMetric® automated essay scoring system.
4. http://echo.edres.org:8080/betsy: A Bayesian Essay Test Scoring sYstem.
5. http://emustru.sf.net: Submit and evaluate a short essay of 200-400 words with open source Emustru.


5. Other Topics

This chapter covers other topics that are part of standardized tests, such as listening, comprehension, and speaking. A large number of commercial software products convert text to speech and vice versa. The quality of these products varies and this chapter does not evaluate commercial software.

5.1. Listening

A simple way to learn from audio is to read a transcript while listening to the audio version of the same transcript. A large number of books from Project Gutenberg [6] are available in both MP3 and text formats. Espeak [?] is an open source speech synthesizer for English that runs on the Linux and Windows platforms. Although Espeak’s audio output does not sound as natural as a human voice, the quality is good enough to follow. Several voices are included: a default English voice, a U.S. voice, and a Scottish voice. On the Windows platform, Microsoft Sam (Speech Articulation Module) is a default voice.

There are two ways to run Espeak: either from a GUI (see Figure 5.1) or the command line. The GUI can read text files and provides options to change the reading speed, the voice, and other controls. It is easy to use and on the Windows platform uses the Speech Application Programming Interface or SAPI.

Figure 5.1.: Espeak Graphical User Interface

The command line version of Espeak generates audio from text that you enter at the console or read from a text file.

    espeak "Hello World"
    espeak -f textfile.txt

Espeak includes several options to tune the audio to your requirements: you can adjust the speed, pitch, and volume of the audio. The default speed of Espeak is 170 words per minute. You can set a speed in the range of 80 to 390 words per minute. If you need to save the audio, you can create a .wav file from the results of the text conversion.
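If you prefer to drive Espeak from a script, the options can be assembled in Python and passed to the program. The sketch below assumes the standard espeak options -v (voice), -s (speed in words per minute), -p (pitch), -a (amplitude or volume), and -w (write a .wav file); the function names, the default values, and the voice names "en" and "en-us" are assumptions that may differ between Espeak versions. The speed limits are the ones quoted in the text.

```python
import subprocess

def build_espeak_command(text, speed=170, pitch=50, amplitude=100,
                         voice="en", wav_path=None):
    """Return the argument list for one espeak invocation."""
    if not 80 <= speed <= 390:
        raise ValueError("speed must be between 80 and 390 words per minute")
    command = ["espeak", "-v", voice, "-s", str(speed),
               "-p", str(pitch), "-a", str(amplitude)]
    if wav_path is not None:
        command += ["-w", wav_path]  # save the audio instead of playing it
    command.append(text)
    return command

def speak(text, **options):
    """Run espeak with the given options; espeak must be installed."""
    subprocess.run(build_espeak_command(text, **options), check=True)

print(build_espeak_command("Hello World", speed=120, voice="en-us",
                           wav_path="hello.wav"))
```

Calling speak("Hello World", speed=120) plays the text at a slower pace, which can help when you are getting used to a new voice or accent.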


The IELTS and the Pearson Test of English exams include Listening sections, where you will be required to listen to an audio recording and answer a few questions. The simple questions will ask you to repeat a sentence that you just heard. The more familiar you are with the voice and accent of the audio, the easier it will be to answer such questions. The harder questions may ask you to summarize a recording or to answer multiple-choice questions based on a recording that you just heard.

5.2. Speaking

Speech recognition software allows you to control your computer and dictate text to it by voice. The first attempts to build automatic speech recognition (ASR) software were not entirely successful. The problems of recognizing various accents and converting speech to text in real time were harder than expected. The latter problem was solved with the rapid increase in the computing power of PCs and improved software. However, most speech recognition software still uses two components, one for training and another for recognition. You will need to spend some time training your speech recognition software to become familiar with your accent; the training may require you to read long chunks of text. If your ASR software has been sufficiently trained, then the recognition software will have reasonably high precision.

ASR software is complex and you can find out a lot more about it on the Web. The Sphinx project at Carnegie Mellon University is a popular open source tool for ASR. It has been used for several years, but needs some technical knowledge to train and test speech recognition.


5.3. Comprehension

Passage comprehension is considered one of the trickier sections of a language exam. There are three reasons: the topic of a passage may be unfamiliar and consequently harder to comprehend, you are required to read and understand a passage within a time limit, and you may not know some of the passage vocabulary. Although a passage is not the same as the five-paragraph essay discussed in Chapter 4, you can use the same analysis techniques to study a passage.

The initial description of the passage will explain the context: the passage will usually be an extract from a novel, a scientific article, or an essay. The first paragraph will establish the characters or the topic that will feature in the remainder of the passage. As you browse the passage, you will find sentences where the author uses the discourse words mentioned in Section 4.3.5. Often, passage questions will test whether you understood the meaning of sentences that contain words like despite, while, or however.

Other words that may be worth highlighting include the names of people, places, and things. These words describe the entities mentioned in the passage. The adjectives used in the passage are also likely to indicate the tone of the passage. A question on the author’s views or attitude is a fairly common question in a long passage.

Since exams like the SAT or GRE test aptitude, the subject matter of the passage may be taken from a broad range of topics. The subject of the passage may include a scientific discussion (from physics, chemistry, botany, mathematics, zoology), a social commentary (philosophy, culture, history, geography), or a critique of the arts (drama, music, literature, sculpture, painting). It is difficult to be familiar with all these subjects and you should not be surprised to find that the topic of a passage is new to you. However, test writers are careful to assume a generic background and will not ask questions that require any special expertise. Yet, you will need a strong vocabulary, even if you are not required to know the jargon of any particular topic.

If you have studied the sciences and have very little knowledge of theater, then a passage about drama can appear daunting. One solution to this problem is to practice reading passages on unfamiliar subjects. This would prepare you to read a passage on a totally new subject without becoming overwhelmed.

5.3.1. Requirements

Before you begin reading long passages, you should first build your vocabulary. If you do not know the meaning of 5 or more words in a passage of 150-200 words, you will find it difficult to answer some questions. There is always the possibility that you will not know the meaning of a few words in a passage. However, using the context and your knowledge of roots, prefixes, and suffixes (see Appendix C), you can make a reasonable guess that should be close enough to help you answer a question. Many passage questions test for the less frequent meaning of a word. For example, the word mold could be used as a verb (to shape or form) or a noun (a decaying surface or a pattern). The meaning of the word will depend on the context and you will be able to answer these types of questions, if you are familiar with most of the meanings of a word.


The second requirement is that you should be able to complete most of the sentence completion questions (see Section 3.3.1) with very few errors. These questions have one or two missing words and are fairly easy compared to passage questions. They also test your knowledge of the meaning of individual sentences that are 15-25 words long. Many passages will include sentences in this range and you should be able to decipher the meaning of such sentences without too much difficulty.

The third requirement is that you should be acquainted with different writing styles. A scientific article is usually factual, describing some phenomenon, procedure, or theory. An editorial article from a newspaper is written with the intention of persuading the readers to accept the author’s opinion on a known topic. Finally, fiction may be highly personal or dramatic with several characters featuring in the passage.

5.3.2. Tips

Before a passage begins, a blurb will describe the source. For example, the start of a passage about the traits of a conductor may state:

    In this excerpt from the “Joy of Music”, the conductor and composer Leonard Bernstein distinguishes the great from the average conductor.

Although the blurb is not part of the passage, it is important to read it carefully and recognize the background of the passage.

Personal Opinions: The answers to questions should always be found in the passage; you should not let your personal opinion influence your answer. There will be passages where you may disagree with an author’s opinion, but your answer should still reflect the contents of the passage and not your beliefs.

Should you read the passage or questions first? It depends on what works for you. Some find it easier to read the passage first and then answer the questions. Others read the questions first and then search for answers in the passage. You may not have to read the entire passage to answer some questions. The questions are sometimes ordered based on the passage, i.e. the answers to early questions can be found in the beginning of the passage. The main purpose of reading the questions first is to save time. You do not gain anything by understanding sentences from the passage that are not relevant to any of the questions. Consider a 600-word passage with 16 questions and an allotted time of 16 minutes. If you can read at roughly 150 words per minute, it will take four minutes to read the passage. You then have just 12 minutes to answer the 16 questions; therefore, time is critical in a long passage.

How do I read fast? Speed reading is a technique to read a passage faster without reducing comprehension. There are a number of suggestions on the Web to read faster and some of these methods may work for you. The questions that are not specific to a particular section or line number of the passage can take a lot of time: such questions may require you to scan the passage for particular types of terms or keywords. If you find that you are spending a lot of time repeatedly scanning a passage for an answer, it may be worthwhile to skip the question and try it later.
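The time budget in the 600-word example above can be worked out for any passage length and reading speed. The helper below is a minimal sketch with a hypothetical function name; it simply subtracts the reading time from the allotted time and divides what is left among the questions.

```python
def seconds_per_question(passage_words, words_per_minute, total_minutes, questions):
    """Seconds left for each question after reading the whole passage once."""
    reading_minutes = passage_words / words_per_minute
    remaining_minutes = total_minutes - reading_minutes
    return remaining_minutes * 60 / questions

# The example from the text: 600 words, 150 words per minute,
# 16 minutes in total, and 16 questions.
print(seconds_per_question(600, 150, 16, 16))
```

With the numbers from the text, four minutes go to reading and the remaining 12 minutes leave 45 seconds per question. Trying the function with your own reading speed shows how much each extra minute of reading costs.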


Which sentences are important? This may seem obvious, but you should read the sentences which answer the questions more carefully than the rest of the passage. Fortunately, some of these sentences are not difficult to detect. The questions with line numbers refer to specific sentences in the passage and you should read those sentences more closely than other sentences. If the sentence is not clear, you can read the sentences before and after it to clarify the meaning. Global questions that require you to summarize or identify the tone of the passage may not refer to any particular sentence. However, it pays to read the first and last sentence of every paragraph, which may give the gist of the passage. Reading long passages can be intimidating for two reasons:

• You may begin to feel anxious since you will not answer any questions for 4-5 minutes, while you read the passage.
• Remembering all the contents of an 800-word passage is hard.

The trick is to focus on the “answer” sentences alone and skim the rest of the passage. It is not necessary to understand the entire passage thoroughly; roughly half the questions may refer to specific sentences in the passage. You should focus on those “answer” sentences and keep in mind the tone of the whole passage.

How do I find the right answer? ETS is careful to choose answers such that the correct answer is the best possible choice. If you find that no particular answer stands out as the answer, you will have to use the process of elimination to find the most likely answer. Here, your vocabulary will help you. For example, a question regarding the tone of the passage will contain several adjectives as answers. You will have to differentiate between the strong, weak, and neutral adjectives to find the right word that summarizes the tone of the passage.

5.4. Web sites to practice Reading Comprehension

1. http://www.majortests.com/sat: Passages to practice for the SAT Reading Comprehension Sections.
2. http://www.testpreppractice.net/sat: More passages to practice for the SAT Reading Comprehension Sections.
3. http://www.ehow.com/topic_916_taking-the-sat.html: Tips for taking the SAT.
4. http://www.testpreppractice.net/GRE: Practice tests for the GRE.


A. Installing Emustru

The Emustru software used in this book is available from http://emustru.sf.net. Emustru has been tested on the Windows and Linux platforms. The application is Web-based and runs on the Linux-Apache-MySQL-PHP (LAMP) or the Windows-Apache-MySQL-PHP (WAMP) stacks.

Windows

This document assumes you have an existing stack on either the Windows or Linux platforms. The WAMP project (http://www.wampserver.com/en) distributes the three components of the stack: Apache, MySQL, and PHP. The WAMP distribution makes it simple to install the stack without downloading and customizing each of the individual components (see Figure A.1). Apache and MySQL run as services and must be started before installing Emustru. The Administrative Tools of the Control Panel include options to enable these services at startup time. The default directory for WAMP is c:\wamp and the www sub-directory under this directory is the location for Web projects. The Emustru distribution can be unzipped in the c:\wamp\www directory. A default index.php file is created in the www directory and can be viewed from the browser at the URL http://localhost/index.php.


Figure A.1.: Configuring Apache, MySQL, and PHP with WAMP

Initially the MySQL root userid may be created without a password. This is a potential security problem and you can set a password for the root userid from the command line with the following commands.

    C:\wamp\bin\mysql\mysql5.0.51b\bin> mysql
    mysql> update mysql.user set Password = PASSWORD('MyNewPass') where User = 'root';
    mysql> flush privileges;

Replace MyNewPass with a password for the root userid. WAMP includes the phpmyadmin tool under the apps directory to manage the MySQL database tables. This is a very useful tool to troubleshoot problems with database tables and is fairly easy to use. The root MySQL password must be set in the config.inc.php file under the phpmyadmin directory.

    $cfg['Servers'][$i]['user'] = 'root';
    $cfg['Servers'][$i]['password'] = 'MyNewPass';

You can verify your installation from http://localhost/index.php. If both MySQL and PHP appear to be working, unzip the Emustru distribution in the WAMP www directory. Then, open a browser session at http://localhost/emustru and continue as shown in the Configuration section.

Linux

Many current Linux distributions include options to install a Web server (Apache), a database server (MySQL), and PHP. If you have not installed these components, you can install the separate XAMPP package, use the distribution to add these components, or download each of the components separately. The XAMPP project (http://www.apachefriends.org/en/xampp.html) is a multi-platform tool to build the AMP stack on the Linux, Windows, MacOS, and Solaris platforms. It includes the same components as WAMP and a few others as well.

On Linux, the XAMPP distribution is a gzipped tar file that can be unpacked in the /opt directory. You will need to become the root user to complete the rest of the installation. After unpacking the distribution, you can start Apache and MySQL with the “lampp start” command from the top-level installation directory. This command will start Apache and MySQL if existing servers are not running on the same ports. Before starting XAMPP, you should stop any existing Web or database server to avoid conflicts. The /etc/init.d directory may contain the scripts to start and stop other Web and database servers. If you decide to make the XAMPP installation override any existing Apache and MySQL installation, you can modify the startup and shutdown scripts to start both servers from the XAMPP directory alone.

There are several security problems that need to be fixed before running the servers. The command “lampp security” sets passwords to access the Web pages, the MySQL database, an FTP server, and the phpMyAdmin tool. The script also limits network access to the MySQL server by modifying the my.cnf file in the etc directory.

The root Web directory is the htdocs directory under the installation directory. Emustru should be unzipped in this directory during installation. The installation can be verified by starting a browser session pointing to the URL http://localhost. You should see a page with an orange background and a number of menu options.

Configuration

The screen shown in Figure A.2 should appear in a browser session with the URL set to http://localhost/emustru/index.php, provided the Emustru distribution has been unzipped under the Web root directory (htdocs for XAMPP, www for WAMP). This screen is common to Linux and Windows installations. The installation screen in Figure A.2 is based on a Windows installation. A Linux installation is similar with the exception of the mandatory entries for the Web root directory, MySQL root directory, and Java Runtime directory.

Figure A.2.: Emustru Installation Screen

The MySQL userid should have the authority to create a database and load tables. The mysqlimport utility, found in the bin directory under the root directory of the MySQL server, is used as a backup if the load table command fails. In Windows, the java executable is found from the environment variables that are set when Java is installed. In Linux, however, the java executable may not be found in the PATH variable, and therefore the runtime directory (i.e. the directory above the bin directory) may be required to run Java code.

During the installation, about 30 database tables are loaded and two configuration files, config.php for PHP and config.prp for Java, are created. The configuration files should be made read-only after a successful installation since these files contain a userid and password for the MySQL server. Both configuration files are first created in a temp directory. In Windows, the temp directory may be c:/WINDOWS/TEMP/emustru or C:/WINNT/TEMP/emustru, and in Linux it may be /tmp/emustru. The temporary files are then copied to the Web root installation directory. In Linux, this is usually a problem, since the Web user (such as nobody, www, or apache) does not have the authority to create files in the Web directories. The Linux installation may end with a message like

• The config.php could not be moved to the /opt/lampp/htdocs/emustru directory because of permissions.
• To complete the installation, you will need to copy the files
  – /tmp/emustru/config_temp.php to /opt/lampp/htdocs/emustru/config.php and
  – /tmp/emustru/config.prp to /opt/lampp/htdocs/emustru/java/data/config/config.prp

This installation assumes that MySQL and Apache have been installed under the /opt/lampp directory. After copying the configuration files to the /opt/lampp directories, the login page for Emustru will be shown.
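The manual copy step, together with the earlier advice to make the configuration files read-only, can be scripted. The Python sketch below is only an illustration and is not part of the Emustru distribution: the helper name finish_install and its mode parameter are hypothetical, and the file names simply mirror those in the sample installer message above.

```python
import os
import shutil

def finish_install(temp_dir, web_dir, mode=0o444):
    """Copy the two generated configuration files into place, read-only.

    Hypothetical helper: the file names follow the sample installer
    message, and mode 0o444 means readable by all, writable by none.
    A stricter mode works only if the Web server user can still read
    the files.
    """
    copies = [
        (os.path.join(temp_dir, "config_temp.php"),
         os.path.join(web_dir, "config.php")),
        (os.path.join(temp_dir, "config.prp"),
         os.path.join(web_dir, "java", "data", "config", "config.prp")),
    ]
    installed = []
    for source, target in copies:
        # Create the target directory if the installer has not done so
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copyfile(source, target)
        os.chmod(target, mode)
        installed.append(target)
    return installed
```

Run as a user with write access to the Web directory, a call such as finish_install("/tmp/emustru", "/opt/lampp/htdocs/emustru") would perform both copies for the layout assumed in this appendix.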

Customization

The default distribution comes with a word list of about 8,000 words and 6,500 sentences. Two additional sources of sentences and words can be downloaded from SourceForge.net: brown.zip and sat.zip. The brown.zip file contains 25,000 words and 35,000 sentences from the Brown corpus [5]. The sat.zip file contains 8,500 words and 120,000 sentences extracted from e-books downloaded from Project Gutenberg [6]. Unzip both of these files in the install/table_data directory of the installation directory. Then log in as admin (initial password admin) and press the “Load Word Table” button shown in Figure A.3.

Figure A.3.: Adding Words and Sentences to Emustru
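The unzip step can also be done from a script. The following Python sketch assumes that the archives have already been downloaded; the function name unpack_word_sources is hypothetical and is not part of Emustru.

```python
import os
import zipfile

def unpack_word_sources(archives, install_dir):
    """Unzip each archive into install/table_data under install_dir.

    Hypothetical helper; returns the names of the extracted members.
    """
    target = os.path.join(install_dir, "install", "table_data")
    os.makedirs(target, exist_ok=True)
    extracted = []
    for archive in archives:
        with zipfile.ZipFile(archive) as bundle:
            bundle.extractall(target)
            extracted.extend(bundle.namelist())
    return extracted
```

For the XAMPP layout used in this appendix, the call might look like unpack_word_sources(["brown.zip", "sat.zip"], "/opt/lampp/htdocs/emustru"). The “Load Word Table” button must still be pressed afterwards.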

You can also add a list of words to one of the word list types. Emustru will accept a file with one word per line in several formats. An optional number accompanying the word is interpreted as a rank, and words with higher ranks are shown earlier in a generated quiz than other words. If no ranks are provided, all words are assigned the same rank, and a rank order quiz for such a word list will fetch words in alphabetic order. The words for questions in any quiz can be selected at random or by rank order. An option to select an order type for the quiz is provided before a quiz is generated.
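The ordering rules above can be illustrated with a short Python sketch. It is not Emustru's own code (Emustru is written in PHP and Java), and it assumes one particular file format, a word optionally followed by whitespace and an integer rank, since the accepted formats are not spelled out here. The function names are hypothetical.

```python
import random

def parse_word_list(text, default_rank=0):
    """Parse lines of the assumed form 'word' or 'word rank'."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        rank = int(parts[1]) if len(parts) > 1 else default_rank
        entries.append((parts[0], rank))
    return entries

def quiz_order(entries, order="rank", rng=None):
    """Return the words in quiz order: by rank, or shuffled."""
    if order == "random":
        words = [word for word, _ in entries]
        (rng or random).shuffle(words)
        return words
    # Higher ranks first; equal ranks fall back to alphabetic order
    ranked = sorted(entries, key=lambda entry: (-entry[1], entry[0]))
    return [word for word, _ in ranked]
```

With the input lines "zeal 3", "abate", and "lucid 7", a rank order quiz yields lucid, zeal, abate. With no ranks at all, every word ties and the result is simply alphabetic, as described above.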

Troubleshooting

The Java code uses a JDBC connector to access the MySQL database and will not function if network access is disabled in MySQL. Network access is controlled through the skip-networking option in the my.ini file (my.cnf on Linux). During a fresh installation, you may need to clear out any existing log and configuration files from the temporary directory.

• There are several log files that contain messages indicating problems with the installation or running of Emustru.
  – The emustru.log file in the temporary directory contains messages from problems found in the PHP or Java code.
  – Entries in the Apache error log file, the MySQL log file, and a PHP error log file may contain useful information to debug a problem.
• The essay evaluation function starts a shell script from PHP to run the Java code; the script can be found in the temporary directory.
• Similarly, the other Java functions are run from PHP using the shell_exec command, which may not work if PHP is operating in safe mode.
• Finally, directory permissions in Linux are often a source of installation problems. Permissions and files left over from a previous installation may cause problems in a new installation since some files cannot be removed.
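A quick way to rule out the first problem is to look for an active skip-networking line in the MySQL option file. The Python sketch below is a rough check under stated assumptions: the function name networking_disabled is hypothetical, any uncommented occurrence of the option is treated as enabled regardless of its value, and option file sections are ignored.

```python
def networking_disabled(config_text):
    """Return True if an uncommented skip-networking option is present.

    Rough check only: sections such as [mysqld] are not distinguished
    and any value given to the option is ignored.
    """
    for line in config_text.splitlines():
        # Drop anything after a '#' comment marker, then trim whitespace
        setting = line.split("#", 1)[0].strip()
        if not setting or setting.startswith(";"):
            continue  # blank line or ';' comment
        # MySQL accepts '-' and '_' interchangeably in option names
        name = setting.split("=", 1)[0].strip().replace("_", "-")
        if name == "skip-networking":
            return True
    return False
```

For example, read the contents of my.ini or my.cnf into a string and pass it to networking_disabled. If the result is True, comment out the option and restart MySQL before running the Java functions again.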


B. Parts of Speech

Identifying the part of speech of a word makes it easier to understand the meaning of the word as well as its context in a sentence. The nine common parts of speech are noun, pronoun, verb, adjective, adverb, conjunction, determiner, interjection, and preposition (see Figures B.1 and B.2). It is important to know parts of speech, not just to build grammatically correct sentences, but also to learn vocabulary, comprehend a passage, and score well in a Cloze test. You can practice your skills in finding the part of speech of words in a sentence using the Link Parser [18].

Figure B.1.: Nouns, Pronouns, and Determiners. [Figure content: nouns (person: Tom; place: Vienna; thing: piano), determiners (at, the, a), and pronouns (he, they, us, him).]

Nouns, Pronouns, and Determiners

A noun names some type of entity such as a person, place, or thing, while a determiner introduces a noun. A pronoun is a substitute for a noun and must unambiguously refer to a noun in a sentence.

A common error is the use of two nouns and a single pronoun in a sentence. For example, the use of the word He in the second sentence below is ambiguous, since there are two nouns in the first sentence and He could refer to either Amundsen or Scott.

In 1911, Amundsen reached the South Pole before Scott arrived. He had told people that he was going to sail for the Antarctic.

Conjunctions and Prepositions

A conjunction combines two clauses into a single sentence (see Figure B.2). The first clause is usually the main or independent clause and the second clause is the subordinate or dependent clause. While the main clause can be a separate and meaningful sentence by itself, the subordinate clause needs the main clause to make sense. Other types of conjunctions include compound conjunctions (as long as) and correlative conjunctions, which work in pairs (either ... or).

A preposition describes a relationship (in space or time) between a set of words in a sentence. For example, in Figure B.2, the preposition below is used to locate the floor. Similarly, prepositions such as before or after specify the time of an event in a sentence. Prepositions often occur in phrases such as at home or under the warm blanket. A prepositional phrase functions as an adjective or adverb. In the sentence “The book on the floor is wet.”, the prepositional phrase “on the floor” modifies the noun book. A prepositional phrase can appear in the middle or at the beginning of a sentence.

Figure B.2.: Conjunctions and Prepositions. [Figure content: the sentence “Call the taxi if you are ready.” tagged verb, determiner, noun, conjunction (if), pronoun, verb, adjective, with “Call the taxi” as the main clause and “if you are ready” as the subordinate clause; and the sentence “The floor is below us.” tagged determiner, noun, verb, preposition (below), pronoun.]

Verbs, Adjectives, and Adverbs

A verb is perhaps the most important part of a sentence. It asserts something about a noun (the subject of the sentence), such as a state, a relationship, or a comparison (see Figure B.3). A verb may consist of more than one word; for example, in the sentence “The birds were singing.”, were is an auxiliary verb and singing is an ordinary verb.

Figure B.3.: Sample Action Verbs. [Figure content: Combine, Explain, List, Communicate, Describe, Analyze, Identify, Generate, Refer, Create, Perform, Speculate.]

The purpose of an adjective is to add to the meaning of a noun. An adjective may describe the kind, quality, or quantity of a noun. For example, the sentence “The small boat floundered in the vast ocean.” contains two adjectives: small (kind) and vast (quality).

An adverb modifies a verb in the same way an adjective modifies a noun. For example, the verb runs in the sentence “Tom runs quickly.” is modified by the adverb quickly. A simple rule of thumb to detect an adverb in a sentence is to check for the suffix ly in the word. Roughly 66% of the 4400 adverbs in the WordNet dictionary end with the suffix ly, and over 77% of the words that end with the suffix ly are adverbs.

Punctuation

Not strictly considered a part of speech, punctuation is the use of a set of characters in sentences to help the reader understand an intended meaning with as little effort as possible. The set of punctuation characters includes the apostrophe (’), the comma (,), the period (.), the semi-colon (;), the colon (:), and the dash or hyphen (-). Many of these characters serve more than one purpose. What follows is a brief explanation of these characters; several references [15, 21] explain punctuation and its use in more detail.

Apostrophe

The most common use of the apostrophe is to show possession: the boy’s bat or the boys’ bats. The apostrophe also appears in contractions. The two-word phrase “Who is” is replaced with “Who’s”. Similarly, the phrase “It is” is replaced with “It’s”. Other popular contractions include haven’t, couldn’t, and you’re.

Comma

The comma, like the apostrophe, can function in more than one way. A comma organizes words in a sentence into groups of words, separates items in a list, and indicates a pause in a sentence (see Figure B.4). The commas in the sentence “The primary colors are red, blue, and yellow.” separate the three primary colors. The second comma following blue is optional, but its inclusion can make a sentence more readable. The following four sentences illustrate other usages of the comma.

I knew that the price of gold would increase, but I had no idea that the price would skyrocket.

I first checked if the power supply was defective, and then I disassembled the computer.

Although she is a good student, Jane barely passed Calculus.

Mr. Johnson, who is the head of development, will present the awards at our annual dinner.

Notice that in the first two sentences a comma followed by a conjunction separates two complete sentences. The sentences on either side of the conjunction are complete and could be separated with a period instead of a comma and a conjunction. A longer sentence may be preferred over two short sentences when there is some relationship between the two sentences. In the third sentence, the comma sets off the introductory clause from the second part of the sentence. In the last sentence, the two commas mark positions where the reader should pause to interpret the meaning of the sentence.

Semi-colon and Colon

The semi-colon and colon punctuation characters are sentence separators. A semi-colon separates a sentence into two parts: the first part explains part of a story and the second part completes the remainder. A period between these two parts would introduce a stronger separator than necessary, while a comma would not be sufficient (see Figure B.4).

Figure B.4.: Punctuation Characters that Indicate Pauses in a Sentence. [Figure content: the punctuation characters ? ! . : ; , arranged by pause time.]

He did not spend any time studying even though he was far behind in his class; he was going to fail his math test.

The use of the colon to separate two sentences is similar to the use of the semi-colon, with the following difference. The text before the colon introduces part of a sentence that is elaborated, restated, or explained in the following part of the sentence.

Sherlock Holmes was left with one question unanswered: Why did the thief leave his keys behind?

In general, the semi-colon is used more often than the colon to indicate a pause. It may be difficult to define the precise length of a pause in a sentence and look up the appropriate punctuation character, based on the duration of the pause, for a given sentence.
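Returning to the rule of thumb for adverbs given earlier in this appendix (check for the suffix ly), the short Python sketch below shows how the check might be coded and what the quoted percentage works out to. The function names are hypothetical, and the figures of 4400 adverbs and 66% are the ones quoted earlier for the WordNet dictionary.

```python
def looks_like_adverb(word):
    """Suffix heuristic: guess 'adverb' when the word ends in 'ly'."""
    return word.lower().endswith("ly")

def ly_adverb_count(total_adverbs=4400, share_with_ly=0.66):
    """Approximate number of adverbs ending in 'ly' from the quoted figures."""
    return round(total_adverbs * share_with_ly)
```

The heuristic accepts quickly and rejects runs, but it also accepts the noun family and misses the adverb fast. This is consistent with the figures quoted earlier: only about three quarters of words ending in ly are adverbs, and about a third of all adverbs do not end in ly. By the same figures, roughly 2900 of the 4400 adverbs end in ly.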


C. Word Lists

The PDF files below can be downloaded from http://emustru.sf.net.

• http://emustru.sf.net/list_roots.pdf – List of Prefixes, Suffixes, and Roots: This list contains some of the common roots, prefixes, and suffixes that make up the building blocks of numerous English words. Each root, prefix, and suffix has the associated meaning and sample words.
• http://emustru.sf.net/list_confused_words.pdf – List of Sentences for Confused Words: Words such as accept and except are sometimes used incorrectly in sentences. This list includes a set of sample sentences for every pair of confused words.
• http://emustru.sf.net/list_misspelled_words.pdf – List of Misspelled Words: A collection of 2700 words that are frequently misspelled.
• http://emustru.sf.net/list_preposition_errors.pdf – List of Preposition Errors: A short list of common preposition errors.
• http://emustru.sf.net/list_sat_words.pdf – List of SAT Words: A list of 8600 words that appear often in the SAT exam. Each word is hyperlinked to a WordNet definition.
• http://emustru.sf.net/list_words_sfi.pdf – List of 10K Words from Brown Corpus: A list of ten thousand words and their associated standard frequency index values from the Brown Corpus.


Index

AES, see automated essay scoring
argumentative prompt, 92
audio recording, 35
automated essay scoring, 81
automatic speech recognition, 127
average word length, 110

bigram, 72, 96
Brown corpus, 40, 69
building sentences, 58

CLAWS, 69
Cloze test, 14, 74
College Board, 112
Computer Assisted Language Learning, 3
confused words, 97
context, 30
coordinating conjunctions, 106
cosine similarity measure, 114
cue word, 109

descriptive prompt, 92
discourse
  classifier, 107
  element, 86
  marker, 75
  words, 59

E-rater, 16, 82, 93, 94, 96
edit distance, 42
Educational Testing Service, 16, 68, 81, 132
Emustru, 13, 15, 40
  essay evaluator, 118
  sentence quizzes, 73
  spelling quiz, 45
Espeak, 125
essay
  content, 122
  evaluation, 15
  length, 93
  model, 87
  writing, 16
ETS, see Educational Testing Service
example sentences, 60
expository prompt, 92
extracting sentences, 115

faulty comparison, 99
feature, 84
filter, 72
formal writing, 116
Free Rice, 6, 10, 37
FreeTTS, 13
function words, 63

grammar checker, 15, 64, 66, 96
  ALEK, 68
  E-rater, 70
  LanguageTool, 67
  parse tree-based method, 67
  rule-based method, 67
grammar errors, 85, 95
great sentences, 57

hangman, 50
holistic score, 83
human grader, 16, 63, 81, 83, 87, 90, 94, 100
hyponym, 49

Intelligent Essay Assessor, 90
Intellimetric, 90

Krugman, Paul, 12

LAMP, 135
letter errors, 100
Lewis, Norman, 5, 21
lexical complexity, 86, 110
listening, 125
logistic classifier, 88
long sentences, 62

machine grader, 63
main point, 109
mechanics, 99

New York Times, 12, 32, 34

O’Connor, Johnson, 22

part of speech, 31
passage comprehension, 128
passive sentences, 105
personal opinions, 130
phrase game, 52
precise sentence, 64
preposition error, 99
Project Gutenberg, 11, 32, 125
pronunciation, 2, 4
proxes, 84
punctuation, 61, 65
punctuation errors, 101

quiz, 5, 21
Quizlet, 10, 38

sentence
  completion, 75, 77
  length, 62, 106
  pattern, 59
  structure, 71
SFI, see standard frequency index
similarity computation, 114
speaking, 127
speed reading, 131
spell check, 42, 64
spelling, 42
  error, 16, 43, 86, 100, 120
  error analysis, 42
  quiz, 13, 35, 37
Sphinx project, 127
standard frequency index, 111
style, 103
subject-verb agreement, 72

test writer, 26, 34, 48, 129
text to speech, 4
thesis statement, 109
trait, 84
trin, 84

United Nations World Food Program, 37

vector, 87, 115
Visual Thesaurus, 35, 36
vocabulary, 25

WAMP, 135
Washington Post, 12, 32
Wikipedia, 43
word, 24, 27
  errors, 101
  form, 24
  games, 50
  jumbles, 33
  lists, 1
  meaning, 30, 48, 49
  prefix, 28
  relationships, 33, 53
  roots, 28
  suffix, 28
  wrong form, 98
WordNet, 2, 11, 13, 26, 35, 48
words
  inappropriate, 104
  repetitive, 103

Bibliography

[1] http://en.wikipedia.org/wiki/Computer-assisted_language_learning, Computer Assisted Language Learning (CALL).
[2] http://www.camsoftpartners.co.uk/freestuff.htm, Free resources and articles on Computer Assisted Language Learning.
[3] http://wordnet.princeton.edu, The WordNet lexical database for English.
[4] http://ftp.ets.org/pub/res/erater_iaai03_burstein.pdf, Criterion: Online essay evaluation: An application for automated evaluation of student essays.
[5] http://en.wikipedia.org/wiki/Brown_Corpus, The Brown Corpus.
[6] http://www.gutenberg.org/wiki/Main_Page, The Project Gutenberg.
[7] http://www.quizlet.com, Quizlet: Flashcards, vocabulary memorization, and word games.
[8] http://www.alias-i.com, The LingPipe Computational Linguistics software.
[9] E. B. Page and N. S. Petersen: The computer moves into essay grading: Updating the ancient test, Phi Delta Kappan, 76(7), 561-565.
[10] J. Burstein: The E-rater Scoring Engine: Automated essay scoring with natural language processing, in Automated Essay Scoring: A Cross Disciplinary Perspective, Lawrence Erlbaum Associates, 2003, pp 113-121.
[11] Y. Attali and J. Burstein: Automated Essay Scoring With e-rater® V.2, The Journal of Technology Learning and Assessment, Vol. 4, No. 3, February 2006.
[12] http://www.vantagelearning.com/school/products/intellimetric/, Intellimetric, Vantage Learning.
[13] http://www.knowledge-technologies.com/prodIEA.shtml, Intelligent Essay Assessor, Prentice Hall.
[14] C. E. Good: A Grammar Book for You and I (Oops, Me): All the Grammar You Need to Succeed in Life, Capital Books, Sterling, VA, March 2002.
[15] M. Strumpf and A. Douglas: The Grammar Bible: Everything You Always Wanted to Know About Grammar but Didn’t Know Whom to Ask, Holt Paperbacks, New York, NY, July 2004.
[16] http://freetts.sf.net, The FreeTTS text to speech synthesizer written in Java.
[17] http://en.wikipedia.org/wiki/Cloze_test, The Cloze test or assessment with certain words removed from text.
[18] http://www.link.cs.cmu.edu/link/submit-sentence4.html, The Link Parser from Carnegie Mellon University.
[19] http://www.languagetool.org, The LanguageTool Open Source Language Checker from Daniel Naber.
[20] C. Leacock and M. Chodorow: Automated Grammatical Error Detection, in Automated Essay Scoring: A Cross Disciplinary Perspective, Lawrence Erlbaum Associates, 2003, pp 113-121.
[21] L. Truss: Eats, Shoots, and Leaves: The Zero Tolerance Approach to Punctuation, Profile Books, May 2007.
[23] http://en.wikipedia.org/wiki/SAT, The SAT Reasoning Test administered by the Educational Testing Service.
[24] J. Burstein: The E-rater Scoring Engine: Automated essay scoring with natural language processing, in Automated Essay Scoring: A Cross Disciplinary Perspective, Lawrence Erlbaum Associates, 2003, pp 113-121.
[25] Y. Attali and J. Burstein: Automated Essay Scoring With e-rater® V.2, The Journal of Technology Learning and Assessment, Vol. 4, No. 3, February 2006.
[26] http://www.vantagelearning.com/school/products/intellimetric, Intellimetric, Vantage Learning.
[27] J. Burstein, M. Chodorow, C. Leacock: Criterion Online Essay Evaluation: An application for automated evaluation of student essays, Proceedings of the Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence, 2003.
[28] J. Burstein, M. Chodorow, C. Leacock: Automated Essay Evaluation: The Criterion Online Writing Service, American Association for Artificial Intelligence, Fall 2004.
[29] J. Burstein, D. Marcu, K. Knight: Finding the WRITE stuff: Automatic Identification of Discourse Structure in Student Essays, IEEE Intelligent Systems, January 2003.
[30] http://www.scribd.com/doc/4018042, Writer’s Handbook for English Language Learners.
[31] http://www.ets.org/Media/Research/pdf/r3.pdf, The Ups and Downs of Preposition Error Detection in ESL Writing, ETS, Princeton, NJ, 2008.
[32] A. Longknife and K. D. Sullivan: The Art of Styling Sentences, Barron’s Educational Series, New York, 2002.
[33] R. McCutcheon and J. Schaffer: Increase Your Score in 3 Minutes a Day: SAT Essay, McGraw-Hill, New York, 2004.
[34] http://en.wikipedia.org/wiki/Cosine_similarity, Cosine similarity measure.
[35] H. M. Breland et al.: The College Board Vocabulary Study, College Board Report No. 94-4, College Board Publications, New York, 1994.
[36] http://espeak.sourceforge.net, Espeak, an Open Source Speech Synthesizer for English.
