Introducing NawaCoLex2

The Nawat Corpus & Lexicon Database

NAWACOLEX Version 2.1

TUTORIAL © Alan King 2014

TABLE OF CONTENTS

INTRODUCTION

THE NAWACOLEX DATABASE
  What is NAWACOLEX?
  What is NAWACOLEX for?
  How does NAWACOLEX work?
  What is in NAWACOLEX 2.1?
  What is not in NAWACOLEX 2.1?
  A read-only database

INSTALLATION INSTRUCTIONS
  Install the Toolbox software
  Set up the NAWACOLEX 2.1 database
  Opening NAWACOLEX for the first time
  Installation troubleshooting

USING NAWACOLEX
  Introducing the NAWACOLEX screen
  Basic how-to’s
    How do I view a book?
    How can I browse a document?
    How can I display a lexicon?
    How can I see which lexicons have a word?
    What is the Wordlist?
    What can I do with it?
  “Jumps” and parallel movements
    Wordlist to lexicon
    Lexicon to lexicon
    Text to lexicon
    Non-citation forms
    Variants and sub-entries
    Other jumps
  Some Toolbox things...
    The windows
    Movable columns and fields
    Hidden fields
    Browse view
    Side browse
    The program bar and status bar
    Printing and exporting
    Margins
    Open windows and the “Window” menu

RESEARCH TOOLS
  NAWACOLEX’s research tools
  Concordances
    What is a concordance?
    Search parameters
    Choosing your text corpus
    Concordancing lexicons: Spanish to Nawat
    Concordancing lexicons: advanced searches
  Sorting
    Sorting of lexicons
    Sorting of concordances
  Filters

THE LEXICONS
  Introduction
  Structure and content
  Adaptation and transcription
  Verb inflections
  Arauz
  BibLex
  Campbell
  Hernandez
  LBN
  Ramirez
  Schultze
  Todd

THE TEXTS
  Introduction
  Structure and content
  Transcripts
  The corpus
  Masin
  Arauz
  Campbell
  Other texts

APPENDIX
  Toolbox: Shortcut keys
  Toolbox: Advanced users

BIBLIOGRAPHY

INTRODUCTION

This tutorial teaches how to use the Nawat Corpus & Lexicon or NAWACOLEX database (v 2.1, January 2014), which contains the most important source texts in Nawat, transcribed and edited, and also the essential content of the most significant Nawat lexical resources (i.e. dictionaries, vocabularies or glossaries). As a practical tool, NAWACOLEX systematizes and integrates these texts and lexical resources to make it easier to access the primary materials through an incorporated user interface. NAWACOLEX 2.1 can be downloaded with all these features and used on a computer to read, search, explore and study the linguistic material and the content.

This is the first version of NAWACOLEX to be shared with advanced students and fellow researchers. The prior version of this material (NAWACOLEX [version 1], ~2008) and its predecessors (NAWATLEX and Nawat Corpus 1.0, both ~2004) were all intended for personal research and as prototypes, and were not distributed. The upgrade to version 2.1, carried out in 2013, has involved reorganisation of materials, some new transcriptions, the design of a more polished and user-friendly interface, and the development of this tutorial prior to sharing the package with interested colleagues.

Details about how to obtain NAWACOLEX 2.1, discussions and announcements of future upgrades may be found at the Seminario Lingüístico Náhuat (SLN) on Facebook. Information can also be requested personally from me at [email protected].

The following tutorial will tell you how to get started, explain how the database is organised and works, give basic operating instructions, and list the contents of the current text corpus and the lexicons incorporated.

I take this opportunity to thank those who have helped with parts of the considerable work that has gone into transcribing materials that have a place in NAWACOLEX 2.1, and those who have provided such materials. In particular I wish to acknowledge help received from the Ne Bibliaj Tik Nawat project (in the person of Jan Morrow), which has helped finance the last few months of work necessary to develop the present upgrade of NAWACOLEX, and without which its public appearance would certainly have been delayed.

This result should not be judged as a finished general publication. It would be closer to the mark to think of it as my personal notes, packaged and wrapped in a form that, although not as polished as a fully public release, will give fellow Nawat scholars access to these materials. I hope it will facilitate their use and study and encourage better acquaintance with the richness of the Nawat language, which is still unknown to, or ignored by, far too many people. The study of the language is essential for its successful recovery, which is the hope that inspires this effort and to which it is humbly dedicated.

Alan King
December, 2013

THE NAWACOLEX DATABASE

What is NAWACOLEX?

The NAWACOLEX (Nawat Corpus & Lexicon) database is a database structure serving as a holder for the Nawat texts and lexicons it contains. These texts all ultimately come from native Nawat speakers; some have been published previously and others are unpublished. Also included are some transcriptions of recorded interviews with native speakers from the IRIN documentation project. The older texts and lexicons have been re-spelled in the common, modern orthography, both for users’ convenience and to facilitate the efficient functioning of the database as a tool for linguistic study and research.

The software employed is an application called Toolbox, developed and freely distributed by SIL (the Summer Institute of Linguistics). This application, together with the specific configuration developed for NAWACOLEX, offers design features that facilitate a number of useful operations which will be described in detail below.

What is NAWACOLEX for?

With the NAWACOLEX database you can do many things, such as the following, speedily and easily:
• read Nawat texts, locating them rapidly from a single index and displaying them in a single text window.
• display Nawat lexicons at a single click.
• search for any word in the whole text corpus or a part of it, and create concordances.
• see a wordlist, i.e. an alphabetical list of all the words that occur in the text corpus with information on the frequency of each word and a list of places where the word occurs, with the ability to view any of the locations listed.
• quickly find any word in the corpus wordlist or in the lexicon of one’s choice, provided it is listed in that lexicon.
• quickly look up in a lexicon any word that occurs in a text.

There are also things that NAWACOLEX doesn’t do. For instance:
• NAWACOLEX does not show the original formatting of texts or other elements contained in publications such as illustrations, notes etc.; for all such things the original texts must be consulted.
• The texts in NAWACOLEX have all been transcribed into modern spelling, which is usually an advantage for searches etc. As a rule, however, you cannot see the original spelling in NAWACOLEX and must go to the original for that.
• When you are reading a Nawat text, NAWACOLEX usually cannot provide you with its translation (however, you can look up words you don’t understand in its lexicons).

How does NAWACOLEX work?

NAWACOLEX is built with an application called Field Linguist's Toolbox (http://www01.sil.org/computing/toolbox/), usually referred to as Toolbox for short. Toolbox has been developed by the Summer Institute of Linguistics (SIL) and is distributed free of charge. It is a flexible tool well adapted to a number of text/language-related areas including corpus development and lexical work. Toolbox is considered a database programme, but unlike standard databases it is specifically adapted to the needs of people working with two basic aspects of language work: text and vocabulary. Thus it is ideal for the present purpose.

Toolbox treats a file containing a text as a “database” (though in this tutorial we will often call it a “document” or a “file” for clarity). Each such text file is divided into parts called “records”, which we may also call pages, sections, chapters etc., and each record is further divided into lines (basically, sentences) and/or fields. Where the texts of the corpus are concerned, every file, every section/page and every line is assigned an identifying code. These codes are used by NAWACOLEX to identify sentences and texts internally. Thanks to them, NAWACOLEX can quickly find and display any sentence in the corpus, and it can tell us what sentences (in which texts, and where) contain any word.

A lexicon also takes the form of a database. This kind is divided into records that we usually call “entries” (like a dictionary entry) for individual Nawat words, and each entry contains fields where different kinds of information have been inserted, such as the Nawat headword, the gloss (i.e. translation), and other information about the word. Toolbox can take an item in one lexicon and look for similar items in another lexicon (i.e. perform automatic cross-references). It can also take a word in a text and find its entry in a lexicon (i.e. look up words), but only if the word is in its citation form.

Moreover, Toolbox can generate other kinds of database from existing ones. For example, from one or more text files it can create something which we call a wordlist; this is a file where all the words in those texts are listed in alphabetical order, together with an indication of how many times they occur in the texts and even a list of the places where they occur. Secondly, Toolbox can create another kind of list called a concordance, which tells us about the occurrences of particular words or phrases.
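
To make the ideas of a wordlist and a concordance more concrete, here is a minimal sketch in Python. It is not how Toolbox works internally and it does not read NAWACOLEX files. The sentence codes (“DOC1.01.001” etc.) are invented, and the word strings merely reuse forms cited in this tutorial; they are not real Nawat sentences.

```python
import re
from collections import defaultdict

# Hypothetical mini-corpus keyed by invented sentence codes. The word strings
# reuse forms cited in this tutorial but are NOT real Nawat sentences.
CORPUS = {
    "DOC1.01.001": "siwat chiwa taria",
    "DOC1.01.002": "siwatket chiupi siwat",
    "DOC2.03.001": "Siwat shiktamuta siwatket",
}

def tokenize(sentence):
    """Split a sentence into lower-case word forms."""
    return re.findall(r"\w+", sentence.lower())

def build_wordlist(corpus):
    """Map each word form to (frequency, list of sentence codes)."""
    places = defaultdict(list)
    for code, sentence in corpus.items():
        for word in tokenize(sentence):
            places[word].append(code)
    # Alphabetical order, with frequency and locations for each form.
    return {w: (len(places[w]), places[w]) for w in sorted(places)}

def concordance(corpus, target):
    """Return (code, left context, word, right context) per occurrence."""
    target = target.lower()
    lines = []
    for code, sentence in corpus.items():
        words = tokenize(sentence)
        for i, word in enumerate(words):
            if word == target:
                left = " ".join(words[:i])
                right = " ".join(words[i + 1:])
                lines.append((code, left, word, right))
    return lines

wordlist = build_wordlist(CORPUS)
for word, (freq, codes) in wordlist.items():
    print(f"{word:12} {freq}  {', '.join(codes)}")

for code, left, word, right in concordance(CORPUS, "siwatket"):
    print(f"{code}  {left:>20} [{word}] {right}")
```

The first loop prints one line per word form with its frequency and the codes of the sentences where it occurs; the second prints each occurrence of “siwatket” with its left and right context. Note that the wordlist is built from word forms as they occur, so “siwat” and “siwatket” are counted separately.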


What is in NAWACOLEX 2.1?

For the in-depth study of a language such as Nawat, it is important to have at one’s disposal texts which provide reliable samples of the language as it is, or has been, in use. Since these are not very many or very voluminous, this amounts to trying to collect together just about everything that can be found, with emphasis on the best sources and the largest ones in order to possess as much good data as possible. Another thing much to be desired, given present technology, is to have such texts in a format that is compatible with available computing tools that facilitate their systematic study.

Over the ten years I have spent so far studying Nawat I have gradually built up such a computerized corpus, whose content has grown and which has been organised in different ways as my computer tools developed in sophistication. These texts have been transcribed in modern orthography because this makes their digital processing much more efficient. The textual content of NAWACOLEX 2.1 is essentially the kernel of that corpus as I have developed it up until the present time. During this period I have also accumulated and progressively transcribed and formatted Nawat lexical sources, and the most important of those materials constitute the lexical component of NAWACOLEX 2.1. The items contained in the current version are the following:

Text corpus of NAWACOLEX 2.1
• “Ynes Masin” corpus (texts in Schultze/Tajtaketza Pal Ijtzalku)
• Próspero Aráuz (texts)
• Lyle Campbell (texts from Cuisnahuat and Santo Domingo)
• Derechos Humanos (translation edited by J. Lemus)
• Genaro Ramírez (Naja Ni Genaro)
• Genaro Ramírez (miscellaneous writings)
• Paula López (stories and songs)
• IRIN (transcriptions of eight interviews)

Lexicons in NAWACOLEX 2.1
• Próspero Aráuz (vocabulary)
• BibLex (the lexical database of the Nawat New Testament)
• Lyle Campbell (vocabulary)
• Werner Hernández (vocabulary)
• Léxico Básico Náhuat (A. King)
• Genaro Ramírez (a short vocabulary)
• Leonhard Schultze (vocabulary)
• Juan Todd (vocabulary)

What is not in NAWACOLEX 2.1?

There is an argument for expanding NAWACOLEX to include just about everything that can be found in the way of source material. There are also one or two reasons for perhaps not doing so. One concerns the quality of sources: not everything that is in Nawat (or claims to be) is of equal value as linguistic data; some is good, and some is frankly of poor quality, even confusing at times. Do we really want to include everything? On the other hand, as long as this material is not used by beginning language students or people without sufficient training, it can be argued that this is not really a problem, as any source can and ought to be used responsibly, in a way appropriate to its worth and reliability. Besides, we can use the technology to filter out some sources and only explore the texts we choose, when our purpose calls for it.

But there is a more practical reason for not having included all available materials in NAWACOLEX 2.1, and that is the availability of my time. Preparing all these materials takes effort and dedication, so choices must be made. One such choice is about how much time to spend on one project as opposed to others; another, in a project such as NAWACOLEX and given limited time, is which materials to process, or at least to prioritize and incorporate first. The object pursued has been to produce a body of texts and lexical data covering as much valid information as possible for practical study and research purposes. Some of the items that are not present in the current version of NAWACOLEX include:
• Works of any type that are full of bad information (errors) and lacking in usable content.
• Materials that do not add anything new to what can already be found in the existing corpus.
• Items that would involve many hours of work to type up and prepare, especially if they are simply not good enough to be worth the effort.

However, as time permits more materials will be incorporated in future editions of NAWACOLEX.

A read-only database

From among the many options provided by the Toolbox application, I have chosen to distribute a copy of NAWACOLEX 2.1 which is configured so that users can consult the material but cannot modify the configuration. There are advantages to this for some users, as by not being able to modify it you cannot do anything wrong and corrupt it. Therefore, you are free to do what you like when you have NAWACOLEX open, and (for better or worse) when you close it everything will revert to the same state it was in the last time you opened it.

INSTALLATION INSTRUCTIONS

Install the Toolbox software

The first thing that needs to be done before NAWACOLEX itself can be installed is to install the application that it runs on: the Toolbox software, produced by the Summer Institute of Linguistics (SIL) and distributed free of charge from their website: www-01.sil.org/computing/toolbox/downloads.htm

Installation is not difficult and shouldn’t give any trouble. Just follow the simple instructions. This only needs to be done once; if you already have Toolbox on your computer, skip this and move straight on to the next step.

Set up the NAWACOLEX 2.1 database

Next, you need to set up NAWACOLEX itself. This is also quite simple. It is contained in a zip file which you may already have by now, as this document forms a part of it. If you haven’t already done so, download the zip file and extract everything onto your hard disk. It doesn’t matter much where you locate the NAWACOLEX 2.1 folder (as long as you know where it is!). It would be a waste of space to put it on your Desktop as you will not normally need to access it directly, so this is not recommended. You might want to place the folder under “My Documents”.

The next thing to be done after that, just once, is to open the folder, go into the sub-folder named Settings, and look for an icon which is supposed to look like a red tool box (that’s the Toolbox programme icon). The file’s name is “NAWACOLEX 2.1” (or “NawaCoLex 2.1.prj”). Make a shortcut to it on your desktop. Now close the Toolbox folder; you probably won’t need to open it again.

Opening NAWACOLEX for the first time

Try out the desktop shortcut you just created by double-clicking it. A Toolbox window should now open displaying NAWACOLEX. It will look like this:


Installation troubleshooting (just in case)

If on opening NAWACOLEX for the first time the screen shown above is all you see, then installation is complete and successful. Sometimes on opening an existing Toolbox project on a different machine, the application has some difficulty locating some of the project files in the new location, and a dialogue window will be displayed which is basically asking where a certain file is, or requesting confirmation as to whether the location shown is the right one. When that happens it is sometimes sufficient to click on OK or to give a “common-sense” answer. If it happens there will probably not be just one such question but a whole string of them, and they all need to be answered the same way. Be patient and answer all its questions until it stops (it will, eventually!). (Sometimes it seems to be asking the same question more than once, but actually it isn’t.) Usually this procedure will solve the problem. In that case, you probably won’t see those questions again (unless you reinstall) because the project’s configuration will have been corrected. If there is another problem and you can’t get the project to open, please let me know.

USING NAWACOLEX

Introducing the NAWACOLEX screen

The image on the previous page shows what you will see each time you start a session of NAWACOLEX 2.1. The main window contains five sub-windows (henceforth simply called “windows”):
• Text window (in the centre)
• Finder window (top left)
• Lexicon window (lower left)
• Wordlist window (top right)
• Concordance window (at the bottom)

The Text window (top centre) is where texts will be displayed. Here for example we see the first few lines of the Masin text in the text window:

The Lexicon window (bottom left) can display a lexicon (vocabulary or dictionary). Here for example, it shows us an entry from Lyle Campbell’s dictionary:


The Wordlist (top right) is a list of every different word form that occurs in the corpus, and tells us how many times it occurs, and where. Here we see the entry for the word form siwatket:

The Finder (top left) can be used in several ways to cause items to be displayed in other windows. If we write a Nawat word in place of “WORD” (in green), then right-click on that word, we can look up that word in a lexicon. Since there are a number of lexicons, often we will be given a choice. For example, if we write “siwat” in the Finder, the different entries will be listed so we can pick which one we wish to be displayed:

The next line in the Finder can be used in the same way to look for words that begin with the letters we type in:

There are more items found here than fit in the Multiple Matches window, so we may need to scroll through it.


The Lexicons section of the Finder lists each of the lexicons in the database; we can cause a lexicon to be displayed by right-clicking on its name. Let us “open” the vocabulary of Próspero Aráuz:

We can now click on the Browse view button (near the middle of the button bar), or press Alt-R, to show a scrollable list of all the entries in this vocabulary, and display any entry by double-clicking it (to return to Browse view, hit the Browse button again):


In the illustration above we have also scrolled down the list of lexicons in the Finder window so that we can see the list of available lexicons in NAWACOLEX 2.1. Notice also that the Browse view of the Arauz lexicon shows Nawat headwords in alphabetical order. If you have seen Próspero Aráuz’s book you will know that there is only a Spanish-Nawat glossary, making it hard to get an overview of the Nawat lexicon in his book. In NAWACOLEX all the lexicons are ordered by the Nawat. Scrolling still further down the Finder window we come to the list of texts. In the following illustration we have chosen to display the Arauz text (as opposed to the vocabulary from his book, which we just saw). Texts, of course, appear in the text window:

Clicking on the Browse button or pressing Alt-R here, we will see a table of contents and choose a page to go to. Finally, we can search for any word or a phrase in the corpus by pressing Ctrl-L to bring up a lookup dialog which lets us generate a concordance that will be displayed in the Concordance window. Basically a concordance is what would often today be called “search results”. Here is a concordance for the word “siwatket”:


Basic how-to’s

Here are simple procedures for performing some of the simplest and most useful things in NAWACOLEX 2.1. If you haven’t already done so, it is suggested that you should now get NAWACOLEX installed, open it up and try it out, working your way through each of the following simple operations to familiarize yourself with them.

• How do I view a particular book or text item contained in NAWACOLEX?
In the Finder window, scroll about halfway down to where the list of texts starts. The texts are listed in this order: first, published texts (whose codes begin with NPT, for Nawat Published Texts); second, miscellaneous texts (codes begin with NMT, for Nawat Miscellaneous Texts); last, IRIN’s interview transcriptions (codes begin with NDTI, for Nawat Documentation Texts - IRIN). Each line consists of a short name (such as “Masin”) followed by its code (such as “NPT01”) and “-Cover”, in parentheses. Right-click on the code in the parentheses. The “cover page” of that text, showing its full title, will be displayed in the Text window. For example, right-click on “Lopez” in the Finder window and you will see this:

• How can I browse through the document and read it?
Unless the Text window is already the active window (which it will be if you just opened it!), click on it to make it active. Now, press Alt-R, or click on the Browse view button, or choose “Browse” from the “View” menu. Any of these puts the window in Browse view, in which it will look like this:


You can now select any “page” of the document and read it by going back out of Browse view. To do that, double-click on the name of the page (or click once on a page to select it and click on the Browse button again). Let’s double click on the page titled “Ne kojtan”. Now we will see this:

Another way to browse through the text document is to use the navigation buttons at the top to move from the current page to the previous or next page, the first page (the cover page) or the last page. We will click on the right arrow to view the next page in this document:

You can also press Alt-P to go to the previous page, or Alt-N for the next page.

• How can I display a particular lexicon?
Go to the list of lexicons in the Finder window, and right-click on the lexicon you wish to see. That will bring up the lexicon in the Lexicon window. Now press Alt-R or click on the Browse button to see the list of lexicon entries. You can scroll up and down the list “manually” and find an entry you are looking for.


Notice incidentally that, because of a feature called parallel movement, when we click on the “siwat” entry in Todd’s lexicon NAWACOLEX automatically looks up that word in the wordlist as well. To see the lexicon entry, double-click on it (or click once to select it and click the Browse view button):

• How can I see at a glance which lexicons have entries for a particular word?
To look up any word in any lexicon, go to “Look up...” at the top of the Finder. On the line where it says “WORD”, delete “WORD” and type a word in its place, then use the mouse to right-click on what you have just typed. We already saw an example of this earlier when we looked up “siwat”, a word which is so common that it is in all the lexicons. Some words are not in every lexicon. Take the word “siwanawal” ‘sigunanaba’: which lexicons have this word in them?


To view one of these entries, double-click on the lexicon we choose to view (or click once to select it and either click on OK or press the Enter key). Let’s double-click on the Campbell dictionary entry:

When the word we search for only occurs in a single lexicon, NAWACOLEX saves us the trouble of going through a redundant Multiple Matches dialog and immediately displays the only result found in the Lexicon window. For example, Arauz’s glossary is the only one that lists “siwakujtan”; if we put “siwakujtan” in the Finder, therefore, Arauz’s lexicon is displayed:


We can type something into the “BEGINNING” line instead of the “WORD” line, in which case all items that start with the letters typed are included in the search, e.g.

• What is the Wordlist and how do I access it?
Do not confuse the wordlist and the lexicons. The lexicons are “man-made” documentation sources published or written by Nawat scholars which list words with their glosses (translations) and sometimes with other information too. The wordlist is a list generated automatically by NAWACOLEX itself, not from these lexicons but out of NAWACOLEX’s text corpus. The list is produced mechanically, and includes not dictionary headwords but word forms occurring in the texts. It is a word-(form) inventory which predictably will coincide to some extent with the lexicons but not entirely, which contains inflected forms as well as dictionary headwords, and which obviously cannot show the words’ translations! What the wordlist does tell us is what forms occur somewhere in the texts, how many times each form occurs, and at what places in which texts.

Sometimes the Wordlist opens automatically when you click on an identical entry in a lexicon (through something called parallel movement). You can also force it to open by typing a word on the bottom line of the Finder. You can scroll up and down the Wordlist, select items, and view an item in Browse view or normal view:

The Wordlist contains thousands of items. Luckily there are other ways to find something in it besides scrolling through the whole thing. One way is this: go to the bottom line of the Finder window and replace the letter A with any word, then right-click on it:


Another way was mentioned above: if you go to a new headword in a lexicon entry, that word will be displayed at the same time in the Wordlist, through parallel movement. Naturally both these methods only work if and when the item is in the Wordlist, so if nothing happens when you click on a headword, that is the explanation. Notice that many words in lexicons are not attested in the text corpus, while some citation forms (especially those of transitive verbs) do not occur as free forms in real texts.

• What can I do with the Wordlist?
The Wordlist appears by default in Browse view, but you can double-click on an item to see the list of its occurrences. Here is the list for “siwatket”. (You will probably need to reformat or “reshape” the list to make it fully visible like this; see the explanation later on about margins. To reshape this list, press Shift-F5 with the cursor on a line code.)


What we see here is a list of codes each of which is the identifier of one sentence somewhere in the corpus. You can see each place in the corpus by right-clicking on one of the codes. This will bring up a duplicate window showing the page on which this occurrence is located, with the cursor located before the line where this occurrence is found (often at the bottom of the window, so you may have to scroll up a bit; here we have already scrolled up so you can see what’s there):

This duplicate window can be used just like the primary window, but you can safely close it when you are done with it.


Getting about: jumps and parallel movements

“Jumping” happens, in Toolbox lingo, when you right-click on something somewhere, and as a result something somewhere else pops into view. Parallel movement is something that NAWACOLEX does of its own accord: you left-click on something and, as a result, something somewhere else comes into view (without right-clicking). Don’t worry about the theory (even I find it confusing), just learn the tricks. Right-clicking in particular can achieve all sorts of things in NAWACOLEX!

• From a wordlist entry to a lexicon entry
Here’s one trick to start with. Suppose you are looking at the entry for “siwat” in the wordlist (either view will do); now you would like to check the lexicon entries for the same item, “siwat”. You already know that you can find them by typing “siwat” into the WORD line of the Finder, but wait.... no need! Just right-click on “siwat” in the wordlist:

(Notice something else over on the left, incidentally: in addition to displaying the list of lexicons, NAWACOLEX seems to have randomly selected one of them to display in the Lexicon window anyway, even before we have chosen one! It has “decided” to show us Schultze’s entry in this case. This actually happens independently of our right-clicking to obtain a proper “jump”; even if we just left-click on “siwat” in the wordlist, the Schultze entry still pops up as if by magic:

This is a feature of Toolbox called “parallel movement”: here, the Lexicon window automatically “harmonizes” with the wordlist.)

• From a lexicon entry to other lexicon entries
For our next jump type, suppose we have one of the lexicons in view; Werner Hernandez’s vocabulary, for instance. We’re looking at the entry there for “siwat”. We can do a “jump” by right-clicking on the headword “siwat”. Once again, this causes a list of all similar entries in the different lexicons to display as a Multiple Matches dialog:


Thus you can check at a glance which lexicons have entries for this word. The effect of right-clicking on the headword in a lexicon window is similar to that of typing the word into the WORD line of the Finder window and right-clicking on that (but faster).

• From a word in a text to lexicon entries
Similarly, when reading any text in the corpus we can jump direct to a lexicon list:


This is a powerful and practical tool. With one click you can look up a word not just in one dictionary but in all the dictionaries simultaneously, see the results at a single glance, and if you choose, open one of the dictionaries at the entry in question with a further double-click. Alternatively you can look at the glosses in the Multiple Matches window and then close it (by pressing Esc or clicking on “Cancel”). As we saw before, if the word you look up is only in one dictionary, instead of a Multiple Matches dialog NAWACOLEX will go ahead and display that one at the right place, in the Lexicon window:

“Taria” only occurs in one lexicon, Schultze’s, so when we right-click on “taria” in the Text window, Schultze’s entry for this word is displayed.

• Non-citation forms
Some words in the texts are not listed as entries in any lexicon for various reasons. Perhaps the word is simply not documented in any of them. Or perhaps it is just in a different spelling in the text. Most often, however, the reason why text words are not found is that the lexicons list citation forms, not all the forms of a word. For example, the lexicons all have an entry for “siwat”, but not for the plural “siwatket”. Here’s another example:


When you get this message, press Esc (or click on the Cancel button). If you know that the citation form of “shiktamuta” is “tamuta”, you can of course type “tamuta” on the WORD line in the Finder. Or, here is another clever trick. Use the mouse to select the part of “shiktamuta” that you think will be the lexicon headword if it is listed, i.e. the “tamuta” part. Now try again! When something has been selected and you right-click on it, NAWACOLEX will search for the text you have selected and obtain different results:

Variants and sub-entries

Jumps to lexicons look for matches with the headword of a lexicon entry (i.e. the \lx field). These same jumps (whether from the Finder or a document) also check two other fields in a lexicon where matches may be found: variant forms (in a \va field) and sub-entries (in an \se field). They check these automatically. For example, in the Schultze lexicon there is not an entry with “cha” as the headword:

Nevertheless, when we search for “cha” the results will include a reference to Schultze’s entry “chiwa”, because “cha” has been listed as a variant of “chiwa”. It will say “cha” in the Multiple Matches window:

Because “cha” is not a headword in the Schultze lexicon as it has been transcribed, when we double-click on this match the entry is displayed not in the Lexicon window but in a duplicate window, and now you can see that it is a variant:

The duplicate window can be read in full (and closed when no longer needed), or if preferred you can right-click on the headword “chiwa” and so open the same entry in the Lexicon window. It works the same way with sub-entries, which are derived forms or phrases included under a headword, e.g.

“Chiupichin” is a sub-entry under the headword “chiupi” in the LBN lexicon. We cannot see the headword in the illustration because the entry is long, but we would be able to scroll up to it. Notice the status bar! Here is the full LBN entry:

A sub-entry may also be a phrase. Indeed, you can type a phrase in the Finder to search for it; however, you must select the phrase you want to find, or else only the word under the pointer will be looked for:

The same result can be obtained by selecting and right-clicking phrases in documents:

Other jumps

If you like, experiment a little by right-clicking on different things on the NAWACOLEX screen to see what happens! You may turn up a few surprises. When you try to “jump” from something for which no jump path has been configured you will get a message like this:

Some more Toolbox things...

NAWACOLEX is housed in [the Field Linguist’s] Toolbox, a highly flexible system developed by the Summer Institute of Linguistics (SIL). Consequently it inherits many features of Toolbox which are either permanent characteristics or default settings. Here are a few more things that may or may not be useful to you, but it’s good to know about them anyway if you become a hardcore user of NAWACOLEX. For more complete information about Toolbox in general, consult its documentation, which is freely available for download from the SIL’s website.

The windows (and duplicate windows)

The Toolbox screen is a programme window that can be resized, maximized and minimized like most windows can, but in most cases it is probably best to leave it as it is. In any case, whatever changes are made while NAWACOLEX is open will be forgotten when it is closed because of the “read only” setting. Therefore don’t be afraid to experiment if you like as this will not affect anything permanently. All of the document windows are likewise resizable, and the same observations apply: they have been configured in a particular way, which it is recommended to maintain but which can be changed; any such changes will be forgotten when you close NAWACOLEX.

At certain times you may find it convenient to enlarge a single window that you are working with. For example, if you are only interested in reading a text and would like to see more of it on screen but are not interested, for now, in the content of the other windows, why not enlarge it? You might want to maximize the text window to facilitate reading, for example, like this:

The advantage of maximizing rather than changing the dimensions of the window randomly is that you can subsequently click on “restore”, and the window will return to its original size, recovering the default layout. It is possible to close a window inside NAWACOLEX, but there is no point and in most cases you should not do this, because NAWACOLEX is set up to work with all the initial windows always open. Closing one of these windows is likely to cause it to “malfunction”, and may result in a warning message like this:

If you get a message of this type it probably means you (accidentally?) closed something in NAWACOLEX. The best solution is usually to close the message by clicking on “Cancel” (or pressing Esc), and then close NAWACOLEX and open it up again: that will get rid of the problem. However, it is quite all right to close duplicate windows, discussed in several places in this tutorial. You can recognise a duplicate window by the “:2” (or “:3”, etc.) that appears appended to the end of the file name displayed on the blue bar at the top of the window, e.g.

Duplicate windows are opened automatically by some operations in NAWACOLEX, and you can also create a duplicate of the active window manually by going to the “Window” menu and choosing “Duplicate”. If you see a duplicate window that you no longer need, go ahead and close it. There is also nothing to stop you from changing anything that it says in any window in Toolbox. But bear in mind that as NAWACOLEX is configured, any such changes will not be permanent; they will last until the next time you close NAWACOLEX, then revert to the original content.

Movable columns and fields

If you look closely at the Text window, the Lexicon window etc., you will notice a thin vertical black line running down the left-hand edge of each window:

Any such line can be dragged to the right or left using the mouse. When we move the line towards the right, we will see that there are in fact two columns to each window. The left-hand column, which we haven’t seen until now, contains field tags; the right-hand column contains field content. Let’s move a couple of dividers to the right so we can see the tags:

This provides a glimpse of how Toolbox (and so, NAWACOLEX) works. A section or page of a Toolbox document (a text, a lexicon, etc.) consists of a list of fields, each of which is made up of a field label (a tag) and the field’s textual content (a character string). The “settings” of a project include definitions of the characteristics of each kind of field in each kind of document (or database type), which determines their behavior. The left-hand column (sometimes hidden) says what type of field it is, the right column holds its content. You may not need to know much about the types of fields used in NAWACOLEX, but it does no harm to understand that such types exist and govern NAWACOLEX’s behavior.

Hidden fields

Try this: bring up a text in the Text window. It will look something like this:

Click in the window to make sure it is the active window. Now press Ctrl-M, and you will see something like this:

Or if you want to drag the separator out to see the field labels:

If you press Ctrl-M again, the line codes that just appeared will vanish again. As we can see, the disappearing fields are a type called \ref. By default the \ref fields are hidden from view in text files. Under certain conditions, “Hide fields” may get turned off automatically and you will see the hidden fields. In reality they are always “there”, you just don’t always see them. NAWACOLEX needs the line codes to function because it uses them to identify where words are in texts. If they appear and you want to hide them, press Ctrl-M.

Browse view

We have seen that each document in Toolbox has two “views”. In the normal view we can see one record (section, page, entry...) at a time, and we can see all the information in that record (except when concealed with “Hide fields”, see above); we must navigate between records if we wish to move from one to another. In Browse view, we get a synopsis of the whole document, with one line per record and only some fields of each record are displayed. The Browse view is organised in columns for different fields, and the separators can be dragged left and right to change their width. To switch between the two views, do any of the following:

• press Alt-R;
• click on the Browse View button;
• go to the “View” menu and click on “Browse”.

Some operations can be performed no matter which view you are in. For example, you can jump from a headword in a lexicon, as described above, by right-clicking on it regardless of whether the lexicon is in normal or Browse view. Fields can only be edited in normal view.
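For readers who find it easier to think in code, the record-and-field model behind the normal view, “Hide fields” and Browse view can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not how Toolbox itself is implemented: the sample text, the \tx tag and the function names are all hypothetical, and only the \id and \ref tags are taken from this tutorial.

```python
# Hypothetical sample in backslash-tagged form. Only the tag names \id and
# \ref come from the tutorial; \tx and all content are invented placeholders.
SAMPLE = """\\id DEMO-Page1
\\ref DEMO-001.001
\\tx first sample line
\\ref DEMO-001.002
\\tx second sample line

\\id DEMO-Page2
\\ref DEMO-002.001
\\tx another sample line
"""


def parse_records(text, record_marker="id"):
    """Split tagged text into records; each record is a list of (tag, content)."""
    records = []
    for line in text.splitlines():
        if not line.startswith("\\"):
            # A non-tagged, non-blank line is treated as wrapped content
            # belonging to the previous field (an assumption of this sketch).
            if line.strip() and records and records[-1]:
                tag, content = records[-1][-1]
                records[-1][-1] = (tag, (content + " " + line.strip()).strip())
            continue
        tag, _, content = line[1:].partition(" ")
        if tag == record_marker or not records:
            records.append([])  # the record marker starts a new record
        records[-1].append((tag, content.strip()))
    return records


def normal_view(record, hidden=()):
    """One record in full, minus any field types that are currently hidden."""
    return [f"\\{tag} {content}" for tag, content in record if tag not in hidden]


def browse_view(records, columns):
    """One row per record, showing the first value of each chosen field."""
    rows = []
    for record in records:
        first = {}
        for tag, content in record:
            first.setdefault(tag, content)
        rows.append(tuple(first.get(col, "") for col in columns))
    return rows


records = parse_records(SAMPLE)
print(normal_view(records[0], hidden={"ref"}))  # \ref lines are filtered out
print(browse_view(records, ["id", "tx"]))       # synopsis: one row per page
```

The point of the sketch is that hiding a field type is only a display filter: the \ref fields stay in the record the whole time, which matches the remark above that they are always “there”.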

Side browse

The following comes in useful if you are making a lot of use of a particular lexicon or text. Suppose you want to work a lot with the Campbell lexicon. With this lexicon as the active window, go to the “View” menu and choose “Side browse”. A duplicate of the lexicon now appears to the left of the original one, but the duplicate is in Browse view and can be used as an index to look up entries, which appear in normal view in the original window. Scroll the index (on the left) up and down, pick a word and left-click on it: through parallel movement, the main window (on the right) will display that entry in full:

The program bar and the status bar

At the very bottom of the Toolbox screen is the status bar, which contains some useful information. When you start up, it will probably look like this:

In the right-hand part of the status bar we read “NawaCoLex 2.1.prj”, which simply identifies the Toolbox project we are looking at. That would be useful information if you had more than one Toolbox project. The leftmost cell is empty and the second cell says “\id Finder Window”. This refers to the current item in the active window inside the NAWACOLEX screen. When you start NAWACOLEX, the Finder window is active. Notice also that the blue program bar at the very top of the NAWACOLEX screen tells us which window is active:

Click on each of the five windows in the startup screen and notice how the information in that cell and in the program bar at the top both change. For instance, if you click on the Lexicon window, the status bar will look like this:

Now let’s see what the status bar does when NAWACOLEX is in action. Using any of the methods you now know, look up the word “ashan” in the Arauz lexicon. You should end up with a screen looking something like this:

Check out the program bar at the top: it informs us that the active document is “Arauz”. Now look at the second cell on the status bar at the bottom: this tells us that we are looking at the entry “ashan”. (“\lx” is the name of a field; this is the field of headwords in NAWACOLEX’s lexicon entries.) You might wonder why we need to be told that we’re looking at the “ashan” entry when we can see that perfectly well in the lexicon window anyway. But what if the entry were a long one and we had scrolled down to a part of it where the top line is no longer visible? If we didn’t remember what entry we were looking at, the status line might be a useful reminder:

Notice also the fourth cell on the status bar which in the Arauz example said “47/999”. This means that the current document (i.e. the Arauz lexicon) contains 999 entries in total, and the one we are looking at (the “ashan” entry) is entry number 47 (in alphabetical order, since that is how the entries are ordered). In the second example we see that Lyle Campbell’s lexicon contains over three thousand entries, of which “ashan” is number 156 in alphabetical order. Let’s make the Wordlist window active to see what the status bar says:

The Wordlist is a different document type from the lexicons, and the field tag is \w in this case; the field’s content is still “ashan”; and looking further to the right we learn that the NAWACOLEX wordlist, i.e. the list of distinct word forms found in the entire text corpus, numbers 7,301 entries. Notice that this is not the number of tokens in the corpus (its word count), which would be much higher. Unfortunately I have not found a straightforward way to get Toolbox to produce a total word count. By indirect means I have estimated that this corpus contains something in the region of 62,500 words. I have counted 7,775 lines, i.e. sentences, in the corpus. Next, let’s try this out with a document from the text corpus. When we call up, for example, the Masin text document on the first page (or cover), which merely identifies the book, the program and status bars look like this:

The program bar actually tells us the name of the file that is being read. For NAWACOLEX this text is actually identified by the code NPT01, and “(Masin)” was added to the name to help users to remember what text this is. The status bar tells us which “page” we are on; of course we are on the first page, the cover, and the internal name for this page is “NPT01-Cover”. In this type of document, the NAWACOLEX field tag for the page identifier is “\id”. The status bar also tells us that there are 59 pages in this document, which is the number of all the “stories” plus one for the cover.

Printing and exporting

If you should wish to print out the content of any document (or file or “database”, in Toolbox parlance), whether it be a text corpus component, a lexicon, a concordance or the wordlist, there are ways this can be done. You can also export such information to a Microsoft Word, HTML or text file. These options are found under “File” in the main menu. Rather than attempting to print directly through the “Print...” command, you may find it preferable to first use “Export...” to create a printable document that you can tweak before proceeding to printing. Please give proper credit to NAWACOLEX and its copyright holders if and when circulating substantial materials derived from it; failure to do so without express permission will be treated as wilful breach of copyright and prosecution may ensue.

Margins

Although each field is in principle represented by a physical line in a database record, Toolbox bends this by permitting long lines to be wrapped. This becomes clearer when we show the field tags:

This wrapping is available as a Toolbox function, accessed by pressing Shift-F5 or from the “Database” menu, where it is referred to as “reshaping”. In most cases this has already been done in NAWACOLEX. You will find exceptions, however, where you are unable to see a complete line (i.e. the content of a field) because it has not been “reshaped” and runs off the edge of the window on the right, e.g.

When this happens, one option is to maximize the window. Another is to reshape the file as described. This issue also arises with the lists of references in the Wordlist, e.g.

Here we have found “apan” on the Wordlist, and we see that there are thirteen occurrences of the word yet we can only see five of the references. The others are simply out of sight beyond the right-hand margin. This is because, for technical reasons, the lines of the wordlist are not wrapped by default. To solve this problem you can reshape the record: place the cursor somewhere on the truncated list of references, press Shift-F5 and it will now look like this:

Now we can see all thirteen references. Two notes here:

1) If after pressing Shift-F5 you still don’t see everything, you will need to reset the wrap margin, which is done from the “Database” menu with the cursor on the line to be reshaped.

2) DO NOT ATTEMPT TO USE THE “RESHAPE ENTIRE FILE” OPTION ON THE WORDLIST! This will not work; it will cause Toolbox to hang because the list is so long that it produces a memory overload. If you make this mistake and NAWACOLEX freezes, you will have to close the program and restart it. Instead, reshape individual wordlist records as needed, using Shift-F5 as just described.

Open windows and the “Window” menu option

As already explained, the files that NAWACOLEX needs to have open to function are opened when you open NAWACOLEX and stay open until you close it. Even if you can’t see it, each open file has a window; when you don’t see it, it’s simply because other windows are on top of it. Using the “Window” option in the main menu, you can see the open windows listed; if not immediately shown, click on “More windows...” at the bottom. You can cause any window you like to be displayed at any time by clicking on it in this list.
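Returning briefly to the reshaping described under “Margins”: what Shift-F5 does to a single record can be pictured as re-wrapping one field’s content so that no physical line runs past a margin. The Python sketch below is a rough analogy only. It uses the standard textwrap module rather than anything from Toolbox, and the tag name “refs”, the margin of 40 characters and the reference codes are all hypothetical placeholders.

```python
import textwrap


def reshape_field(tag, content, margin=60):
    """Re-wrap one field so that no physical line is longer than `margin`.

    Items are kept whole: they are never split in the middle or at a hyphen.
    """
    flat = " ".join(content.split())  # undo any earlier wrapping first
    return textwrap.fill(
        flat,
        width=margin,
        initial_indent=f"\\{tag} ",   # the tag sits at the start of line 1
        break_long_words=False,
        break_on_hyphens=False,
    )


# Thirteen made-up reference codes standing in for a long wordlist entry.
references = " ".join(f"DEMO-{n:03d}" for n in range(1, 14))
print(reshape_field("refs", references, margin=40))
```

Reshaping one record at a time in this way is cheap, which is consistent with the advice above to reshape individual wordlist records rather than the entire file.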

RESEARCH TOOLS

NAWACOLEX’s research tools

It is not entirely facetious to say that the most important research tool is the researcher’s brain. It is up to the serious researcher who uses NAWACOLEX to take stock of the options the package offers and develop their own strategies to benefit from these. Those options include some further tools in addition to the basic functions seen so far, and to these we turn in this chapter: concordances, sorting and filters.

Concordances

What is a concordance?

In the present context, a Concordance can be defined technically as a Toolbox document generated automatically by NAWACOLEX according to parameters set by the user, which lists all the occurrences in a (sub)corpus of a particular word, set of words or group of words, provides a reference to the location of each occurrence and also displays some of the context in which the item searched for is found in each of its occurrences. To put it another way, the concordancer works like a search engine, with the document we call a “concordance” resembling a list of search results. Here is an example of a concordance in NAWACOLEX:

This concordance shows all occurrences of the word “tasul” in the whole NAWACOLEX corpus. Six occurrences were found, as we are told by the status bar. We can see all six references in the left-hand (“Reference”) column of this concordance because they happen to all fit in the window; if there were any more, the list would be scrollable. On each line (each of which corresponds to one occurrence), the item identified as matching the search string (in this case, the word “tasul”) is shown in the column labelled “Target”; to its left and right, in the “Before” and “After” columns, the rest of the sentence is shown (or as much of this as fits); these two are sometimes referred to together as the context. Luckily, these sentences are all rather short so the complete contexts fit, but when they don’t it is worth remembering that the thin vertical lines separating the columns can be dragged to the left or right to change the width of the columns. But if we want a really complete view of an item’s context, there is another way: the references on the left are line codes which can be right-clicked to display a duplicate window containing the full document page on which any occurrence is located, e.g.

If you are going to use concordances a lot, it is worth learning to “read” the line codes as they help to convey at a glance a profile of an item’s distribution across the sources. In the present example all six occurrences of “tasul” pertain to one collection of texts, the text labelled “NPT01”: this is the code for the Masin corpus (Nawat Published Text number 1). We see that “tasul” occurs in three Masin stories, numbers 1 (twice), 34 (three times) and 40 (once). Bear in mind that this is a text concordance; it does not take into account mentions in lexicons. Of course, the lexicons are themselves interesting as another form of language documentation. If you want to check the lexicons for a word in a concordance, all you need do is right-click on words in a concordance to look them up in the lexicons. So if you wish to see which lexicons have entries for the target word “tasul”, just right-click on it in the Target column:

In this particular case we see immediately, if we are familiar with the lexical sources, that “tasul” is a word found only, it would seem, in the varieties of Izalco and Nahuicalco, the “highland dialects”. You can also jump to the lexicons from any word in the “Before” and “After” columns. But in fact it does not work well from the Browse view of the Concordance so if you want to do this, double-click on the line you want first to put it into normal view. From there you can right-click on the word you wish to look up. Remember too the trick of selecting part of a word to look up as a headword:

Of course, you can also look up words in the lexicons from the duplicate window of the referenced text that you have opened by double-clicking the Multiple Matches list. Just right-click on it:

Generating a concordance: the search parameters

Now that we know what a concordance is and what you can do with it, it is time to find out how concordances are obtained in the first place. The way to create a concordance is to open the concordance dialog. This is done by pressing Ctrl-L, or by choosing “Concordance...” from the “Tools” menu. The dialog window looks like this:

The most crucial parameter to set is “Search For” where you say what you are looking for. You will usually find something already in the “Search For” box, which you will often need to delete in order to put something else in. What you see there may be the last string you searched for or the word that the cursor is at in the active window at the time. It is useful to know that what you write in the “Search For” box need not be limited to a single word; you can also put in a phrase. If you do, the concordance will only show occurrences of the first word you write when it is followed by the other words specified. If, for example, you type in “wan yajika”:

this is the resulting concordance:

Phrases of any length can be concordanced in this manner. Pay attention to the “Match” parameter as well; this is where you specify whether you are only interested in finding words that coincide with your search string (in which case, choose “Whole”), or all words that contain the string anywhere in them (“Middle”), or just words that start with the string (“Begin”) or end with it (“End”). In the above example, the settings used were “tasul” and “Whole”, so the concordance only included occurrences of the word form “tasul” per se. Let’s change “Whole” to “Middle” and see what happens:

Once this is done, either press Enter or click on “Lookup”. Notice the difference between the concordance this produces and the one we saw above:

Before there were six results, now there are seven (see the status bar), because “itasulka” has been added, a word which contains the string “-tasul-”. The “Middle” option also catches words which begin or end with the search string, but as it happens the corpus contains no other, longer words that begin or end with this string. Now compare:

and the resulting concordance:

Here we clearly see that when we choose “Middle”, the string can occur anywhere at all in a word for that word to be included in the concordance. Now observe these examples of concordances obtained using the “Begin” option with the string “siwa”:

and this one using “End” with “wan”:

Notice that “Begin” and “End” will also include words consisting entirely of the search string, since they also start and end in that string! Here is the concordance for words that “End” with “pak”:

Toolbox logically considers that the word “pak” ends in “pak”. When you concordance a phrase, the “Match” parameter refers to the phrase as a whole. The “Begin” option with a phrase will search for phrases containing the words you type except that the last word need only begin with the string typed, so for example “Begin” with “ne mu” selects phrases such as “ne muteku”, “ne munan”, “ne mukshi”, “ne muyak”, “ne mutechan”...:

“End” with “k yajki” helps us to find examples of a particular Nawat verbal construction (preterite + “yajki”):

The other parameters in the Concordance dialog are self-explanatory and need not be discussed here, except for one: the “Text Corpus” parameter, to which we now turn.

Choosing your text corpus

Concordances can be carried out on different texts or groups of texts; that is, on different corpora (or sub-corpora). This will depend on the researcher’s needs and interests. You may decide to study or profile one specific text, or one dialect area, or one type of text. You may also wish to make comparisons between different texts, dialects etc. It is possible in Toolbox to narrow down the corpus of a concordance in these different ways, and some of these possibilities are implemented in NAWACOLEX 2.1. That is the purpose of the “Text Corpus” parameter in the concordance dialog. The default setting for this parameter is “All Nawat texts”. When we click on “All Nawat texts” a drop-down list of corpora is displayed:

Notice that the list scrolls down! The pre-configured settings in NAWACOLEX 2.1 allow you to perform concordance searches on all its texts, on the Masin corpus alone, or on just the Arauz text. We can also choose to look just at texts from Witzapan (namely: Campbell-W, Derechos, Naja Ni Genaro, and Genaro Ramírez and Paula López’s miscellanea) or just at texts representing Nawat of the highland area (Masin and Arauz). These corpora do not include the IRIN transcripts. We can also restrict our searches to the transcripts of the IRIN interviews, should we wish. So say we would like to know whether “ukich”, either as a word or a combining form, is used anywhere in the IRIN interviews. We’ll set our parameters as follows: “IRIN (all speakers)”, “ukich” and “Begin”:

Here is the result:

Here we find out that the word “ukich” (or its plural “ukichket”) occurs in no fewer than five of the eight IRIN interviews in our corpus, and a sixth interview contains the compound form “ukichtijlan”. That is perhaps a surprisingly high proportion, but we might wonder if it is not perhaps because we are including what the interviewer said here (the interviewer is included in “all speakers”), since in most of the interviews the interviewer was Paula López. Does this make the finding less meaningful? Well, it could be but we don’t know. Fortunately there is a way to check on this, because a second interview corpus option is provided which only counts the words spoken by interviewees in these materials. Here is the concordance obtained when we change that parameter:

The findings are interesting. Although the number of occurrences found is reduced from 14 to 9, the total number of different interviews remains at six out of eight, meaning that the interviewees in all six of these recordings used the word “ukich” (or the compound “ukichtijlan”). Even when there is no corpus defined that includes exactly the texts to which you wish to restrict data, you can opt for another strategy: search in a corpus that includes the texts you want; make sure that the concordance results are ordered by the reference field (this is the default), so that occurrences in the texts that you are looking at are grouped together, and ignore the other data. For more about sorting results, see below.

Concordancing lexicons: working from Spanish/English to Nawat

It is also possible to carry out a concordance of a lexicon. This offers many possibilities, given the complex structure of lexicons. Two possibilities are of particular interest: using a concordance to perform more “advanced” searches than are possible using the procedures seen so far, and using a concordance to search for Spanish (or English) glosses for which we wish to find Nawat equivalents (i.e. using NAWACOLEX as a Spanish-Nawat or English-Nawat glossary). Here’s how to look up a Spanish word. Press Ctrl-L to bring up the concordance dialog, and select “All lexicons: from Spanish” from the “Text Corpus” list. Then type a Spanish word into the “Search For” box:

The resulting concordance is:

Let us maximize the Concordance window so as to see the full list of results:

The way these results are presented is not entirely satisfactory: unfortunately, they don’t tell us which lexicon each result comes from. For example, we see on lines 4 and 5 that there are two lexicons with an entry for “kekelutza”, one of which (line 4) glosses it in Spanish as “menear; mover”, the other (line 5) as “mover(se)”. All we can see here is that there are lexicons that give these words and gloss them as shown. That in itself may be useful, and if we wish for more detail, we can search for it by other means: for instance, by right-clicking on one of the Nawat words in the “Reference” column, which will have the usual effect of looking up the Nawat word in the lexicons:

In the above example, the entry for “kwani” in the Hernandez lexicon is displayed since this is the only lexicon with an entry for this form of the word (compare “ijkwani” etc. found in other lexicons). As it happens, the lexicon entry offers no more information, in this case, than was already shown in the Concordance window, but in any case now we know which lexicon it has come from. Other times, a Multiple Matches window results, e.g.

On this screen we may notice (on line 1 of the Multiple Matches window) that it is BibLex which has “kekelutza” with the gloss “menear, mover”. The lexicon which has “kekelutza” glossed as “mover(se)” must be one of the other three shown in the Multiple Matches window, but we can’t be sure which one. Why? Well, let’s have a look at Campbell’s entry for “kekelutza”, by double-clicking it (on line 4 of the Multiple Matches):

Here we discover that this lexicon does have “mover(se)” as a gloss, but this gloss was not shown in the Multiple Matches window, which doesn’t have room for multiple glosses and so only shows the first gloss it finds. In this roundabout way, then, concordancing allows us to “root around” and find Nawat translations for Spanish words, even though the lexicons it contains are organised primarily as Nawat-Spanish glossaries. With a little practice this turns out not to be as awkward as it sounds!

The procedure for looking up words from English glosses is the same. Just choose “All lexicons: from English” as your “Text Corpus” and follow the same steps. Needless to say, using English to search for Nawat words in the lexicons will only lead you to those lexicon entries which have an English gloss in them! A hint: if you really want to exhaust all your possibilities leaving no stone unturned, try out both Spanish and English lookups to be on the safe side.

Sometimes it is useful to use the same strategy while employing a “Middle”, “Begin” or “End” match rather than “Whole”. For example, say you want to find words in Nawat that correspond to any of the words “peligro”, “peligroso” or “peligrar”, or even inflected forms of these: in that case try a search for “peligr”, setting the Match parameter to “Begin”:

This gives the result:

Concordancing lexicons: advanced Nawat searches

The easiest way to look up a Nawat word in the lexicons is by right-clicking on that word in the Finder or another window. When this works fine there is no advantage to using a simple concordance search, given the limitations of concordances on lexicons. For example, we already know that right-clicking on “tasul” reveals which lexicons have this entry:

If we concordance “tasul” in “All lexicons: from Nawat” the result we obtain is rather less helpful:

We would ideally like to be told, in the Reference column, where the three occurrences of “tasul” occur, but NAWACOLEX cannot do this unfortunately. True, we can get more information by right-clicking on “tasul” in the concordance:

However, this is the same information we could have got by right-clicking on the word in the Finder or a document: we have come full circle. Thus concordancing is not always the optimal way to search for lexical information. However, in other cases concordancing makes it possible to retrieve more sophisticated information than that which can be accessed in other ways. How? First of all, if a lexical item consists of more than one word, a concordance will check for all occurrences. So for example, right-clicking on the phrase “ka imaku” turns up one entry for it in the Arauz lexicon, where it appears as a sub-entry under “ka”:

This works if we are right-clicking on “ka imaku”, but we probably wouldn’t know, without hindsight, that we need to search for the complete phrase to find anything. Right-clicking on just “imaku” yields nothing, because that isn’t an entry or a sub-entry:

Notice that this is the same kind of problem we sometimes encounter with printed dictionaries, and so nothing new really. It is the “it’s in the dictionary somewhere but I can’t find it” syndrome. Now then, if on the other hand we run a concordance on “imaku” (in the lexicons):

the Arauz entry is found — although of course the lexicon is not identified and the information shown is scanty:

Just the same, this at least tells us that the collocation “ka imaku” is listed and this clue may help us to track it down. Concordancing has another edge over right-clicking in that we can do concordances using “Middle” or “End”. Say for example, we are not sure whether the lexicons list imaku, numaku, mumaku, etc. or just maku. With a concordance we can set the parameters as “maku” and “End”, covering all these possibilities. This finds:

Finally, NAWACOLEX has another corpus setting, “All lexicons: other fields”, which trawls through various fields in the lexicons fishing for additional data (variants, sub-entries, source spellings, phonetic transcriptions, inflected forms, Nawat glosses and examples, among other things), though in so doing it often catches unwanted “debris” in the net too. It may turn up some surprises! Here for example is the first screen of 90 results returned when we concordance “pak” with the setting “End” in the “All lexicons: other fields” corpus:

Among other things, this search comes up with examples for other headwords that contain something ending in -pak, inflected forms with this ending, any of my notes which have such a word in them, source spellings containing the sequence, and so on and so forth. You will need to sift through these offerings to see if there is anything that is actually useful. While that may seem tedious, when you are desperate to find something and need to try anything, this can be a godsend.
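The four Match settings used in the searches above can be pictured with a small sketch. This is only an illustration in Python, not a description of how Toolbox itself works: the function name `matches` is hypothetical, “Middle” is assumed to mean “anywhere in the word”, and the word lists simply reuse words mentioned in this section (plus the Spanish word “pelo” as a non-match).

```python
def matches(word: str, target: str, mode: str) -> bool:
    """Return True if `word` satisfies the search `target` under `mode`.

    `mode` mirrors the Match parameter: "Whole", "Begin", "Middle" or "End".
    This helper is hypothetical and only imitates the behaviour described
    in the tutorial.
    """
    if mode == "Whole":
        return word == target
    if mode == "Begin":
        return word.startswith(target)
    if mode == "End":
        return word.endswith(target)
    if mode == "Middle":
        # Assumption: "Middle" accepts the target anywhere in the word.
        return target in word
    raise ValueError(f"unknown match mode: {mode}")


spanish_glosses = ["peligro", "peligroso", "peligrar", "pelo"]
nawat_headwords = ["imaku", "numaku", "mumaku", "maku", "takat"]

# "peligr" with Begin catches the whole word family in one search.
begin_hits = [w for w in spanish_glosses if matches(w, "peligr", "Begin")]

# "maku" with End catches maku and its possessed forms alike.
end_hits = [w for w in nawat_headwords if matches(w, "maku", "End")]

print(begin_hits)
print(end_hits)
```

A “Whole” search for “maku” would, by contrast, accept only the bare form, which is why the looser settings are useful when we do not know how a lexicon lists a word.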


Sorting

Sorting of lexicons

The records (pages, entries, items etc.) of a database (i.e. a document: text files, lexicons, concordances etc.) are ordered by sorting (putting into alphabetical order) one particular field. For example, the lexicons (e.g. the Hernandez lexicon, of which this is the beginning) are ordered by sorting the \lx field of their entries:

In NAWACOLEX this is the default ordering of the lexicons, alphabetized by their Nawat headwords (the \lx field), but you can temporarily change the ordering and sort by a different field. In order to alphabetize the lexicon by the Spanish gloss, simply do the following: put the lexicon into Browse view (as shown above) and just click on a field name (at the top of a column) to make that the new sort field:


Reordering a lexicon file can have other uses besides alphabetizing the Spanish or English glosses. Whenever we sort by a field, we are automatically grouping together similar values for that field. Thus in a lexicon that indicates parts of speech, by sorting the part-of-speech (\ps) field we can cluster the entries into nouns, verbs, adjectives etc., and if differentiated, into intransitive and transitive verbs, for example:

Prior to doing this, you will need to include \ps among the fields displayed in Browse mode: in the “View” menu choose “Browse fields...” and add \ps to the list. Another way to change the order is to sort a field from the right:

One reason for wanting to sort a lexicon from the right is that it makes it easy to group words with similar endings and draws attention to patterns of endings:


To sort a field from right to left, (1) make sure that the window is active and that the field in question is the primary sort field (click on the field name at the top of the column as before); (2) go to the “Database” menu and choose “Sorting...”:

(3) mark the “Sort first field from end” box in the “Sorting by Fields” dialog, and click on OK:
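To see what sorting “from the end” does, here is a minimal sketch in Python. It is not Toolbox's own sorting routine (which may, for instance, treat letter combinations differently); it only assumes that right-to-left sorting amounts to comparing words by their reversed spelling. The headwords are ones cited elsewhere in this tutorial.

```python
headwords = ["takat", "siwat", "kunet", "ichkat", "chiltik", "imaku"]

# Ordinary (left-to-right) alphabetical order.
left_to_right = sorted(headwords)

# Right-to-left order: compare the words by their reversed spelling, so
# that words sharing an ending are placed next to each other.
right_to_left = sorted(headwords, key=lambda word: word[::-1])

print(left_to_right)
print(right_to_left)
```

In the second list the words ending in -at (takat, ichkat, siwat) come out side by side, which is the grouping effect described above.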


Sorting of concordances

Sorting can sometimes provide a useful strategy for studying concordances as well. Suppose we have a concordance of words in the Arauz text that begin with “nu”:

This is the concordance, maximized to let us see more lines:


Observe that the concordance is initially sorted by the “Reference” field, i.e. the occurrences of words beginning with “nu” are listed in text order. Perhaps this is not the most useful order for the examples to be in for our research or study purposes. Sorting may help here. We may, for example, decide to sort by the “Target” field so as to see which forms beginning with “nu-” occur:


If we were interested in seeing which words in the text that begin with “nu” end with “wan”, the quickest way might be to use right-to-left sorting on this field:

Another way to use sorting in a concordance is to sort by the “After” field. This will help us to study what words come after the target word(s). Suppose that we are interested in the syntax of the verb “nemi” in the Arauz text. First we will make a concordance, using the “Middle” parameter in an attempt to catch all inflected forms of the verb. Here we see the occurrences in text order:

Now we will sort the sentences by the “After” field:


There are a lot of sentences here where “nemi” is followed by “ne” introducing a subject, suggesting the importance here of the VS (verb-subject) word order. Apart from that, there are quite a few sentences where “nemi” is followed by a locative phrase introduced by either a place adverb (“nikan”, “talchi”) or a locative preposition (“pak”, “tan”, “tik”...). We also notice the examples of the construction “nemi pal”. And so on. NAWACOLEX provides the means; now you must decide how to make use of them!
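The effect of sorting a concordance by the “After” field can be sketched as follows. This is an illustration in Python, not Toolbox code, and the rows are invented placeholders rather than lines from the Arauz text: only the field names (Reference, Target, After) and the words “nemi”, “ne”, “nikan” and “pak” come from the discussion above.

```python
from collections import Counter
from operator import itemgetter

# Each row imitates one concordance line. The reference numbers and the
# pairing of target and following word are made up for illustration.
rows = [
    {"ref": "001", "target": "nemi", "after": "pak"},
    {"ref": "002", "target": "nemi", "after": "ne"},
    {"ref": "003", "target": "nemi", "after": "nikan"},
    {"ref": "004", "target": "nemi", "after": "ne"},
]

# Default order: by Reference, i.e. text order.
text_order = sorted(rows, key=itemgetter("ref"))

# Sorted by the After field (ties kept in text order): lines in which the
# target is followed by the same word now sit together.
by_after = sorted(rows, key=itemgetter("after", "ref"))

# A quick tally of what follows the target word.
following = Counter(row["after"] for row in rows)

print([row["ref"] for row in by_after])
print(following.most_common())
```

Grouping the lines this way is what makes recurrent patterns, such as a target regularly followed by “ne”, stand out at a glance.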


Filters

When you define and activate a filter in connection with a database, only items that fulfil certain conditions are taken into account. For example, if a lexicon has a field indicating part of speech, you could have a filter that allows you to only consider the nouns in the lexicon, or only the transitive verbs (or indeed only nouns AND transitive verbs, or only words that are NOT nouns and transitive verbs...). Or, you could have a filter that only selects verbs ending in certain letters. And so on and so forth. Once a filter is set up, you can use all the other methods with filtered files to carry out some sort of analysis. Once they are defined, filters can be turned on and off at will. No filters have been predefined in NAWACOLEX 2.1, and while you can make custom filters, they will not be saved in the pre-set read-only mode so will need to be formulated in each session when you want to re-use them.

• To set filters, go to the “Database” menu and select “Filtering...”. (You will have to learn to use the filter syntax.)
• To turn filters on and off, use the pull-down filter list:

When you choose a filter from the list it applies immediately to the document in the currently active window.
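The logic of a filter can be pictured with a short sketch. It is written in Python and is not Toolbox's filter syntax, which has to be learned separately; the part-of-speech labels “n”, “vt” and “vi” and the helper names are assumptions made for the example.

```python
# A tiny stand-in for a lexicon: each record has a headword (lx) and a
# part of speech (ps). The ps labels are hypothetical.
records = [
    {"lx": "takat", "ps": "n"},
    {"lx": "siwat", "ps": "n"},
    {"lx": "napalua", "ps": "vt"},
    {"lx": "nemi", "ps": "vi"},
]


def is_noun(record):
    return record["ps"] == "n"


def is_transitive_verb(record):
    return record["ps"] == "vt"


def apply_filter(records, condition=None):
    """Return the headwords that pass `condition`; None means filter off."""
    if condition is None:
        return [r["lx"] for r in records]
    return [r["lx"] for r in records if condition(r)]


everything = apply_filter(records)  # filter turned off
nouns = apply_filter(records, is_noun)
nouns_and_vt = apply_filter(
    records, lambda r: is_noun(r) or is_transitive_verb(r)
)
neither = apply_filter(
    records, lambda r: not (is_noun(r) or is_transitive_verb(r))
)
verbs_in_ua = apply_filter(
    records, lambda r: r["ps"].startswith("v") and r["lx"].endswith("ua")
)

print(nouns, nouns_and_vt, neither, verbs_in_ua)
```

Switching the condition to `None` corresponds to turning the filter off again: the data are never changed, only the selection that is taken into account.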

THE LEXICONS

Introduction

Most of the lexicons in NAWACOLEX 2.1 are in origin Nawat-Spanish or Spanish-Nawat vocabularies that form a part of publications documenting the Nawat language. There are various partial exceptions: (1) BibLex, Hernandez and LBN are materials that have circulated informally during the last few years but have not been formally published yet; (2) BibLex, Campbell and LBN contain English as well as Spanish glosses, while the original edition of Schultze had German glosses but the glosses in this NAWACOLEX lexicon are Spanish “mnemonic” glosses as will be explained later.

Some of the publications from which these lexicons are taken contain a larger or smaller amount of other material besides the glossaries: (1) Arauz, Campbell and Schultze contain texts in Nawat, while BibLex is the glossary of a separately published text: the Nawat New Testament; (2) Campbell, Schultze and Todd contain descriptions of Nawat grammar; (3) Ramirez is unique in incorporating Nawat definitions or illustrative examples in every entry. Schultze also contains extensive commentaries on the texts.

The glossaries in BibLex, Ramirez, LBN and Schultze are Nawat-Spanish (or Nawat-other language) vocabularies. Campbell and Hernandez contain Nawat-Spanish and Spanish-Nawat vocabularies but in NAWACOLEX the Nawat index is always the one transcribed for the lexicons. Arauz and Todd are originally Spanish-Nawat vocabularies which have systematically been inverted in NAWACOLEX in order that all the lexicons should have entries with Nawat headwords.

The sources that are not already in present-day Nawat spelling have been re-spelled in NAWACOLEX; this is necessary to make the composite database work efficiently. This respelling affects the Arauz, Campbell, Ramirez, Schultze and Todd lexicons. In NAWACOLEX all of these, with the exception of Ramirez, also note the original spellings in the source document, but it is the standardized spellings that are used in indexing (i.e. constitute the \lx fields).

Other ways in which the lexicons have been given different treatments in being processed into NAWACOLEX lexicons were motivated by their different characteristics in the sources and the need to adjust them all to a sufficiently homogeneous format to permit them to be analysed in parallel within NAWACOLEX. Further details are given in the individual lexicon notes provided below.
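The inversion of a Spanish-Nawat glossary into Nawat-headword entries, as mentioned above for Arauz and Todd, can be pictured with the following sketch. It is a simplified illustration in Python, not the procedure actually followed in preparing NAWACOLEX (which also involved respelling and a good deal of editorial judgement); the function name `invert` is hypothetical, and the word pairs are taken from examples discussed later in this tutorial, given here in standardized spelling.

```python
from collections import defaultdict

# (Spanish entry, Nawat gloss) pairs as they would appear in a
# Spanish-to-Nawat glossary, already respelled.
spanish_to_nawat = [
    ("abrazar", "napalua"),
    ("abrazo", "napalua"),
    ("algodón", "ichkat"),
]


def invert(pairs):
    """Group the Spanish glosses under their Nawat headwords."""
    by_headword = defaultdict(list)
    for spanish, nawat in pairs:
        if spanish not in by_headword[nawat]:
            by_headword[nawat].append(spanish)
    # Return the entries in alphabetical order of the Nawat headword.
    return dict(sorted(by_headword.items()))


nawat_to_spanish = invert(spanish_to_nawat)
for headword, glosses in nawat_to_spanish.items():
    print(headword, "=", ", ".join(glosses))
```

Several Spanish entries that share one Nawat gloss collapse into a single Nawat entry with several glosses, which is why an inverted lexicon usually has fewer entries than its source.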

Label       Source
Arauz       Próspero Aráuz, El pipil de la región de los Itzalcos, pp. 155-266
BibLex      Alan R. King, Lexical database of the Nawat translation of the New Testament
Campbell    Lyle Campbell, The Pipil language of El Salvador, pp. 148-594
Hernandez   Werner Hernández, Nawat-Spanish vocabulary (unpublished manuscript)
Ramirez     Genaro Ramírez Vásquez, Short Nawat vocabulary (unpublished manuscript)
LBN         Alan R. King, Léxico básico náhuat (unpublished manuscript)
Schultze    Leonhard Schultze Jena, Indiana II. Mythen in der Muttersprache der Pipil von Izalco in El Salvador, pp. 291-360
Todd        Juan G. Todd, Notas del náhuat de Nahuizalco, pp. 45-141

In the above table, the column headed “label” shows the tagword used internally in NAWACOLEX and associated documents (such as this tutorial) to identify each lexicon informally. The following table summarises the above points, together with the number of entries in each lexicon:

                              Arauz   BibLex     Campbell   Hernandez       Ramirez     LBN        Schultze          Todd
Published?                    yes     no         yes        no              no          no         yes               yes
Source also has texts:        yes     yes*       yes        no              yes*        no         yes               no
Source has grammar:           no      no         yes        no*             no          no         yes               yes
Source glossaries:            SP-NWT  NWT-SP/EN  NWT-SP/EN  NWT-SP, SP-NWT  NWT-NWT/SP  NWT-SP/EN  NWT-GE, [NWT-SP]  SP-NWT
Used in NAWACOLEX:            SP-NWT  NWT-SP/EN  NWT-SP/EN  NWT-SP          NWT-NWT/SP  NWT-SP/EN  NWT*              SP-NWT
Glossary inverted:            yes     no         no         no              no          no         no                yes
Re-spelled?                   yes     no         yes        no              yes         no         yes               yes
Old spellings retained (\lz): yes     -          yes        -               no          -          yes               yes
Nº of entries:**              998     2,423      3,108      2,534           101         574        978               1,049

*See below for explanations.
**“Entries” here means main entries, i.e. the count does not include sub-entries, which some lexicons make more use of than others; e.g. many words in LBN appear as a sub-entry of a more basic headword.

Note: In mid-December, when NAWACOLEX 2.1 was about to “go to press” (in a manner of speaking) and all that remained to be done was to write the last few pages of this tutorial, Werner Hernández began to circulate a new, more complete edition of his dictionary which he now calls Ne Nawat Mujmusta. Unfortunately the news came too late to incorporate it into this edition of the database; perhaps it will be possible to make revisions in a later edition.


The structure and content of the NAWACOLEX lexicons

To understand what is in each of these lexicons as represented within NAWACOLEX, something needs to be said first about the way the NAWACOLEX lexical databases are structured. Just as a traditional dictionary consists of a large number of entries each of which begins with a headword (and the headwords are arranged in alphabetical order), each lexicon likewise consists of a certain number of entries or (technically) records, each entry or record commences with a headword and the default order of the entries is in alphabetical order by the headword. Technically, the internal structure of any lexicon entry in Toolbox consists of a list of fields. Each field has two attributes: a field name which establishes the type of information, and the content which constitutes the data. So to take a simple example, the record for “ichkat” in the Hernandez lexicon consists of two fields, one called \lx (which stands for “LeXeme”), for the Nawat headword, and another called \gn (which stands for “Gloss in the National language”), for the Spanish translation:

Hernandez lexicon, headword “ichkat”
\lx   ichkat    Lexeme
\gn   algodón   Spanish meaning

The record contains what is inside the box. This is the internal structure of a lexicon record. The user interface of NAWACOLEX initially presents the information like this:

Here the structure is only implicit, so as to make the information easier to read by human users. It is possible to display the field labels by dragging the vertical separator towards the right:


Notice that the text formatting is designed to provide clues as to the nature of each line of data even without seeing the labels: headwords in green, Spanish glosses in purple and so on. Not all lexical records, or the records of all lexicons, contain exactly the same number of fields. Indeed that is how the whole NAWACOLEX system (and the Toolbox architecture behind it) attains so much flexibility and usefulness: the structure can adapt to the data. Now let’s compare the Todd lexicon:

Todd lexicon, headword “ichkat”
\lx   ichkat    Lexeme
\gn   algodón   Spanish meaning
\lz   ichcat    Source spelling of lexeme

And now the version in the Arauz lexicon:

Arauz lexicon, headword “ichkat”
\lx   ichkat           Lexeme
\gn   algodón          Spanish meaning
\lz   ichcat           Source spelling of lexeme
\nt   Arauz: m. Bot.   Note


The additional field seen this time, the “note” field, is a catch-all slot for adding supplementary information or observations. Here it is used to copy Arauz’s annotations in the Spanish-Nawat entry from which this record has been generated, which might not be terribly important in terms of what more it tells us, but has been included nonetheless. The meaning of “Arauz: m. Bot.” is: Arauz’s entry adds “m. Bot.”. It is stipulated in this way because sometimes the notes represent not anything from the source document but my (ARK’s) observations. Some records are more complex than these: see for example the BibLex and Campbell entries under “ichkat”. For further details see the notes on the individual lexicons below.

One field type in particular has a different status from the others in NAWACOLEX lexicons, and this is the \lx field. This is the field of an entry’s headword. Absolutely every entry must have a headword (i.e. an \lx) and can have only one. When NAWACOLEX loads files upon opening, whenever it encounters an \lx in a lexicon file it interprets this to mean “a new record (entry) starts here”. Therefore the \lx field always comes first among a lexicon record’s fields. The shape of the \lx field’s content is the identifier of that entry. NAWACOLEX can recognise entries in different lexicons which have the same headword by comparing their \lx fields; this serves many purposes but can only work, obviously, if the different lexicons use the same spellings for Nawat headwords. Hence the need for orthographic standardization for a system of this kind. Lexicons can contain different spellings in another field (\lz) but not in the field that is used for primary indexing purposes.

There follows, for reference, a list of many of the field types used in NAWACOLEX:

Tag    Field type                                                       Comments
\lx    Lexeme, i.e. Nawat headword                                      primary reference field
\cf    Cross-reference
\de    Definition (English)
\di    Dialect
\dn    Definition (Spanish)
\dt    Date last edited
\ee    Encyclopaedic information (in English)
\en    Encyclopaedic information (in Spanish)
\et    Etymological information, language of origin or base morphemes
\ge    Gloss (English)
\gn    Gloss (Spanish)
\hm    Homonym number
\if    Inflected form                                                   + \if-X for various specific inflections: see the next section
\in    Gloss (Spanish) of inflected form
\lz    Lexeme (source spelling)
\mn    Cross-reference to main entry
\nt    Notes                                                            additional observations
\pdl   Paradigm label
\pdv   Paradigm (inflected) forms
\ph    Phonetic form; frequency count
\ps    Part of speech
\r     Corpus reference(s)
\se    Sub-entry
\sn    Sense number
\st    Status                                                           for internal use
\va    Variant
\xn    Spanish translation of example
\xv    Example (in Nawat)
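The record structure described above, in which each field is a backslash tag followed by its content and every \lx opens a new record, can be sketched in a few lines of Python. This is an illustration only and not part of NAWACOLEX or Toolbox: the function names `parse_records` and `headword_index` are hypothetical, and the sample data reproduce the “ichkat” records shown earlier plus a minimal “napalua” record (\lx and \gn only) added for the example.

```python
HERNANDEZ = r"""
\lx ichkat
\gn algodón
"""

TODD = r"""
\lx ichkat
\gn algodón
\lz ichcat
"""

ARAUZ = r"""
\lx ichkat
\gn algodón
\lz ichcat
\nt Arauz: m. Bot.

\lx napalua
\gn abrazar
"""


def parse_records(text):
    """Split lexicon text into records, each a list of (tag, content) pairs."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and anything without a field tag
        marker, _, content = line.partition(" ")
        if marker == "\\lx":
            records.append([])  # an \lx field means "a new record starts here"
        if not records:
            continue  # ignore any fields that precede the first \lx
        records[-1].append((marker[1:], content.strip()))
    return records


def headword_index(lexicons):
    """Map each headword (\\lx content) to the lexicons that contain it."""
    index = {}
    for label, text in lexicons.items():
        for record in parse_records(text):
            headword = record[0][1]  # \lx is always the first field
            index.setdefault(headword, []).append(label)
    return index


lexicons = {"Hernandez": HERNANDEZ, "Todd": TODD, "Arauz": ARAUZ}
index = headword_index(lexicons)
print(index["ichkat"])
print(index["napalua"])
```

Because the index is keyed on the content of \lx, an entry spelled differently in two lexicons would not be matched, which is the practical reason for standardizing headword spellings and keeping source spellings in \lz.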


Adaptation and transcription of lexicon content

The conversion of original documents, each with distinct features and layouts, into compatible NAWACOLEX lexicons is a process which aims to take advantage of the flexibility inherent in this form of organisation while being kept consistent and systematic enough to be amenable to various kinds of cross-lexicon analysis and interface. The NAWACOLEX lexicons are not 100% objective mirror-images of the sources, which, one might say, they attempt to reflect in spirit but not invariably to the letter, for the sake of ease and economy of use. Not only does spelling need to be standardized, but the organisation of the lexicon has to be unified too. The restructuring is most radical when we have had to convert a Spanish-Nawat glossary into a Nawat-Spanish one, as in the case of Arauz and Todd. Some detailed remarks along these lines will be made below.

This process of standardisation across sources (standardising matters of form and appearance for the essential purpose of making any real differences of essence and substance evident, where they exist, while eliminating seeming contrasts when of no true import or consequence) brings to light the extent of the differences between how Nawat has been spelled by different authors in different works (and in the most extreme cases, even by the same author in different publications), usually with little or no regard for the practical need to achieve a common orthography if the language is ever going to serve as a functioning written medium (which, of course, was not even the intention of most of the authors involved).

The following table sums up some of the most notable differences, and as examples shows how each source spells the words “takat”, “siwat”, “kunet”, “kwilin”, “tzajtzi”, “chiltik”, “ashan”, “naja”:

Source: Arauz
Spelling rules: Imitates Spanish; uses hu for “w”, x for “sh”; t’, c’ for final “t”, “k”; ^ on stressed vowels.
Examples: tâcat’, cîhuat’, cûnet’, cuîlin, tzâjci, chîltic’, âxan, naja

Source: BibLex
Spelling rules: IRIN spelling.
Examples: takat, siwat, kunet, kwilin, tzajtzi, chiltik, ashan, naja

Source: Campbell
Spelling rules: Phonemic & morphemic; ts for “tz”, x for “sh”, h for “j”; original long vowels with :.
Examples: ta:ka-t, siwa:-t, kune:-t, kwilin, tsahtsi, chi:l-tik, a:xa:n, naha

Source: Hernandez
Spelling rules: IRIN spelling.
Examples: takat, siwat, kunet, kwilin, tzajtzi, chiltik, ashan, naja

Source: LBN
Spelling rules: IRIN spelling.
Examples: takat, siwat, kunet, chiltik, ashan, naja

Source: Schultze
Spelling rules: g for some “k”s, u for “w”, ts for “tz”, č for “ch”, š for “sh”, χ for “j”, ņ for [ŋ]; ´ on stressed vowels.
Examples: tágat, síuat, kúnet, kuíliņ, tsáχtsi, číltik, ášaņ, náχa

Source: Todd
Spelling rules: Imitates Spanish & partially phonetic; sometimes hg for “k”, hgu for “w”, sometimes hz for “s”; ´ on stressed vowels.
Examples: tácat, cíhguat, cúnet, cuílin, tzatzi, chíltic, áshan, naja

“IRIN spelling” refers to the modern system pioneered in IRIN’s publications and increasingly used everywhere. For convenience, the other systems are described in this table in terms of departures from IRIN spelling. Ramirez’s spellings are not included above: he varies somewhat randomly between imitation of Spanish and IRIN usage (his opinions were taken into account in the process of the standardization that led to IRIN spelling). The spelling adopted for headwords in NAWACOLEX is the IRIN system.
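As an illustration of what respelling involves, the departures from IRIN spelling listed above for Campbell (ts for “tz”, x for “sh”, h for “j”, long vowels marked with a colon, plus the hyphens separating morphemes) can be undone mechanically. The sketch below is written in Python and covers only these rules and the eight sample words; it is an assumption-laden toy, not the procedure actually used to prepare the NAWACOLEX lexicons, and the function name `campbell_to_irin` is hypothetical.

```python
import re


def campbell_to_irin(form: str) -> str:
    """Convert a Campbell-style spelling to IRIN spelling (toy rules only)."""
    # Drop vowel-length marks and morpheme-boundary hyphens.
    form = form.replace(":", "").replace("-", "")
    # ts -> tz
    form = form.replace("ts", "tz")
    # h -> j, except in the digraph "ch", which is the same in both systems.
    form = re.sub(r"(?<!c)h", "j", form)
    # x -> sh; done last so that the new "sh" is not turned into "sj".
    form = form.replace("x", "sh")
    return form


campbell_forms = ["ta:ka-t", "siwa:-t", "kune:-t", "kwilin",
                  "tsahtsi", "chi:l-tik", "a:xa:n", "naha"]
irin_forms = [campbell_to_irin(form) for form in campbell_forms]
print(irin_forms)
```

The order of the substitutions matters: replacing x with “sh” before converting h to “j” would corrupt the result, a small reminder that even simple respelling rules interact.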


Treatment of verb inflection in the lexicons

An extra set of fields is often included in Schultze, and sometimes also in Arauz, for cited inflected forms (generally of verbs):

if-dir      Directional ([w]al-)
if-em       Emphatic (-a)
if-fut      Future (yu- type)
if-inac     Inaccusative
if-ir       Irrealis
if-iter     Iterative (redup.)
if-pf       Perfect
if-pp       Participle
if-prefix   Prefixed form
if-pt       Preterite
if-rfl      Reflexive (mu-)
if-san      Clitic -san
if-tuya     Tense in -tuya
if-uk       Clitic -uk

There is also a generic \if field type used for other inflected forms, including plural and possessed forms of nouns. In verb entries, these (when present) are generally given in the following fixed order: prefix, ir, pt, pf, tuya, pp, fut, inac, rfl, iter, em, uk, san, dir. Advanced Nawat students will rarely if ever need to actually see the labels to recognise which forms are which, as it is nearly always obvious from the form itself which form it is. Thus it is not usually necessary in practice to display the tags, and all you see is a list of forms such as this:

which, if the labels are displayed, is as follows:


Both Arauz and Schultze are more like glossaries of textual (often inflected) forms rather than proper vocabularies. For the present purpose it is vocabularies that we want, but the inflected forms listed in the sources contain information which it seemed unfortunate to have to ignore, so an attempt was made to note, under lexical entries, the cited inflected forms to the extent that this was practical and the information did not seem completely redundant. As a rule the forms given in these fields in NAWACOLEX are “abstract” or shorthand representations of sets of inflected forms, not literal repetitions of every single form listed by Arauz and Schultze. For example, where in the Arauz lexicon the NAWACOLEX record gives “munapalua” in the \if-rfl field of “napalua”, this is not to be taken necessarily to mean that Arauz’s book contains the specific form munapalua, but that such a form is to be inferred from something that does occur there, such as ximunapalûcan for example. What this actually tells us of interest is not the form of the reflexive of -napalua (predictable in this instance), but rather the fact that a reflexive form of this verb is attested. Detailed notes on the inflected form fields follow: \if-prefix: In this field indications or examples are given relevant to how person (subject, object) prefixes are affixed and which such forms are attested. The information might serve to establish: (a) whether the verb is shown to be intransitive or transitive or both (which can be deduced from which prefixes are shown); (b) the forms of these prefixes or of the prefixed stems (usually predictable but occasionally worth noting); (c) sometimes, other particular phonetic behaviour. For example:

\if-ir: The term irrealis is taken to cover the imperative (e.g. shiktali, -kan), subjunctive (ma kitali, -kan), synthetic future (kitalis, -ket) and conditional (kitaliskia, -t). These tenses almost always share the same stem, even amidst the chaos sometimes seen in other tense forms. Thus if we know that the irrealis stem of -talia is “tali”, that covers the morphology of any and all of those tenses (only yawi and witz need special treatment). The different tense forms are often still retained in the NAWACOLEX lexicon, but this has been the rationale, together with a desire for brevity, behind gathering the different irrealis forms into a single field. But it is also interesting to know which tense forms of which verbs are actually attested, so specific examples are sometimes included in the lexicon where potentially pertinent. Example:

\if-pt: Past or preterite tense formation is notoriously variable in all Nawat dialects with regard to suffix morphology. The indications in this field aim to sum up which variants are attested (or implied) by the forms cited by Aráuz. There is considerable free variation even within single documents, and this is indicated by the symbol “/”, e.g.


\if-pf: “Perfect” usually refers to the perfect tense, but this field also sometimes includes the perfect conditional in -tuskia. In principle it is kept distinct from the participle, \if-pp, since the forms do not always coincide.

\if-tuya: I believe there are two different tenses in -tuya, as will be described in the Grammar: one for stative verbs, where its meaning is imperfect, the other for dynamic verbs, where it is pluperfect. In order not to prejudge this, however, I have put all tense forms ending in -tuya(t) in a single data field. Also, so as not to prejudge the thorny question of whether -tuya inflections are added to the present or the perfect stem, I have always listed -tuya forms here, rather than in the \if-pf field.

\if-pp: The field for participial forms (inasmuch as these can be identified), as opposed to the perfect tense (see \if-pf).

\if-fut: This field is not for the synthetic future, which is included in \if-ir, but for the periphrastic future (with yawi), which is often contracted in these sources; they spell the contracted forms as if they were synthetic forms (which they are plainly not); see the previous illustration (“talia”).

\if-inac: The inaccusative form, which is mostly formed with ta-. The form cited in the lexicon may be inferred from inflected forms rather than actually cited in the source, since the purpose of this field is to establish whether any inaccusative forms are attested and, if so, the form of the inaccusative stem:

\if-rfl: Except for a small number of minor exceptions (of two kinds: mu-V → m-V as in m-altia and mu-i → mu- as in mu-(i)shmati), reflexives are formed straightforwardly by prefixing mu- to transitive verb stems. This field records the attested reflexives, noting simultaneously their existence and form.

\if-iter: The iterative of a verb is regularly formed by reduplication with j. In this field we note iteratives attested in the lexicons.

\if-em, \if-uk, \if-san: “Emphatic” (\if-em) is the name given to forms taking the clitic -a or -ya which signifies emphasis or the idea of ‘already’; in this case inflected forms are registered in full so that we can see which forms exactly occur with the emphatic:

The occurrences of the clitics -uk / -yuk and -san are also recorded in special fields:

\if-dir: The \if-dir (“Directional”) field displays variants of verbs with the directional prefix (w)al- if any are attested:


Arauz

Próspero Aráuz wrote El pipil de la región de los Itzalcos as an elementary school textbook, though it was not used as such as far as we know and was not published (edited by Pedro Geoffroy Rivas) until many years later. The book consisted of various parts; while these are all of interest, the most important in general as documentation of the Nawat language are the readings (included in the text corpus of NAWACOLEX 2.1) and the Spanish-to-Nawat glossary at the end of the book, and it is the latter that serves as the basis for the Arauz lexicon.

The text presents numerous difficulties and contains many obscure points. I have not seen much information at all bearing on how Próspero Aráuz went about compiling the materials in his book, beyond the few things Geoffroy Rivas says about it in his preface to the published edition. In many ways we may say that the material has been produced very inexpertly from a linguistic point of view, yet it is still a very valuable document because of its size, variety of content, and the fact that it is one of the earliest extensive documents of any variety of the Nawat language, and it apparently reflects the usage of Nahuizalco in particular, of which the only other substantial attestation, despite the great vitality of Nawat in times past, is Juan Todd’s short book, written some decades later than Arauz’s and much more limited in content and scope.

The Nawat in Arauz is poorly and unreliably recorded in an ill-adapted orthography, probably compounded by some errors of transcription and poor editing. On the basis of the material itself, i.e. both the Nawat texts and the glossary, I am forced to infer that Aráuz himself did not know a great deal of Nawat and could not speak it properly. The texts must have been obtained from other people who would have been native speakers and probably illiterate. Thus, words spoken by native informants were written down by somebody whose knowledge and understanding of the language was incomplete. The results obtained through this exercise are roughly of the quality one might expect them to have under such circumstances. What was written down, whether right or wrong, was probably never proofread or corrected by a Nawat speaker, and so there are things that don’t seem to make sense and their most likely explanation is, simply, that indeed they do not. However, there is enough substance to this corpus for it probably to be possible, through a painstaking critical study of texts and glossary by a well-prepared scholar, to reconstruct a better picture of the language than this material superficially provides. The book’s editor, Geoffroy Rivas, seems not to have been adequately equipped for the task; his hand seldom improves the text and sometimes we might speculate that it has even obscured it further. Future scholars will have plenty of unravelling to do here.

If Arauz was no Nawat speaker, he certainly was no linguist either. As a matter of fact, his glossary on which our lexicon is based is not any sort of proper vocabulary; rather, it is an erratic and highly chaotic list of word forms. The order is Spanish to Nawat. The feature of the glossary that strikes an educated reader immediately is precisely its lack of linguistic sophistication. This is seen most easily in the way it treats verbs (which are the heart and soul of the Nawat language, more so than of the Spanish). The list of headwords on the first page of the glossary starts out like this:

• a. Prefijo en voces pipil. Prep.
• abajo. adv. de l.
• abejón. m. Zool.
• abra. (Ud.)
• abrace (él o ella)
• “ (yo)
• abracé (yo)
• abracemos (nosotros o nosotras)
• abracen (ellos o ellas)
• abrácense (ellos o ellas)
• “ (ustedes)
• abraza (él o ella)
• “ (tú)
• abrazado, da
• abrazáis (vos o.)
• abrazamos (nos o.)
• abrazan (ellos o ellas)
• abrazándolo, la. form. comp.
• abrazándose, form. comp.
• ABRAZAR, a
• abrazara (él o ella)
• “ (Ud.)
• “ (yo)
(etc.)

There follow 110 more pages of the same kind of thing, laboriously working its way through the Spanish alphabet from A to Z. The Nawat glosses supplied for each of these word forms in turn sometimes are, or could be seen as, accurate translations of the Spanish words, other times not really, and then we are left to try to guess “what happened” in the process Aráuz must have put his informants through to arrive at these lists. Sometimes the Nawat doesn’t really mean anything, it is perhaps an incomplete fragment of an inflected form or something misheard; sometimes it means something completely different and one can guess what sort of misunderstanding must have occurred. It must also be understood that some of the Spanish words listed cannot be adequately translated as they have no translation, but an attempt has nonetheless been made to supply one. For instance, in the above examples consider the infinitive “abrazar” (fourth from the end); now let us remember that Nawat possesses no infinitives. In this particular case, the gloss given is “napalua” which is the stem of the Nawat verb. That would be fine if it was done consistently but it is not: for “acurrucar r[efl].” the gloss given is “cuyuluj” which is a past tense form bereft of the necessary prefixes (whereas “acurrucando ger.” is glossed as “quicuyulua”, i.e. “kikuyulua”, a transitive present tense form). For “afilar” the gloss given is “tenican”, which is a subjunctive form, for plural subjects, of the verb “tenia”, again bereft of the prefixes which alone allow the form to be made sense of. For “agregar” two glosses are given, “ticentalia” and “centalij”. The former is a present tense form in the second person singular but it lacks the obligatory object prefix, which if present would produce the meaning “you put it together”. The latter is the past tense, again lacking the prefixes it must have to become a proper word. 
This material has had to be thoroughly reworked to produce a usable lexicon organised on principles similar enough to those of other vocabularies for it to be integrated into NAWACOLEX. Long lists of verb forms, such as that partially copied for “abrazar” in the above example, had to be analysed and reduced to one or more lexical verbs, each of which was to be made a Nawat headword, with glosses based on the Spanish forms cited in the original. Additionally, an attempt was made to reduce the inflected forms cited throughout the source document to some sort of order and sum up the essential morphological information this contains in a suitable format in the new lexicon entries. The many difficulties in these procedures owing to inconsistencies and errors in the source have been addressed as best they could. Of course the spellings needed to be radically changed too, and sometimes erroneous word divisions corrected! The end result is illustrated here:

With field names:

The headword in \lx is of course in IRIN spelling and in the citation form for verbs used in NAWACOLEX. The glosses in the \gn field reflect Arauz’s information, although this has had to be condensed and “rationalized” here: for example, the source glossary contains the long list of inflections of the verb “abrazar”, some of which were listed above (abrace, abracé, abracemos, abracen, abrácense, abraza and so on), all summed up here in the one infinitive “abrazar”, following general dictionary-making practices. Next comes the group of fields for inflected forms whose names begin with \if- (see above). Arauz’s spelling of the headword is presented in the \lz field.

Notice that Arauz’s original glossary has two entries for “abrazo”:

 abrazo. m.
 abrazo (yo)

He translates the noun as “napalûa” and the verb as “nicnapalûa (naja)”. The latter is correct, but the former is nonsense because “napalua” is not a substantive. However, if one confronted a normal Nawat speaker with the question, “¿Cómo se dice (un) abrazo en náhuat?”, the speaker would not be able to answer with a substantive because there isn’t any, and so would have to respond with the verb, namely either -napalua or one of its fully inflected forms (e.g. timunapaluat, nimetznapaluskia etc.). In a sense then, it is true to say that the only way to translate the Spanish noun “abrazo” into náhuat is using the verb -napalua; but -napalua is still a verb and so, strictly speaking, does not literally gloss “abrazo m.” qua noun. Because Arauz glosses the noun “abrazo” with “napalua”, this fact is reflected in our lexicon, but the note “verbal” serves to remind us that, nonetheless, “napalua” is not a noun but a verb.


BibLex

BibLex is the name given to a complete lexical database of the text of the Nawat New Testament. It documents the vocabulary used in that text corpus exactly and comprehensively. Thus this is not in truth the same kind of document as most of the other lexical resources included. Scholars using these resources for serious research purposes will wish to take into account the nature of each of the sources and evaluate their content according to their particular needs and criteria. As long as that is clear, I believe there is no harm in including this resource in the package.

Entries are highly structured, with a lot of sub-entries, as a result of which the number of main entries is considerably lower than would have been the case if this lexicon had been organised like some of the others, with mainly top-level entries and few or no sub-entries. This lexicon is special in that (a) it is a description of a very specific text corpus, and (b) it documents all inflected forms as well as the lexemes. With regard to the latter point, BibLex is structured differently from Arauz and Schultze. The BibLex lexicon does not aim just to inform us about the general formation of inflected forms but to stipulate which forms, exactly, occur. BibLex also contains book-chapter-and-verse references to occurrences of words and inflected forms in the text (though not all such references are always given) and frequency counts for lexemes. The format it uses for this is different from that of Arauz and Schultze.

BibLex was developed for use in a different context and was inserted into NAWACOLEX as an afterthought, unmodified, just in case it is found useful to have all the lexicons sharing one platform where they can be compared with ease.


Campbell

To any serious student of Nawat today, Lyle Campbell’s monumental work should need little introduction. Published in 1985 and based on research carried out before the outbreak of the Civil War, the book consists of several parts: an extended grammatical sketch, Nawat-Spanish-English and Spanish-Nawat vocabularies and texts, as well as substantial introductions and appendices. The longest part of the book, the Nawat-Spanish-English vocabulary, is the basis of the Campbell lexicon in NAWACOLEX.

The source material has been subjected to a minimum of manipulation beyond what is strictly necessary to adapt it to the formatting requirements of the database and the way it functions. It was of course necessary to organise the text into data fields and also to re-spell the headwords in the \lx field; Campbell’s original spelling is conserved in the supplementary \lz field and has not been altered in some of the subordinate fields. Because the original “information structure” of the source entries has largely been maintained, the resulting fields depart somewhat from the pattern in other lexicons, especially since Campbell’s data is somewhat more complex structurally (i.e. more complete) than the more skeletal headword-plus-gloss structure of some vocabularies.

Another reason should be mentioned for any divergences: Campbell’s vocabulary was the first lexicon to be incorporated into the NAWACOLEX database when it was first decided to change platforms and adopt this integrated structure; it was, so to speak, the first experiment, and in hindsight there may have been a bit of overkill in the “meticulousness” of the field structure adopted. This was natural at that stage, when it was difficult to be certain how the whole thing was going to work and it was felt to be preferable to conserve too much information rather than not enough! Let’s look at an example:


\di: Most Nawat lexicons either reflect a single geographical variety of Nawat, so that the information they provide can be taken as referring to that dialect, or else purport to represent a general, pan-dialectal concept of the language, in which case no conclusions are to be drawn in this respect. Campbell is unusual in that it is based on fieldwork in more than one location and each word is associated with a particular dialect, even to the point of repeating similar information for different dialects where appropriate. Campbell’s data comes chiefly from two localities: Cuisnahuat (indicated by “C” in our source) and Santo Domingo de Guzmán (“SD” in Campbell). This valuable information has of course been conserved, but I replaced “C” and “SD” with “Q” and “W” (the latter from the Nawat name Witzapan), since I had already come to employ these abbreviations in other Nawat work and thought I might want to use them throughout the corpus. Largely for want of time, that option has not yet been implemented; as I have said, the dialect of other sources is either clear once the source is identified or else cannot be ascertained.

\so: Upon commencing this project I was anxious to make sure that the source of information was clearly identified, so I inserted a \so field under every headword (and even under sub-entries): “LCD” means “Lyle Campbell, Dictionary”, i.e. the source is the dictionary section of LC’s book. In reality this information proves largely redundant, because the name and NAWACOLEX “handle” of the document in which the information is contained identifies the source adequately without resorting to this field, as we have seen, and so the other lexicons contain no \so fields. Consequently, this field can now simply be ignored as superfluous.

\lz: As in several other lexicons, the source spellings of headwords are given in the \lz field.
\ps: Campbell sometimes indicates part of speech; his indications are reflected in this field.

\po, \pl: Campbell supplies inflected forms of nouns (possessed, plural) and “principal parts” of verbs. There are several options regarding how to encode this information in a lexicon, and I was not consistent, using one method for noun inflections and a different one for verbs; this may be inelegant but for the time being it still does the job! Noun inflections are tagged in our lexicon through specially-named fields: \pl for Plural, \po for Possessed form.

\pdv, \pdl: For the principal parts of verbs, a different format was used, where each item is represented by a set of two fields, one to hold the name of the identifier (e.g. “pres.”, “pret.”, “perf.”) and the other to hold the Nawat realization of this form of the verb. These fields are \pdl (ParaDigm Label) and \pdv (ParaDigm form [in the Vernacular]). I think the ideal “syntax” convention would be the order “label, form”, but the source material presented the items in the opposite order. For technical reasons it would have been a challenge to generate the ideal order automatically across the Campbell lexicon, while doing this manually would have cost more time than could be spent on it, so I have settled for the order “form, label” as better than nothing in the case of this particular lexicon:
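As a rough illustration of the “form, label” order just described, the sketch below pairs each \pdv field with the \pdl field that follows it. It assumes the lexicon is stored as plain text with one backslash-coded field per line; the function names and the sample record are hypothetical, and HEADWORD, FORM-1 and FORM-2 are placeholders rather than real Nawat data.

```python
from typing import List, Optional, Tuple


def parse_fields(record: str) -> List[Tuple[str, str]]:
    """Split a backslash-coded record into (marker, value) pairs, one per line."""
    fields = []
    for line in record.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and anything that is not a field
        marker, _, value = line.partition(" ")
        fields.append((marker[1:], value.strip()))
    return fields


def principal_parts(fields: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair each pdv (form) with the pdl (label) that follows it."""
    parts = []
    pending_form: Optional[str] = None
    for marker, value in fields:
        if marker == "pdv":
            pending_form = value
        elif marker == "pdl" and pending_form is not None:
            parts.append((value, pending_form))  # stored as (label, form)
            pending_form = None
    return parts


# Hypothetical record: placeholders only, in the "form, label" order.
SAMPLE = r"""
\lx HEADWORD
\pdv FORM-1
\pdl pres.
\pdv FORM-2
\pdl pret.
"""

print(principal_parts(parse_fields(SAMPLE)))
# [('pres.', 'FORM-1'), ('pret.', 'FORM-2')]
```

Returning each pair as (label, form) yields the “label, form” order described above as the ideal one, without altering the stored data.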


Hernandez

In recent years Werner Hernández has circulated a Nawat-Spanish, Spanish-Nawat vocabulary which has been found immensely useful by the newest generation of Nawat students, who now at last have something like a simple dictionary that is correct, broad in scope, and up to date in content and in spelling. If Hernandez is better called a vocabulary than a dictionary, it is because it lacks some of the trappings of better dictionaries such as basic grammatical information (e.g. part of speech), not to mention systematic sub-entries or examples; but it is a firm step in the right direction.

I know through personal acquaintance that Werner has probably performed more Nawat language fieldwork (or the equivalent of fieldwork) than anyone else over the past dozen years, and I can imagine that his field notes must have informed the content of this work on many occasions, but there is no explicit indication in the vocabulary about where his material comes from. Clearly, however, the author has drawn heavily on the earlier sources that are included in NAWACOLEX, as is evident from even a hasty perusal and comparison.


LBN

LBN (Léxico Básico Náhuat) is the NAWACOLEX version of a vocabulary that I first compiled and circulated among a few people in 2004, as a printout of a Microsoft Word document titled Léxico del Náhuat Básico. It was developed not with the idea of representing a complete dictionary but as a step along the way, and a long way at that, towards the ultimate goal of providing Nawat with a proper functioning lexicon at some time in the future. The focus was on establishing a basis for that endeavour, a starting point, not an arrival point.

In contrast to the wider ranges of some lexicons (Arauz and Schultze each have close to a thousand main entries, BibLex and Hernández around 2,500 each while Campbell boasts more than 3,000), LBN’s entries number under 600, although if we were to count its numerous sub-entries the figure might be closer to a thousand. This was the result of two decisions: to take into account only the most frequently occurring words, and to organise these into word families so that derivationally related words are for the most part grouped under a single headword. The basis for determining which words are the most frequently occurring was a text corpus which I had by then compiled and which is essentially the predecessor of the text corpus of NAWACOLEX (the corpus’ size has tripled from around 20,000 then to 60,000 now). The payoff from choosing to limit the size of the lexicon was the amount of detailed lexical information contained in its entries. Most of that information came again from the existing corpus: LBN is the first Nawat corpus-based lexicon.

In any text corpus, the words it contains vary widely in their frequencies of occurrence, with many appearing only once, twice, or four or five times, while a few occur dozens or hundreds of times. Thus in any given corpus some words will occur too few times for those occurrences alone to provide full information about them.
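The skewed distribution just described is easy to observe by counting. The following is a minimal sketch, not the procedure actually used for LBN: it assumes the corpus is available as a list of text lines, it tokenizes naively on word characters, and the tokens w1 to w4 are placeholders standing in for real words.

```python
import re
from collections import Counter


def word_frequencies(lines):
    """Count lower-cased word tokens across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Naive tokenization (an assumption): runs of word characters.
        counts.update(re.findall(r"\w+", line.lower()))
    return counts


def above_threshold(counts, minimum):
    """Return the words occurring at least `minimum` times, most frequent first."""
    return [word for word, n in counts.most_common() if n >= minimum]


# Placeholder tokens, not real corpus data.
toy_lines = ["w1 w2 w1 w3", "w1 w2 w4"]
counts = word_frequencies(toy_lines)

print(counts["w1"], counts["w3"])  # 3 1
print(above_threshold(counts, 2))  # ['w1', 'w2']
```

Raising the minimum shrinks the word list but leaves more occurrences per word to describe, which is the trade-off at issue here.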
To achieve a consistently high level of quality, completeness, detail and empiricism in describing a language’s words, then, words with frequencies below a certain threshold in our corpus must be excluded from our description. LBN did not aspire to exhaustiveness because it aspired to fullness and reliability regarding the words it includes — at least respecting the corpus then available. Once more, let the entry for “siwat” serve as an example:


“Siwat” is the headword, while the derived words “siwapil” and “siwapiltzin” appear as sub-entries. Variants are enumerated and part of speech is specified. Senses are distinguished and numbered. Each sense is glossed (in Spanish and English) and corpus-based examples provided and translated, particularly those deemed useful for establishing a word’s semantic range, syntactic usage or collocational potential. Inflections are laid out where available and labelled (here, the text corpus data are supplemented from some other sources, particularly Lyle Campbell’s opus). Variant forms that are not recommended for the standard written norm are still cited but asterisked. (The date stamp at the end of the entry does not refer to when this material was composed, which was some four years earlier, but when it was converted to the present Toolbox format for integration into NAWACOLEX and the production of an electronic version, on the LexiquePro platform, which has been publicly available for download from my website since then.) Or again, take the entry for the verb “-talia” (below). The reflexive “mutalia” is given as a sub-entry. The same features are all observed: part of speech indications, numbered senses, glosses in two languages, copious corpus-based examples with their translations, inflectional paradigms including attested variants:


Ramirez

Genaro Ramírez is well known in the Nawat language community for his pro-Nawat activism, teaching, speaking and writing. He is the most prolific Nawat writer of his (last??) generation of native Nawat speakers; in fact we might as well say the only one. Some of his writings have not been published but have been circulated informally among the (sadly few, it seems) interested people until now. Among them is this brief glossary of about a hundred Nawat words accompanied by Spanish translations and, uniquely among the lexicons so far in existence and most interestingly, definitions, explanations or examples in Nawat, e.g.

This glossary was transcribed and incorporated early on in the development of the Nawat corpus. Given the scarcity of Nawat continuous text at the time, the defining sentences were an interesting addition to the text corpus. This modest yet valuable work may perhaps be seen as an early prototype for a future Nawat monolingual dictionary, a project that still lies well in the future, even for more “advanced” Nawat specialists.


Schultze

Leonhard Schultze Jena’s Mythen in der Muttersprache der Pipil von Izalco, published in Germany in 1935, dwarfs all other twentieth-century texts in Nawat in both size and richness of content. Besides providing the original Nawat text, a German translation and copious ethnological notes, Schultze’s book contains two voluminous sections: a Nawat grammar and a lexicon or glossary. Unfortunately, both the latter are plagued with misconceptions, obscure notions, errors and confusions, which are sometimes compounded by the failings of the Spanish translation of the entire work, in two volumes, that was published in El Salvador over forty years later. Yet despite its many defects, the Schultze glossary is still a treasure trove of lexical material that no advanced student can afford to overlook.

Like the Arauz lexicon, that of Schultze is really a glossary which purports to list all the word forms, complete with inflections, that occur in the text corpus that is his book’s real centrepiece. We saw above that Aráuz’s vocabulary, which is Spanish-Nawat, is full of headwords such as abrace, abracé, abracemos, abracen, abrácense etc. ad infinitum, instead of just giving the infinitive “abrazar” as we would expect in a proper vocabulary or dictionary. Schultze’s vocabulary does the same sort of thing, but in the order Nawat-German, so we have long lists of forms such as ankichishket, ankichiwat, ankichiwtiwit, ankikwat, ankilwitiliat, ankimakat, ankimatit, ankinekit, ankintajtaniliat, ankipiasket, ankipiat, ankitaket, ankitaskiat, ankitata and so on. One difficulty with this is that it makes it very laborious to actually find something if you are trying to use it as a general Nawat vocabulary, which is what we would really like to have!
Mixed in among these surface forms he also intersperses what he takes to be the “roots” of Nawat verbs, which are as often as not a figment of his imagination and in any case, even if correct roots, are not always correct Nawat words: for example, *a ‘water’ (cf. at, ati), *ajw ‘to water’ (cf. ajwi, -ajwilia), *al ‘bathe’ (cf. -altia, maltia), *as ‘find, arrive’ (cf. a(j)si, -a(j)si), *ets ‘set up, etc.’ (cf. -ketza, muketza, muetztuk, taketza, -kejketza), *il ‘say’ (cf. -il(w)ia), *ilp ‘tie’ (cf. -ilpia), *inay ‘hide’ (cf. -inaya, minaya), *itsk ‘catch’ (cf. -itzkia), and so on, together with putative root combinations which he sees as underlying other real words, e.g. *aj-kaw ‘leave’, *a-pa-chu ‘put in water’, *elnamig ‘remember’, *ich-teg ‘steal’, *ij-kwan ‘move away’, *ish-pelu ‘observe’, *ish-pen ‘pick’. His list also contains some actual lexemes, particularly nominals, which he leaves unanalysed. Small wonder that even the most determined would-be Nawat students of years past, confronted with this and his equally confusing grammar, have made little headway, which might have contributed to the damaging and quite mistaken rumour that Nawat is an impossibly difficult language to learn!

This source material, then, needed to be thoroughly reworked to integrate the content into NAWACOLEX in a useful way, and a not insignificant task it is! A whole new set of headwords had to be posited (cf. Arauz) and the lexical content in the source document completely redistributed, restructured or reclassified to fit it into a framework compatible with the rest of our material.


Another challenge is presented by the glossary’s often lengthy and information-rich German glosses, which while very interesting indeed are a problem for two reasons: because of their length, and because they’re in German (given that the most usual common glossing language in other lexical sources, and in NAWACOLEX, is Spanish). As an experiment showed (see our entry for “ajkawa”, replicated below), the amount of work required to copy the full set of German glosses for everything would have made the project quite impractical at the present time given the resources available:

No other Nawat vocabulary has anything near such extensive glosses, and there can be no doubt that this information, based on Schultze’s fieldwork and his understanding of the texts he collected, offers very valuable input for future generations of Nawat dictionary makers. This material should be mined and its ores incorporated into our body of knowledge of Nawat, but manually copying these lengthy glosses into NAWACOLEX would not, really, make the information much more accessible to the German-reading scholars who will have to study it, provided they can find a copy of the original. Then there are the Spanish translations in the Salvadoran edition, just as long, and less reliable; if we’re going to make all the effort to do this storehouse justice, it makes little sense to rely on iffy translations of the glosses when the originals are accessible if we want them.

So it was decided to refrain from attempting to transcribe Schultze’s glosses in the NAWACOLEX lexicon. The \gn field was not simply omitted, however; instead, it is filled with a short Spanish gloss just sufficient to clarify the word’s general meaning but not meant as a substitute for the Schultze glosses. The NAWACOLEX Schultze lexicon can be used as an aid, and in particular as an index which provides information about what Nawat words are in the glossary (and the corpus on which it is based), how they are inflected and their morphology, how Schultze analysed them (no matter how wrongly at times!) and where the entries containing these Nawat words are located in Schultze’s glossary. The simplest type of entry in the Schultze lexicon in NAWACOLEX looks like this:

This tells us that: (a) the Nawat word apan occurs in the Schultze glossary; (b) it is given in the glossary in an entry the headword of which is ápan (actually ápaņ but the transcriptions are somewhat simplified in NAWACOLEX owing to font limitations); (c) we are reminded that this word means (roughly) “río” although that is not necessarily Schultze’s gloss. His gloss is, in fact, as follows: “Wasserlauf, Fluß; Teich, Quelltümpel; Bewässerungsgraben”. Or as the translators render this: “río, cauce de agua, estanque; canal de irrigación”. (The translators saw fit to omit “Quelltümpel”, possibly because they did not know how to translate it. They also substituted a comma for a semicolon, perhaps through sloppiness rather than for any sensible reason.) Now let us look at a more complex entry:


The notes about how the verb is inflected are all based on forms that appear in the glossary. They occur at many different places in the source glossary and it would be quite impractical to specify where, so this is not attempted. Schultze, following his odd criteria, believes he can trace the stem of this verb to a root *tal. The \gn field here says “poner” as a mere mnemonic as to which verb we are talking about and its general sense. Schultze’s actual gloss for the verb in the relevant parts of his entry “tal” is: “legen, setzen, hinstellen, speichern, anstellen, in Dienst stellen, zur Hand nehmen, (Wasser) zuleiten, von Pflanzen: (Frucht) ansetzen”.

Some entries in this lexicon in particular contain notes (in an \nt field) by A.R. King to clarify and sometimes to correct Schultze’s analyses or point out significant errors in the Spanish translation, because such annotations seem necessary in this complicated material to minimize some readers’ misapprehensions or confusion. These notes are really written for myself and should not be judged as a definitive statement on anything, but since I am sharing my material with fellow students and scholars I see no need to hide my extemporaneous thoughts from them.


Todd

Juan G. Todd’s little book titled Notas del náhuat de Nahuizalco is made up of a 40-page “grammar”, which contributes little new to our knowledge of the subject, followed by a 100-page Spanish-Nawat vocabulary of more interest, although part of it appears to have been taken from an earlier source such as Arauz. Some of the material may have been obtained independently. A thorough study of the texts might turn up some more insights on this issue. In the meantime, Todd’s lexicon, while not earthshakingly important, is worthy of study nonetheless.

The Todd vocabulary is open to criticism as an unsophisticated, overly simple word list, but having said that, a laudable thing about it is that it is relatively error-free. The poor spelling practices (by modern standards) are a limitation that makes the original publication less recommendable to elementary Nawat students than might otherwise have been the case. The Todd lexicon in NAWACOLEX consists of entries on a simple plan which hardly varies, e.g.

A slightly less simple example follows:

Todd and Arauz are the only sources which give the meaning of -suma as “quejar(se)” or “regañar”, its normal meaning being “pelear”. This circumstance is open to two different conjectures. Since both of these sources reflect Nahuizalco usage, this might be a case of local semantic shift. Alternatively, however, it could be that Aráuz got it wrong and Todd borrowed from Aráuz; if that was the case, the coincidence merely reflects the propagation of a single author’s mistake. This illustrates why, from the point of view of subsequent analysis, it is so desirable to know about the sources used by compilers of lexicons.

In the lexicon entry I have placed Todd’s glosses in the \gn field and used an added \nt field to annotate an alternative gloss based on what we know from other sources. “Tasuma” and “musuma”, listed separately by Todd, are treated as sub-entries in NAWACOLEX. Notice that the general meaning of -suma surfaces in the gloss for the latter.

THE TEXTS

Introduction

Some of the Nawat texts in the NAWACOLEX 2.1 corpus, including most of the most important ones, form just one part of bigger works about Nawat. Such works were written to inform readers about the language in general. Schultze (where the texts are found which we call the Masin corpus) and Campbell primarily address a specialist and academic readership, and their texts are direct transcripts of material narrated spontaneously by a variety of native speakers. Arauz is oriented to a less specialized audience, and the way the texts were obtained is not entirely clear but was probably much less rigorous, i.e. they may well have been elicited through translation, always a dubious methodology. The fact is that the linguistic quality of the Arauz texts is less consistent than in the case of Schultze and Campbell, whatever the reason, and that should be borne in mind. Not all texts in a language are of equal quality as linguistic documentation! These works were all written and published in the twentieth century (Arauz and Schultze were written in the first half of the century, Campbell in the second, but Arauz was not published until 1960).

Two other texts included in the NAWACOLEX 2.1 corpus, Derechos and Genaro, were published and circulated around the turn of the century, the former in 1997 and the latter in 2004. The way in which the two came about is rather different: the first Nawat translation of the Declaration of Human Rights was dictated by Genaro Ramírez and Paula López, both of Santo Domingo de Guzmán, and edited by Jorge Lemus; whereas the Nawat content of Naja Ni Genaro is an original text, not a translation, written personally by Genaro Ramírez and edited by Alan King.
The quality of these two texts is very different too. Translating the DHR was far too ambitious a proposal at the time it was made, and reveals a great insensitivity to the situation of the language and the skills acquired by the informants; predictably, the text that resulted is highly flawed, sometimes to the point of incomprehensibility, and whatever editing was done did not address those issues successfully. In Genaro, on the contrary, the speaker had a free hand to express himself as he wished in his own language; the editing could thus be limited to standardizing the spelling and gently cleaning up the grammar without meddling excessively with the essence of the text, and the end result is perfectly readable and, unlike the DHR translation, constitutes a genuine portrayal of one native speaker’s idiolect and voice.

Together, these five texts make up the Nawat Published Texts section of the NAWACOLEX 2.1 corpus (coded NPT). Other materials that do not come from books or publications but have circulated informally and been of use in the early stages of the present Nawat recovery process (i.e. in the first decade of the new millennium) are in the Nawat Miscellaneous Texts (NMT) section of NAWACOLEX 2.1. Two small bodies of such material are included in this corpus: Ramirez consists of short essays written by the aforementioned Genaro Ramírez, while Lopez is a small set of oral texts collected from Paula López by Jorge Lemus (I believe).


Lastly, the NAWACOLEX 2.1 corpus also includes transcriptions of some of the native-speaker Nawat interviews that were produced by IRIN in the first decade of the present century. They make up the Nawat Documentation Texts (IRIN) section of the corpus (abbreviated to NDTI).

Label               Source
Masin               Leonhard Schultze Jena, Indiana II. Mythen in der Muttersprache der Pipil von Izalco in El Salvador, pp. 291-360
Arauz               Próspero Aráuz, El pipil de la región de los Itzalcos, pp. 155-266
Campbell            Lyle Campbell, The Pipil language of El Salvador, pp. 148-594
Derechos            UNO, “Munextia muchi ipal ne tehtechan tay tupal.”
Genaro              Genaro Ramírez Vásquez, Naja Ni Genaro.
Ramirez             [untitled typescript]
Lopez               [untitled texts]
NDTI transcripts    IRIN, Language documentation project.

The text corpus of NAWACOLEX aims to present authentic Nawat texts for advanced study and research. Giving translations of those texts does not fall within its objective. In the source documents, Spanish, English or German translations are found in all cases except Lopez, some of the passages in Arauz and some of the IRIN transcripts. Generally those translations have not been transferred to NAWACOLEX because that is not its purpose, but also because it would have taken time and a lot of work to do so and there were certainly other priorities. The exceptions are the IRIN transcripts, a few of which show translations, since those texts (with the translations) were generated in Toolbox ab initio, so the work was already done and it would have taken more work to delete the translations!

Given that the texts of the NAWACOLEX 2.1 corpus are all (save Genaro) in some sense oral texts produced by native speakers and (apart from Genaro and the IRIN transcripts) were written down prior to the beginning of any written standardization, each text reflects and may be considered a specimen of the Nawat dialect spoken by the person from whom it originates. Therefore they can be characterized and classified by dialect, as follows:

Text                Dialect
Masin               Izalco
Arauz               Nahuizalco
Campbell (Q)        Cuisnahuat
Campbell (W)        Santo Domingo de Guzmán
Derechos            Santo Domingo de Guzmán
Genaro              Santo Domingo de Guzmán
Ramirez             Santo Domingo de Guzmán
Lopez               Santo Domingo de Guzmán
IRIN 2, 3, 4, 12    Santo Domingo de Guzmán
IRIN 5              Cuisnahuat
IRIN 6              Nahuizalco
IRIN 7, 9           Tacuba

Sometimes I refer to Izalco and Nahuizalco as “highland dialects” and Cuisnahuat and Santo Domingo as “lowland dialects”. It is not entirely clear at this stage to which group Tacuba might belong; early impressions suggest it has some characteristics of both.


The structure and content of the texts in NAWACOLEX

On a technical level, one of the things that give a Toolbox project such as NAWACOLEX so much power is the idea of using the same fundamental building blocks, the same “underlying structure” so to speak, to configure a variety of types of material which appear to have different properties and yet are also governed by common rules which allow them to interact fluidly. Two such types of material (i.e. database types) are lexicons and texts. The common building blocks consist of a shared underlying division into fields of data.

In text files, a line of text (= a sentence) is stored in a \tx field, while another field precedes each \tx field and assigns it a unique identifier; these line codes are in a \ref field. For the most part, a simple page of text in NAWACOLEX consists of an alternating series of \ref and \tx fields:

Thus a page of text is made up of a repeatable sequence of two fields, a \ref and a \tx, e.g.

Masin text, page 1, line 2

\ref   NPT01_01.002    Line code
\tx    Inte waktuk.    Line of text

The whole visible page represents a record in the database. Just as each lexical record must start with a headword (an \lx field), each text record starts with an identifier of its own, the page code, which is held in an \id field. In NAWACOLEX the \id field is invisible, but it is followed by a \des (description) field which contains the visible title of the page. You might say that each page has two titles, one for the machine (\id) and one for human readers (\des), e.g.

Masin text, page 1 (beginning)

\id    NPT01_01                                                  Page identifier (invisible title)
\des   Im Mais- und Bohnenfeld / En la milpa y en el frijolar    Page description (visible title)
\ref   NPT01_01.001                                              Line code
\tx    Ashkan ajwituk ne tal iwan tutunik.                       Line of text (1)
\ref   NPT01_01.002                                              Line code
\tx    Inte waktuk.                                              Line of text (2)
...    ...
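For readers who want to process such pages outside Toolbox, the record layout just described can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not a description of how Toolbox itself works: it assumes the file is plain text with one field per line, each beginning with a backslash marker, and it ignores anything else (such as header lines or wrapped field content). The names Page, parse_pages and split_line_code are hypothetical. The split of a line code into page code and line number follows the pattern visible in the example (NPT01_01 plus .002) and is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Page:
    page_code: str                      # content of the \id field
    title: str = ""                     # content of the \des field
    lines: List[Tuple[str, str]] = field(default_factory=list)  # (line code, text)


def parse_pages(raw: str) -> List[Page]:
    """Group \\id, \\des, \\ref and \\tx fields into one Page per \\id field."""
    pages: List[Page] = []
    pending_ref: Optional[str] = None
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue                    # assumption: skip anything that is not a field
        marker, _, content = line[1:].partition(" ")
        content = content.strip()
        if marker == "id":
            pages.append(Page(page_code=content))
            pending_ref = None
        elif not pages:
            continue                    # ignore fields that precede the first \id
        elif marker == "des":
            pages[-1].title = content
        elif marker == "ref":
            pending_ref = content
        elif marker == "tx" and pending_ref is not None:
            pages[-1].lines.append((pending_ref, content))
            pending_ref = None
    return pages


def split_line_code(code: str) -> Tuple[str, int]:
    """Split a line code such as NPT01_01.002 into page code and line number."""
    page_code, _, number = code.rpartition(".")
    return page_code, int(number)


SAMPLE = r"""
\id NPT01_01
\des Im Mais- und Bohnenfeld / En la milpa y en el frijolar
\ref NPT01_01.001
\tx Ashkan ajwituk ne tal iwan tutunik.
\ref NPT01_01.002
\tx Inte waktuk.
"""

for page in parse_pages(SAMPLE):
    print(page.page_code, "|", page.title)
    for line_code, text in page.lines:
        print("  ", split_line_code(line_code), text)
```

Keeping the page code and the line number separate makes it easy to check that every line code on a page really begins with that page's \id.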

NAWACOLEX uses a default configuration which hides the \ref fields from sight, and the \id field is made illegible by using a white font. As a result, when opened in NAWACOLEX the above example looks like this:

Ctrl-M, which toggles the “Hide fields” option on and off, can be used to display the \ref fields:

If you wish to see the \id field (page code), just drag the mouse pointer across it:

Of course the 20th-century source texts (Masin, Arauz, Campbell, Derechos, Ramirez and Lopez) are spelled in a wide variety of ways; in NAWACOLEX their spelling has been standardized.

In many cases it has not been possible, for reasons of time and resources, to also provide the text in its original spelling, although this is technically possible and in some cases has been done, e.g. the Derechos text:

Here is a summary of the most salient spelling differences in the source texts:

Source             Spelling rules
Masin              g for some “k”s, u for “w”, ts for “tz”, č for “ch”, š for “sh”, χ for “j”, ņ for [ŋ]; ´ on stressed vowels.
Arauz              Imitates Spanish; uses hu for “w”, x for “sh”; t’, c’ for final “t”, “k”; ^ on stressed vowels.
Campbell           Phonemic & morphemic; ts for “tz”, x for “sh”, h for “j”; original long vowels with :.
Derechos           g for some “k”s, q for “kw”, z for “tz”, x for “sh”, h for “j”.
Genaro, Ramirez    Either imitates Spanish or follows IRIN (inconsistent).
Lopez              q for “kw”, z for “tz”, c for “ch”, x for “sh”, h for “j”.
IRIN               IRIN spelling.
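To make the idea of such correspondences concrete, here is a minimal Python sketch that applies only the Campbell rules listed above (ts, x, h and the length mark :), plus removal of the hyphens Campbell uses for morpheme boundaries. It is an illustration only and is not how the NAWACOLEX texts were prepared: the names CAMPBELL_RULES and respell_campbell are hypothetical, and the texts were re-spelled by hand with editorial judgment.

```python
import re

# Campbell's conventions mapped to NAWACOLEX spelling. The empty strings
# delete the long-vowel mark ":" and the morpheme-boundary hyphen.
CAMPBELL_RULES = {"ts": "tz", "x": "sh", "h": "j", ":": "", "-": ""}

# Try longer keys first so that "ts" is matched as a unit. A single pass
# also ensures that the "h" of a newly produced "sh" is not turned into "j".
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(CAMPBELL_RULES, key=len, reverse=True))
)


def respell_campbell(text: str) -> str:
    """Apply the substitutions in CAMPBELL_RULES in one left-to-right pass."""
    return _PATTERN.sub(lambda match: CAMPBELL_RULES[match.group(0)], text)


for word in ["i-tsuntekun", "mu-na:miktih", "pa:xa:lua", "ki-maka-ke-t"]:
    print(word, "->", respell_campbell(word))
```

Mechanical substitution of this kind does not reproduce every editorial decision. In the Campbell sample passage in this chapter, for instance, tah-tayuwa appears in NAWACOLEX as tajtayua and i-kwerpoh as icuerpoj, neither of which follows from these rules alone.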

Treatment of the transcripts of interviews

The IRIN transcripts were developed “natively” inside Toolbox; hence they have had a Toolbox-compatible structure from the start. However, they contain a richer body of information than some other text files, and use some special conventions to encode it. Consider first of all this passage from the beginning of IRIN interview number 2:

The two people talking here are informally identified as the “interviewer” (Paula Lopez) and the “interviewee”. Since the interviewee’s words might be thought to be the main focus of the interview, it is these that are assigned to the standard field for Nawat text, the \tx field, and the interviewer is allotted a different field, \t0v, the field for utterances of the “Nawat-speaking field worker”. Pressing Ctrl-M to unhide the codes, we can see that each sentence (whether spoken by the interviewer or the interviewee) is preceded by its own line code in a \ref field:

A little further down the page we see an example of another field type, \tc, which contains a comment about the text that was inserted by the transcriber:
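The field conventions described in this section (\tx for the interviewee, \t0v for the interviewer, \tc for transcriber comments) can be illustrated with a short Python sketch that labels each field by speaker. It is a sketch only: the names ROLE_BY_MARKER and label_turns are hypothetical, the sample lines are invented placeholders rather than text from any IRIN transcript, and attaching a comment to the most recent line code is an assumption.

```python
from typing import List, Optional, Tuple

# Field markers named in this section and the role each one plays.
ROLE_BY_MARKER = {
    "tx": "interviewee",            # standard field for Nawat text
    "t0v": "interviewer",           # the Nawat-speaking field worker
    "tc": "transcriber comment",    # comment inserted by the transcriber
}

Turn = Tuple[Optional[str], str, str]   # (line code, role, content)


def label_turns(lines: List[str]) -> List[Turn]:
    """Attach the most recent \\ref line code and a role to each field."""
    turns: List[Turn] = []
    line_code: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("\\"):
            continue                    # assumption: one field per line
        marker, _, content = line[1:].partition(" ")
        if marker == "ref":
            line_code = content.strip()
        elif marker in ROLE_BY_MARKER:
            turns.append((line_code, ROLE_BY_MARKER[marker], content.strip()))
    return turns


# Placeholder data only: these line codes and utterances are invented.
SAMPLE = [
    r"\ref X.001",
    r"\t0v (placeholder: interviewer's question)",
    r"\ref X.002",
    r"\tx (placeholder: interviewee's answer)",
    r"\tc (placeholder: transcriber's note)",
]

turns = label_turns(SAMPLE)
interviewee_only = [text for _, role, text in turns if role == "interviewee"]
print(turns)
print(interviewee_only)
```

Separating the speakers in this way makes it possible, for example, to study only the interviewee's speech.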

These same Toolbox files were used when writing subtitles for editions of some of the interviews. There are two sets of subtitles, in Nawat and in Spanish, hence two different sets of fields. The distinction between different speakers is also maintained in the subtitle fields, so quite a few different kinds of field are needed. These fields are not displayed by NAWACOLEX unless you toggle the “Hide fields” option (Ctrl-M):

The corpus

Masin (NPT01)

The “Ynes Masin corpus”, as we often call the body of Nawat texts found in Schultze’s book and in Tajtaketza Pal Ijtzalku, edited by A.R. King, is well enough known not to need an introduction here. In any case, I would refer anyone seeking more information to the original book and/or my introduction in the Tajtaketza edition. Cf. also my brief observations in the section on the Schultze lexicon above.

Fortunately many of the shortcomings of Schultze’s descriptive work on the language in his grammar and vocabulary do not apply to the text, where he is merely presenting the stories as he heard them rather than dissecting the words. But certain things still need to be taken into account to understand his form of transcription. Schultze displays little or no comprehension of modern phonology, which sees a language’s sound system as consisting of basic abstract units, phonemes, which are not always pronounced identically in every context yet are the building blocks of larger linguistic forms such as words, and ideally also the basis for a spelling system. Even in premodern times alphabets often tended roughly to imply some such analysis, which often accounts for apparent surface “mismatches” between symbol and sound. Typically alphabets are not phonetic; the idea that they are or should be is partly based on a misconception about how writing works and, indeed, how languages work.

The trouble with Schultze’s effort to represent Nawat in writing is that he does not follow this principle, but gives a “phonetic” rendering of the texts. That is either a good or a bad thing depending on what one is trying to do. It is good up to a point in that he tells us how the language he heard actually sounded, at least about as well as anyone could in the days before the advent of audio recordings. It is bad in the sense that the result is both unsystematic and awkward to read and understand.
It is as if an artist were to paint a portrait of someone that included every temporary blemish, every slight momentary twitch or distraction, every stray hair or speck of dust or casual light effect of the instant, no matter how irrelevant to a broader statement about what the person generally looks like. If there happens to be an odd crumb on his jacket or a mosquito in the room, they will be preserved for all posterity as if they actually had anything to do with who this was. So, if a speaker sporadically mispronounces a word (and any speaker of any language may do this from time to time, even a professional speaker; it is what Chomsky calls performance as opposed to competence), that mispronunciation will be registered in Schultze’s text. This produces an interesting document, but it is different from what a trained writer would do when writing a cultivated language.

Consequently, re-spelling Schultze’s text in modern spelling is often less straightforward than might be supposed: it requires comprehension and demands an effort at interpretation. Looking at Schultze’s text and trying to put it into modern spelling is akin to listening to a scratchy recording of someone talking and trying to “figure out” and write down what they are saying. Here is a sample passage in both spellings for comparison (from the beginning of chapter 12):

Nimi-túyat úme laχlamátket, sésé yeχémet gipia-túyat sé i kúneu síuat. Ne laχlamátket ginegi-túyat, mamunamiktígaņ ne in kuχkúneu; wáņ ne siuapípil inté ginegi-túyat munamiktíat. Ne laχlamátket gičíuket se yuuálu, pal ásit ne míak taχtagámet. uáņ ne siuapípil yáuit mináyat tik ne kál séyuk síuat. Ne giņgilía: “Kémaņ gičíuat séyuk yuuálu, šiuigígaņ ga nígaņ!” Uáņ ne laχlamátket yáχket giyáχuat ne séyuk síuat, gilíẋket: “Kémaņ ásit ne siuapípil, ma inté giņyaχkáuat, kalágit, »teχemet tiẋnégit mamunamiktígaņ!« “Nú naņ ginégi, ga náχa nimunamíkti, uaņ náχa inté niẋnégi!” Ne laχlamátket gilíẋket ne in kuχkúneu, ga, asunté ginégit munamiktíat, magiságan tik ní čaņ: »Intéya tiẋnégit tigídat ne kalíẋtik!«

Nemituyat ume lajlamatket. Sese yejemet kipiatuyat se ikunew siwat. Ne lajlamatket kinekituyat ma munamiktikan ne inkujkunew. Wan ne siwapipil inte kinekituyat munamiktiat. Ne lajlamatket kichiwket se yualu pal asit né miak tajtakamet, wan ne siwapipil yawit minayat tik ne kal seuk siwat, ne kinhilia: "Keman kichiwat seuk yualu, shiwikikan ka nikan." Wan ne lajlamatket yajket kiajwat ne seuk siwat. Kilijket, keman asit ne siwapipil, ma inte kinhajkawa kalakit: "Tejemet tiknekit ma munamiktikan." "Nunan kineki ka naja nimunamikti wan naja inte nikneki." Ne lajlamatket kilijket ne inkujkunew ka asu-nte kinekit munamiktiat, ma kisakan tik ne ichan. "Intea tiknekit tikitat né kalijtik."

Arauz (NPT02)

See my comments above on Próspero Aráuz’s text in the chapter on the lexicons; there is nothing much to add here. There follows a sample passage in the source text’s spelling and that of NAWACOLEX for comparison. It comes from p. 57:

Quen galanchichin nêmi ne cojtan ihuan ni cuajcuâuyu tâquen xuxûhuic’ ishuâyuc’ îhuan pujpuputûca ca ixuchîu îhuan mîac’ pujpuxa! Ni ajat’ ijtic’ yaja nêmi îhuan yaja tzitzinâca ajachichi îhuan mutatalûa ât’, quixinîat’ tunâlco ne patâhuac’ tujtûlin, mijmil îhuan cêqui mutucâtuc’ tey titacûat’ pal ini îhuan cecêyuc’ têchan.

Ken kalanchichin nemi ne kojtan, iwan ne kwajkwawyu taken shushuik iswayu (iswayuk), iwan pujpuputuka ka ishuchiu iwan miak pujpusha! Ne ajat ijtik yaja nemi iwan yaja tzitzinaka ajachichin iwan mutatalua at, kishiniat tunalku ne patawak tujtulin, mijmil iwan seki mutukatuk tey titakwat pal ini iwan seseuk techan.

Campbell (NPT03)

Lyle Campbell’s book (see also the lexicons chapter) contains a few stories both from Cuisnahuat and Santo Domingo. He provides a morphemic breakdown indicated by hyphens. An example is shown here (from the story “El Nanahuatzin”, Cuisnahuat):

A:xa:n katka se: ta:ka-t mu-na:miktih. Tesu ki-mati katka ka ne i-siwa:-w se: bru:hah. Ka tah-tayuwa kuchi nemi. Ne i-siwa:-w ki:sa pa:xa:lua. Mu-kech-kupi:na ki:sa pa:xa:lua. Naka ne i-kwerpoh, se: maya ne i-tsuntekun. Yaha ki-tahtan se: konse:hoh wan ki-maka-ke-t, ki-maka-t se: konse:hoh. K-ilwih-ke-t ma: ki-ma:walti chi:l. Pwes ki-ma:waltih ne se:yuk. Wi:ts ka madrugada ne i-tsuntekun; te:ya su weli-k mu-sa:lua.

Ashan katka se takat munamiktij. Tesu kimati katka ka ne isiwaw se brujaj. Ka tajtayua kuchi nemi. Ne isiwaw kisa pashalua. Mukechkupina, kisa pashalua. Naka ne icuerpoj, semaya ne itzuntekun. Yaja kitajtan se consejoj wan kimakaket, kimakat se consejoj. Kilwijket ma kimawalti chil. Pues kimawaltij ne seyuk. Witz ka madrugada ne itzuntekun. Tea su welik musalua.

Other texts

See comments above.

APPENDIX

Toolbox: Shortcut keys

The following shortcut keys are mentioned in this tutorial or may be found useful. See the menus or Toolbox documentation for other shortcut keys.

• Ctrl-L opens the dialog to perform a concordance
• Ctrl-M toggles hide field
• Ctrl-Shift-P moves to the first record
• Ctrl-Z serves to undo some actions
• Alt-L opens the dialog to create a wordlist
• Alt-N moves to the next record
• Alt-P moves to the previous record
• Alt-R toggles Browse mode
• Shift-F5 reshapes a field

Further notes

Toolbox: Advanced users

NAWACOLEX is configured in such a manner that it can be read but not written to by the end-user. This has the advantage that you can do what you like with it but you can’t really break it, because the next time you open it (after having closed it) it will be back to the way it was at the start. This is not the default setting that Toolbox comes with. For reasons that need not concern us, in Toolbox documentation this read-only configuration is referred to as “exercise mode”.

NAWACOLEX can be taken out of “exercise mode” at any time. In that case, everything that you do to your copy of it will be remembered and saved. For example:

• if you close a window, it will remain closed (and unavailable, so NAWACOLEX may not be able to work properly);
• if you add, change or delete data, it will remain added, changed or deleted;
• if you change any configuration of the database, it will save the configuration change;
• if you alter the interface (e.g. by moving windows around), it will not go back to the original format on its own.

Read-only is a good way to go for users who are not fully conversant with Toolbox’s workings and who just want to access a body of data and not worry too much about doing things wrong: each time you start a new session your old session is forgotten and you are back to zero again.

To toggle between exercise mode (“read-only”) and regular (modifiable) mode, type Ctrl-Alt-Shift-T. To override the exercise mode and save all current settings and content as the new read-only version while remaining in exercise mode, type Ctrl-Alt-Shift-S. The changes you save will be permanent and cannot be reverted automatically. (Of course, if you really get into trouble you can always download NAWACOLEX again and start from scratch!)

BIBLIOGRAPHY

Aráuz, Próspero 1960. El pipil de la región de los Itzalcos. San Salvador: Ministerio de Cultura, Departamento Editorial.
Campbell, Lyle 1985. The Pipil language of El Salvador. Berlin: Mouton.
Hernández, Werner [2012]. [Nawat-Spanish vocabulary.] Unpublished manuscript.
IRIN - Te Miki Tay Tupal. Language documentation project. [Recorded and transcribed interviews between Nawat speakers.]
King, Alan R. 2004. Léxico básico náhuat. Unpublished manuscript.
King, Alan R. 2013. BibLex (lexical database of the Nawat translation of the New Testament). Ne Bibliaj Tik Nawat.
López, Paula. Stories and songs (undated).
Ramírez Vásquez, Genaro [2002]. [Short Nawat vocabulary]. Unpublished manuscript.
Ramírez Vásquez, Genaro 2004. Naja Ni Genaro. IRIN.
Ramírez Vásquez, Genaro. Miscellaneous manuscripts (untitled and undated).
Schultze Jena, Leonhard 1935. Indiana II. Mythen in der Muttersprache der Pipil von Izalco in El Salvador. Jena: Gustav Fischer.
Todd, Juan G. 1953. Notas del náhuat de Nahuizalco. San Salvador: Editorial “Nosotros”.
United Nations 1997. “Munextia muchi ipal ne tehtechan tay tupal. Declaración universal de derechos humanos.”