Current Issues in Language Documentation

Prof Peter K. Austin, Endangered Languages Academic Programme, Department of Linguistics, SOAS
Tokyo University of Foreign Studies, February 2010


Outline
• What is language documentation?
• How does it differ from language description (and from linguistic theory)?
• Reshaping 'the science of language'
• Some challenges
• Conclusions


Language documentation
• “concerned with the methods, tools, and theoretical underpinnings for compiling a representative and lasting multipurpose record of a natural language or one of its varieties” (Himmelmann 1998)
• has developed over the last decade in large part in response to the urgent need to make an enduring record of the world’s many endangered languages and to support speakers of these languages in their desire to maintain them, fuelled also by developments in information and communication technologies
• essentially concerned with roles of language speakers and their rights and needs

What documentary linguistics is not

• it's not about collecting stuff to preserve it without analysing it
• it's not = description + technology
• it's not necessarily about endangered languages per se
• it's not a fad

The level of interest is very high

Graduate student interest
• 62 students graduated from SOAS MA in Language Documentation and Description 2004-09 – currently 17 are enrolled
• 7 graduates in PhD in Field Linguistics – 12 currently enrolled
• other documentation programmes, e.g. UT Austin, have similar experience

Interest in training
• 3L Summer School 2009 – 100 attendees
• 3L Summer School 2008 – 80 attendees
• SOAS fieldwork seminars – 70 attendees
• InField 2008 – 75 attendees

And more good news ... Research funding £££ $$$ ¥¥¥
• ELDP has so far funded 195 documentation research projects on endangered languages worth GBP 7.25 million (¥ 1,049,157,855)
• Volkswagen DoBeS has funded 60 projects, EUR 30 million
• NSF-NEH DEL: 60 projects, $US 10 million
• ESF EuroBABEL project: EUR 8 million
• and ELF, FEL, GfBS, Unesco ...

DoBeS projects


ELDP funding Projects 2003-2007


Books and journals
• Gippert et al 2006 Essentials of Language Documentation. Mouton
• Tsunoda 2006 Language endangerment and language revitalization: an introduction
• Language Documentation and Description – 6 issues (1,500 copies sold), 2 in prep
• Language Documentation and Conservation – 6 issues (on-line only)
• Cambridge Handbook of Endangered Languages
• Routledge Essential Readings

back to Language Documentation


Main features (Himmelmann 2006:15)
• Focus on primary data – collection and analysis of an array of primary language data to be made available for a wide range of users;
• Explicit concern for accountability – access to primary data and representations of it makes evaluation of linguistic analyses possible and expected;
• Concern for long-term storage and preservation of primary data – includes a focus on archiving in order to ensure that documentary materials are made available to potential users now and into the distant future;

Main features (cont.)
• Diversity – of contexts, languages, cultures, communities, individuals, projects
• Work in interdisciplinary teams – documentation requires input and expertise from a range of disciplines and is not restricted to mainstream (“core”) linguistics alone
• Close cooperation with and direct involvement of the speech community – active and collaborative work with community members both as producers of language materials and as co-researchers

The documentation record
• the core of a language documentation is a corpus of audio and/or video materials with transcription, annotation, translation into a language of wider communication, and relevant metadata on context and use of the materials
• the corpus will ideally be large, cover a diverse range of genres and contexts, and be expandable, opportunistic, portable, transparent, ethical and preservable
• lexico-grammatical analysis (description) and theory construction are contingent on and emergent from the documentation corpus (Woodbury 2003, 2010)

Components of documentation
• Recording – of media and text (including metadata) in context
• Transfer – to data management environment
• Adding value – transcription, translation, annotation, notation and linking of metadata
• Archiving – creating archival objects, assigning access and usage rights
• Mobilisation – creation, publication and distribution of outputs
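
The five components listed above can be read as stages that each recording session moves through. The Python sketch below is only an illustration of that reading: the names `Stage` and `Session`, and the rule that a stage cannot be completed before the earlier ones, are assumptions made here and are not prescribed by the slides.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    RECORDING = 1      # media and text captured in context, with metadata
    TRANSFER = 2       # moved into a data management environment
    ADDING_VALUE = 3   # transcription, translation, annotation, metadata linking
    ARCHIVING = 4      # archival objects created, access and usage rights assigned
    MOBILISATION = 5   # outputs created, published and distributed


@dataclass
class Session:
    """One recording session (hypothetical structure, for illustration only)."""
    session_id: str
    completed: set = field(default_factory=set)

    def complete(self, stage: Stage) -> None:
        # Assumed rule: every earlier stage must already be completed.
        missing = [s for s in Stage if s < stage and s not in self.completed]
        if missing:
            raise ValueError(
                f"cannot complete {stage.name} before {missing[0].name}"
            )
        self.completed.add(stage)

    def next_stage(self):
        # First stage not yet completed, or None when all five are done.
        for s in Stage:
            if s not in self.completed:
                return s
        return None
```

A project could keep one such object per session to see at a glance which recordings have not yet been transcribed, archived or mobilised. In practice the stages often overlap, so the strict ordering is a simplification.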


An example – Stuart McGill
• 4 year PhD project at SOAS
• documentation of Cicipu (Niger-Congo, northwest Nigeria) in collaboration with native speaker researchers
• outcomes:
  – a corpus of texts (video, ELAN, Toolbox)
  – 2,000 item lexicon
  – archive (956 files, 50 Gbytes)
  – overview grammar (134 pages)
  – analysis of agreement (158 pages)
  – website, cassette tapes, books, orthography proposal and workshop

Documentation and description


Documentation and description
• language documentation: systematic recording, transcription, translation and analysis of the broadest possible variety of spoken (and written) language samples collected within their appropriate social and cultural context
• language description: grammar, dictionary, text collection, typically written for linguists
Ref: Himmelmann 1998, Woodbury 2003, 2010

Documentation and description
• documentation projects must rely on application of theoretical and descriptive linguistic techniques, to ensure that they are usable (i.e. have accessible entry points via transcription, translation and annotation) as well as to ensure that they are comprehensive
• only through linguistic analysis can we discover that some crucial speech genre, lexical form, grammatical paradigm or sentence construction is missing or underrepresented in the documentary record

Documentation and description
• without good analysis, recorded audio and video materials do not serve as data for any community of potential users. Similarly, linguistic description without documentary support risks being sterile, opaque and untestable (not to mention non-preservable for future generations and useless for language support)

Workflow

Description
• something happened
• applied knowledge, made decisions
• something inscribed (NOT OF INTEREST)
• cleaned up, selected, analysed
• representations, lists, summaries, analyses (FOCUS OF INTEREST)
• presented, published

Documentation
• something happened
• applied knowledge, techniques
• recording (FOCUS OF INTEREST)
• made decisions, applied linguistic knowledge
• representations, eg transcription, annotation (FOCUS OF INTEREST)
• archived, mobilised
[diagram label: recapitulates]

Language documentation gives linguistics an opportunity to reassert itself as 'the science of human language'


Linguistics – the science of language
• documentation requires a scientific approach to information capture, paying proper attention to environmental factors including spatial layouts, equipment choice etc., requiring knowledge and skills more often found in music or film than in descriptive/theoretical linguistics
• documentation requires a scientific approach to data structuring, processing and analysis, paying attention to, e.g., data modeling and knowledge representation, requiring skills more often found in computer science than in descriptive/theoretical linguistics

• documentation requires a scientific approach to data archiving and preservation, with proper attention to metadata, data formats, corpus structure, workflows, and to protocols (access and usage rights), requiring knowledge and skills more often found in archiving theory and practice than in descriptive/theoretical linguistics
• documentation demands a scientific approach to mobilisation with proper attention to pedagogy, applied linguistics, human-computer interaction (interface design etc.)

Challenges

• Interdisciplinarity
• Legacy data
• Meta-documentation
• Recruitment, training and sustainability

Interdisciplinarity
• a multidisciplinary perspective in language documentation could potentially draw in researchers, theories and methods from a wide range of areas, including anthropology, musicology, (oral) history, psychology, ecology, pedagogy, applied linguistics, computer science etc
• true interdisciplinary research is difficult to achieve, both because of different theoretical orientations and because of practical differences in approach that can make communication and understanding complex and difficult

• mainstream linguistics has tended to turn away from other disciplines and to emphasise its ‘independence’ by concentrating on theoretical concerns that are of internal interest primarily to linguists alone (Libermann 2007)
• language documentation opens new doors to interdisciplinary collaboration but we need to work out how to achieve it

Legacy data
• Language documentation theory has assumed that data is collected now and has not paid attention to already existing materials (digital or analogue)
• Legacy data raises many issues if we wish to include it in our corpus or to treat it like other modern digital data
• There are practical, technical, ethical, and political issues that legacy data raise, and many questions which may be difficult to answer

Some problems
• Should legacy materials (media, text) be ‘cleaned up’ for modern use? Important to document goals and processes
• Modern documentation assumes ‘informed consent’ and rights to control access respecting individual and community wishes and sensitivities, but consent(1980) ≠ consent(2010)
• And access(1980) ≠ access(2010)
• And sensitivities(1980) ≠ sensitivities(2010)
• And community(1980) ≠ community(2010)

Meta-documentation
• We can’t easily anticipate technological changes and ethical norms of the future
• We can expect that they will change
• Creating information about the research context (documenting the documentation) could facilitate reassessments of rights and responsibilities in the future
• Critical information may be impossible to obtain after the death of the researcher

What to metadocument?
• The stakeholders that were involved and how (their roles in the project)
• The attitudes of language consultants
• The methodology of the researcher (contact, consent, compensation, culture)
• The biography (including background knowledge and experience) of the researcher and main consultants
• Any agreements entered into, whether formal or informal (MOU, payment, promises and expectations)
• …
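
One way to make sure these points are actually captured is to keep them as a structured record alongside the corpus. The sketch below is a hypothetical Python record whose field names simply mirror the bullets above. It is not an established metadata schema, and the example value is invented for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MetaDocumentation:
    """Hypothetical record; field names follow the bullets on the slide."""
    stakeholders: dict = field(default_factory=dict)   # person or body -> role in the project
    consultant_attitudes: str = ""                      # free-text notes
    methodology: dict = field(default_factory=dict)    # contact, consent, compensation, culture
    biographies: dict = field(default_factory=dict)    # researcher and main consultants
    agreements: list = field(default_factory=list)     # formal or informal (MOU, payment, promises)

    def missing(self) -> list:
        # Names of the fields that are still empty, in declaration order.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Invented example: only the stakeholder roles have been recorded so far.
record = MetaDocumentation(
    stakeholders={"example researcher": "principal investigator"}
)
print(record.missing())
```

Here `missing()` reports the four fields that still need to be filled in, which gives a simple check to run before the researcher's knowledge of the project context becomes unavailable.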

A (not too serious) metaphor
• Researchers and projects have different working arrangements, but perhaps we can develop a typology of these
• Using a metaphor from human land use practices
  – Hunting & gathering
  – Slash & burn swidden
  – Pastoral nomadism
  – Sedentary intensive agriculture
  – Plantation
  – Sustainable land use


Hunting and gathering


Slash and burn swidden


remove major obstacles, farm intensively for 2-3 years and then move to another site

Sedentary intensive agriculture


ranges from feudal to communal, with employment of local serfs and artisans in temporary or specialist roles (and necessary application of fertiliser and pesticides, or crop rotation)

Plantation


train 3rd world locals to grow consumable products in the correct form and extract them to refine and add expensive value in 1st world

Sustainable land use

ecology-driven wholistic approach, including reforestation and recuperation of damaged land


Sustainability
• we need to work out how to recruit new contributors to the discipline, how to train them, and how to sustain them through fulfilling career paths
• we understand sustainability of archived data but how do we sustain projects and relationships beyond the typical 3-5 year academic life cycle?
• how can documentation contribute to sustaining endangered languages and the communities who want to maintain and develop them?

More metadocumentation
• Meta-documentation of granting projects
• Meta-documentation of archiving practices
• Meta-documentation of meta-data (cf. ‘best practices model’ vs. Nathan 2009 “types of metadata vary by project, language, community, consultants,...”)
• Meta-documentation of project outcomes against project (application) goals
• …

Example: Meta-documentation of grants
ELDP grants by continent
[Figure: bar chart of number of projects (axis 0 to 80) by continent, Target vs Host. Continents: Asia, AusPac, Africa, CSAmerica, Namerica, Europe, MidEast]

Where does the money go? Value of ELDP grants
[Figure: chart of grant value (axis £0 to £3,000,000) by continent, host value versus target value. Continents: Namerica, Europe, CSAmerica, Middle East, AustPac, Asia, Africa]

Conclusion

• language documentation is an exciting development in terms of research goals, methods and outcomes that offers the potential to reshape the scientific and humanistic basis of linguistics now and into the future
• but many significant issues remain to be developed and discussed


Thank you
