A digital grammar of Yawarana (Cariban): A corpus-linguistic investigation of an endangered indigenous language
The Cariban language family is one of the largest genealogical units among the indigenous languages of South America. While recent years have seen increased efforts to document and describe these languages, many remain underdescribed. One such language is Yawarana, spoken along the Manapiare and Ventuari rivers in Venezuela and highly endangered, with about 30 speakers remaining. Little is known about the grammar of Yawarana, which has changed considerably in comparison to related languages. Its genealogical position within the language family is unclear, and contact with genealogically unrelated languages has been suggested. The unknown prehistory of its speakers, its unclear position in the family tree, its innovative nature, and its imminent extinction make the study of Yawarana a high priority. A recent documentary project has compiled a considerable corpus of naturalistic and elicited Yawarana speech. The present project aims to contribute to our knowledge of Yawarana grammatical structures by conducting an in-depth grammatical study based on this corpus. For this purpose, a novel tool for digital grammaticography will be used, allowing for the tight integration of corpus data into the grammatical description, an approach which has so far seen very little use in the digital humanities. Grammatical information that cannot be deduced from the corpus will be identified, and the resulting gaps in the description will be filled in collaboration with native speakers. Apart from constituting an important source of information on Yawarana grammar for researchers, the description will also serve as the basis for grammatically informed pedagogical materials for non- or semi-speakers in the Yawarana community, where language revitalization efforts are ongoing.
Further, Yawarana shows optional ergative marking, a little-understood phenomenon found in languages across the world. In addition, attested radical innovations in its verbal morphosyntax make it likely that a) it has undergone changes in word order different from those in other Cariban languages, and b) it has developed grammatical distinctions between main and subordinate clauses not found in other Cariban languages. These three phenomena will be investigated in detail using a quantitative corpus-linguistic approach. This will lead to novel insights into optional ergativity, as well as into diachronic pathways of word order change and of the main/subordinate clause distinction in Cariban languages.
The impact of the envisaged project is manifold. Cariban studies will benefit from a description of Yawarana, whose structures are so far not widely known to the field. Further, non- and semi-speakers within the Yawarana community will be able to profit from more grammatically informed educational materials. The grammar will also have an impact on the wider discipline of linguistics, as every additional description of a little-known language furthers our understanding of what human languages are capable of and serves as input for future cross-linguistic studies. The inclusion of Yawarana in such studies will be facilitated by the accessible nature of the grammatical description and the corpus. The project will also contribute to the understanding of the understudied phenomenon of optional ergativity, and provide a basis for similar studies in other languages. A first version of the digital grammar framework will be made available for other researchers to use. If adopted, this will make annotated corpora resulting from linguistic fieldwork more accessible to researchers, readers, and members of the speech communities themselves. It will enable tight connections between grammatical descriptions and the underlying corpora, and provide a chance to fully harness modern technological possibilities in the ongoing efforts to describe the world's many little-known minority languages.