In:
Journal of Jewish Languages, Brill, Vol. 10, No. 1 (2022-06-20), p. 24-53
Abstract:
The Tagged Algerian Judeo-Arabic (TAJA) corpus is the first linguistically annotated corpus of any Judeo-Arabic dialect regardless of geography and period. The corpus is a genre-diverse collection of written Modern Algerian Judeo-Arabic texts, encompassing translations of the Bible and of liturgical texts, commentaries and original Judeo-Arabic books and journals. The TAJA corpus was manually annotated with parts-of-speech (POS) tags and detailed morphology tags. The goal of the new corpus is twofold. First, it preserves this endangered Judeo-Arabic language, expanding on previous fieldwork and going beyond the study of individual written texts. The corpus has already enabled us to make strides towards a grammar of written Algerian Judeo-Arabic. Second, this tagged corpus serves as a foundation for the development of Judeo-Arabic-specific Natural Language Processing (NLP) tools, which allow automatic POS tagging and morphological annotation of large collections of yet untapped texts in Algerian Judeo-Arabic and other Judeo-Arabic varieties.
Material type:
Online resource
ISSN:
2213-4387, 2213-4638
DOI:
10.1163/22134638-bja10020
Language:
Unknown
Publisher:
Brill
Publication date:
2022
SSG:
7,7