A Transformer-based Parser for Syriac Morphology

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

  • Martijn Naaijer
  • Constantijn Sikkel
  • Mathias Coeckelbergs
  • Jisk Attema
  • Willem Van Peursen
In this project we train a Transformer-based model from scratch, with the goal of parsing the morphology of Ancient Syriac texts as accurately as possible. Syriac is a low-resource language, so only a relatively small training set was available. We therefore expanded the training set with Biblical Hebrew data. Five experiments were run: the model was trained on Syriac data only, on mixed Syriac and (un)vocalized Hebrew data, and first on (un)vocalized Hebrew data and then further on Syriac data. The models trained on both Hebrew and Syriac data consistently outperform the models trained on Syriac data only. This shows that the differences between Syriac and Hebrew are small enough that adding Hebrew data is worthwhile when training a model to parse Syriac morphology. Training models on data from multiple languages is an important trend in NLP, and we show that it works well even for the relatively small Syriac and Hebrew datasets used here.
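The transfer effect described in the abstract can be illustrated with a toy sketch (this is an assumption for illustration, not the authors' Transformer setup): a tiny suffix-based tagger whose statistics can be pooled across corpora, showing how Hebrew data can cover morphological patterns that are missing from a small Syriac-only training set. All word forms, tags, and function names below are illustrative placeholders.

```python
from collections import Counter, defaultdict

def train_tagger(corpus, model=None):
    """Count word-final characters per morphological tag.
    Passing a previous `model` continues training from it,
    mimicking the pretrain-then-fine-tune regime."""
    model = model if model is not None else defaultdict(Counter)
    for word, tag in corpus:
        model[word[-1]][tag] += 1
    return model

def tag(model, word):
    """Predict the most frequent tag seen with the word's final character."""
    suffix = word[-1]
    if model[suffix]:
        return model[suffix].most_common(1)[0][0]
    return "UNK"

# Illustrative transliterated data; the two languages share Semitic suffixes.
syriac = [("ktbt", "PERF-2MS"), ("ktbw", "PERF-3MP")]
hebrew = [("qtlt", "PERF-2MS"), ("qtlw", "PERF-3MP"), ("qtln", "IMPF-3FP")]

baseline = train_tagger(syriac)                        # Syriac only
transfer = train_tagger(syriac, train_tagger(hebrew))  # Hebrew first, then Syriac

print(tag(baseline, "nqtln"))  # "UNK": the -n suffix is unseen in Syriac-only data
print(tag(transfer, "nqtln"))  # "IMPF-3FP": pattern recovered from the Hebrew data
```

The same mechanism is what the mixed-data experiments exploit: statistics from the closely related language fill gaps in the low-resource one, here reduced to a minimal counting model.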
Original language: English
Title: Proceedings of the Ancient Language Processing Workshop associated with the 14th International Conference on Recent Advances in Natural Language Processing (RANLP 2023)
Number of pages: 7
Place of publication: Varna, Bulgaria
Publication date: 2023
Pages: 23-29
ISBN (print): 978-954-452-087-8
Status: Published - 2023
