PhD scholarship in Natural Language Processing for the Digital Humanities

The Department of Nordic Studies and Linguistics (NorS), Faculty of Humanities, University of Copenhagen (UCPH), Denmark, invites applications for a 3-year PhD scholarship in Natural Language Processing for the Digital Humanities to be filled by 1 May 2018 or as soon as possible thereafter.

Application deadline: 1 March 2018 

The successful candidate will work on the cross-departmental and interdisciplinary research project Skrift og tekst i tid og rum (Script and text in time and space), funded by the Velux Foundations (see https://nfi.ku.dk/forskning/forskningsprojekter/skrift-og-tekst-i-tid-og-rum/). The candidate will be attached to the Centre for Language Technology (CST) and will work in close collaboration with researchers from the Arnamagnæan Institute (AMI).

CST conducts research in different areas of interest for language technology, such as natural language processing, language technology resources and the infrastructure around them, machine learning, multimodal communication and language technology for the Digital Humanities (see https://cst.ku.dk/english/forskning/).
The Centre has a strong international profile and considerable experience managing international research projects. Together with the Department of Computer Science of the University of Copenhagen, it offers an international MSc programme in IT and Cognition.

AMI conducts research into early Scandinavian language and literature through the study of their material record, in particular the manuscripts of the Arnamagnæan Collection. AMI is a leading centre for the study of early Scandinavian manuscripts (see https://nors.ku.dk/english/research/arnamagnaean/) and is strong in research-based teaching at the postgraduate level, including an international summer school in Scandinavian Manuscript Studies.

The project

Skrift og tekst i tid og rum (Script and text in time and space) positions itself in the area of Digital Humanities.
The goal of the project is to study medieval texts produced in Denmark in Danish and Latin from the point of view of handwriting development, as well as the dating and localization of the manuscripts. The project is cross-disciplinary and combines qualitative and quantitative approaches from philology, history, and language technology. The project will:

  • provide dynamic and interactive digital editions of diplomas, that is, historical documents with legal content
  • perform systematic research into the script and language of the diplomas
  • survey the enterprise of diplomatics in a convent in medieval Denmark and the persons and places involved 
  • develop an open source system for research, publication, and dissemination of handwritten texts in digital and printed format 


Job content

The successful candidate will engage in cutting-edge research in the area of Digital Humanities. In particular, the candidate will engage in the application of machine learning to the task of automatic dating and localization of medieval texts from Denmark, both in Danish and Latin, using existing linguistic annotations as well as visual features. The candidate will work in close collaboration not only with computational linguists from CST, but also with experts in manuscript studies and philology (including digital philology).

We expect candidates to hold a Master’s degree in Computer Science, Computational Linguistics, or similar and to have programming experience, as well as a strong background in machine learning applied to the area of Natural Language Processing. Knowledge of one or more of the following areas will be considered an asset:

  • Digital Humanities
  • Previous work with texts in older languages (especially Nordic languages and Latin)
  • Digital image processing 


Furthermore, emphasis will be placed on the following:

  • Quality and relevance of project description (see below)
  • Scientific publications, if any 


For further information about the position, please contact Senior Researcher Patrizia Paggio (paggio@hum.ku.dk).

To see the full version of the call please follow this link