PRIVACY BLACK & WHITE
PRIVACY BLACK & WHITE was the first systematic data-driven investigation of how privacy—as a privilege contested among Europeans and Africans—undergirded property practices, slavery, and racism in the Caribbean-European colonial nexus (c. 1600-1850). Our project employed a collaborative intelligence approach, combining human and machine intelligence, to investigate how privacy became racialized across colonies and empires.
The history of the European-Caribbean colonial nexus, comprising more than six European colonial powers and a high degree of linguistic complexity, constitutes a unique challenge for historians and computational scientists. NLP tools are not currently equipped to deal with this level of linguistic complexity, while historians have yet to handle the many languages, geographies, and historiographies involved in the making of racial slavery. To resolve this challenge, we combined human and machine intelligence to investigate racialized privacy. The scale and complexity of the archival corpus call for the development of NLP tools that are capable of representing racialized language along with gender, sentiment, geographical, and temporal information in the many early modern languages involved.
PRIVACY BLACK & WHITE resulted in new methodological and scientific developments for both NLP and historical research. Historical data still poses a challenge to NLP tools because the sources are often fragmentary or difficult to render machine-readable at sufficient quality. One of the main results of the project was therefore Nadav Borenstein’s model PHD (Pixel-Based Language Modeling of Historical Documents), which aims to address this issue using an image-based language model that bypasses the noise usually introduced by OCR.
Measuring intersectional bias is a crucial task for language models, but research in the field is often focused on contemporary language biases. PRIVACY BLACK & WHITE expanded this temporal scope significantly by measuring intersectional bias in Caribbean newspapers from the eighteenth and nineteenth centuries. NLP tools enabled us to analyze how linguistic biases developed over time and reflected both global and local historical processes.
NLP models are usually trained on modern texts, making their results for historical data less reliable. The challenges increase when the text corpus is multilingual, as is often the case for sources from the multicultural environment of the colonial Caribbean. PRIVACY BLACK & WHITE explored different solutions to these issues, which highlighted the need for collaboration between historians and computer scientists to adapt models using annotations made with historical expertise.
Structuring large amounts of historical data can facilitate historical research immensely as long as the attributes are clearly stipulated. PRIVACY BLACK & WHITE developed a model for multilingual event extraction trained specifically for fugitive ads and jail lists and evaluated its capacity to correctly extract the attributes relevant for historical analysis. The model performed well, capturing some surprisingly nuanced attributes across languages. However, it also highlighted the need for collaboration between NLP and history: without supervision by history experts, the current model risks continuously reproducing the perspective of colonial enslavers and replicating key elements of slavery, such as treating the enslaved only as objects in the processing of data and denying them their personhood.
The benefits and challenges of the collaboration between computer science and history encountered during PRIVACY BLACK & WHITE are a great asset for future collaborations and for training future experts. To this end, the project also developed the Advanced School for Computational History, a three-day event at the Universidade Federal de Santa Maria (Brazil) with the aim of training future historians from the Global South in how to approach NLP tools and computational history methods.
Nadav Borenstein, Natália Da Silva Perez, and Isabelle Augenstein. “Multilingual Event Extraction from Historical Newspaper Adverts.” In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 10304–25. Toronto, Canada: Association for Computational Linguistics, 2023.
Nadav Borenstein, Karolina Stanczak, Thea Rolskov, Natacha Klein Käfer, Natália Da Silva Perez, and Isabelle Augenstein. “Measuring Intersectional Biases in Historical Documents.” In Findings of the Association for Computational Linguistics: ACL 2023, 2711–30. Toronto, Canada: Association for Computational Linguistics, 2023.
Nadav Borenstein, Phillip Rust, Desmond Elliott, and Isabelle Augenstein. “PHD: Pixel-Based Language Modeling of Historical Documents.” In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, edited by Houda Bouamor, Juan Pino, and Kalika Bali, 87–107. Singapore: Association for Computational Linguistics, 2023.
Sanne Maekelberg and Natália da Silva Perez, eds. “Digital Methodologies for Research on Early Modern Privacy (1500-1800).” Special issue, Current Research in Digital History (accepted/in print).
Natacha Klein Käfer. “Private Birth, Public Authority: Topic Changes from the Sixteenth to the Eighteenth Century in Midwifery Manuals.” Current Research in Digital History (accepted/in print).
Natacha Klein Käfer, Emma Klakk, and Mette Birkedal Bruun. “Ethics of Interdisciplinary Research across Multiple Ranges of Proximity.” Issues in Interdisciplinary Studies (accepted/resubmitted after peer review).
Natacha Klein Käfer, Heather Freund, Felicia J. Fricke, and Gunvor Simonsen. “Digital Research Tools, Newspaper Texts, and Enslaved Fugitives in the Atlantic World.” archipelagos: a journal of Caribbean digital praxis (under peer review).
MEMBERS: Gunvor Simonsen (SAXO - PI), Mette Birkedal Bruun (PRIVACY - PI), Isabelle Augenstein (DIKU - PI), Natália da Silva Perez (co-creator), Natacha Klein Käfer (research coordinator), Nadav Borenstein (DIKU - PhD), Felicia Fricke (SAXO - Postdoc), Heather Freund (SAXO - Postdoc)