Digital and Computational Paleography

Peter A. Stokes led a workshop on Digital and Computational Paleography. Although the title suggests a narrow focus, the workshop covered a broad range of topics, from material manuscript studies to the practice of photographically digitising manuscripts, with a strong emphasis on machine learning and AI.

A central theme was the question of what digital and computational paleography actually is, and which broad requirements such methods have in common: clear input data; a defined question that expects a defined shape of answer (e.g. image in, transcription out); sample, reference or training data; and a way of measuring the quality of the result.
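To make the last requirement concrete for the "image in, transcription out" case: HTR output is commonly scored by character error rate (CER), the edit distance between a model's transcription and a ground-truth transcription divided by the length of the ground truth. The following is a minimal Python sketch of that metric, not a method presented in the workshop. The function names are illustrative, and it assumes plain Unicode strings with no normalisation of abbreviations, ligatures or spacing, choices a real evaluation would have to make explicitly.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference transcription.
    Assumes no Unicode or abbreviation normalisation (illustrative only)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Two misread characters (u -> n, v -> u) in a 16-character line.
print(character_error_rate("dominus vobiscum", "dominns uobiscum"))  # 0.125
```

Normalising by the reference length keeps scores comparable across lines of different lengths. It also means CER can exceed 1.0 when a model inserts many spurious characters.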

Much of the discussion revolved around the different tasks machine learning can perform and how they apply to manuscripts: classification, object detection and labelling, regression (for instance for dating), clustering, and data generation. Concrete applications came up repeatedly, such as layout analysis and detection, Handwritten Text Recognition (tying back to the earlier Transkribus workshop), quire and binding analysis, working with multi-spectral images, coping with insufficient image quality, and digitally removing the upper writing from a page to make the underlying text of a palimpsest visible.

A particularly intriguing thread was the use of generative AI to create synthetic manuscript data. This can be used to fill in missing parts of damaged manuscripts, illustrating plausible reconstructions based on how well a proposed text fits the existing gap, or to generate reference data for training and testing other models.