PYTHIA: a deep neural network model for the automatic restoration of ancient Greek inscriptions

Thea Sommerschield (Oxford)

Digital Classicist London seminar 2020

Friday June 5th at 17:30, online from the Institute of Classical Studies.

Livecast on the Digital Classicist London YouTube channel.

This presentation introduces the first deep neural network model for the automatic restoration of ancient Greek inscriptions: PYTHIA (Assael, Sommerschield, Prag 2019).

Ancient History relies on disciplines such as Epigraphy, the study of ancient inscribed texts, for evidence of the recorded past. However, these texts, or "inscriptions", have often been damaged over the centuries, and illegible parts of the text must be restored by epigraphists. My presentation will introduce PYTHIA, the first ancient text restoration model that recovers missing characters from a damaged text input using deep neural networks. Its architecture is designed to handle long-term context information and to deal efficiently with missing or corrupted character and word representations. PYTHIA was trained on PHI, the largest digital corpus of ancient Greek inscriptions. A non-trivial pipeline was written to convert PHI into machine-actionable text; the resulting dataset is called PHI-ML.

On PHI-ML, PYTHIA's predictions achieve a 30.1% character error rate, compared to the 57.3% of human epigraphists. Moreover, in 73.5% of cases the ground-truth sequence was among PYTHIA's Top-20 hypotheses, which demonstrates the potential of this assistive method for digital epigraphy and sets the state of the art in ancient text restoration.
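
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how they are commonly computed. It assumes the character error rate is the Levenshtein edit distance between a predicted restoration and the ground truth, normalised by the length of the ground truth, and that a Top-20 hit means the exact ground-truth sequence appears among the 20 highest-ranked hypotheses. The function names and the sample strings are illustrative only; this is not the evaluation code used for PYTHIA.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(prediction: str, truth: str) -> float:
    """Edit distance normalised by the ground-truth length (assumed definition)."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(prediction, truth) / len(truth)


def in_top_k(hypotheses: Sequence[str], truth: str, k: int = 20) -> bool:
    """True if the ground truth is among the first k ranked hypotheses."""
    return truth in hypotheses[:k]


# Hypothetical example: one wrong character in a five-character restoration.
truth = "δημος"
ranked = ["δημας", "δημος", "δημου"]  # made-up hypotheses, best first
print(character_error_rate(ranked[0], truth))
print(in_top_k(ranked, truth, k=20))
```

Under these assumptions, a lower character error rate means the single best prediction is closer to the original text, while the Top-20 figure measures how often the correct restoration is at least offered to the epigraphist as a candidate.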

I will first introduce key concepts in Epigraphy and Machine Learning; I will then expand on the text pre-processing pipeline that converts PHI into machine-actionable text (PHI-ML) and explain the model’s architecture in detail. I will evaluate the model’s performance on a selection of damaged inscriptions and show PYTHIA ‘in action’, including a visualisation of the model’s attention weights. Finally, I will explain how to freely access the model and dataset online, and give a brief demo of how to use PYTHIA for personal research. I will conclude by discussing three other recent digital epigraphy research projects developed by UBISOFT, MIT and CDLI.

ALL WELCOME