Of People, Places and References: Extracting information from Classics publications

Matteo Romanello (Deutsches Archäologisches Institut / École polytechnique fédérale de Lausanne)

Digital Classicist London & Institute of Classical Studies seminar 2016

Friday June 10th at 16:30, in room 234, Senate House, Malet Street, London WC1E 7HU

This seminar aims to give a gentle introduction to Named Entity Recognition (NER), one of the crucial steps towards the automatic extraction of information from unstructured texts. Since the notion of named entity is largely domain-dependent, the seminar will focus on capturing named entities of interest to classicists (e.g. ancient people, places and texts) from publications in this field. The seminar is divided into two parts: in the first I introduce the methods and tools used to perform and evaluate NER; in the second I present an example of domain-specific NER, namely a system to extract canonical references.

Recommended readings:

Wikipedia, s.v. Named-entity recognition

S. Bird, E. Klein and E. Loper, Natural Language Processing with Python, O’Reilly, 2009. Available at: http://www.nltk.org/book/ (only ch. 7, sections 1 and 5)

M. Romanello, 'Extracting Citation Networks from Publications in Classics' (pre-print), 2016. Available at: http://dx.doi.org/10.5281/zenodo.46328 (only pp. 1-7)

Livecast on the Digital Classicist London YouTube channel.

NB: this session will be followed by a launch event for the collected volume Digital Classics Outside the Echo-Chamber at 18:00 in the second-floor foyer.