Of People, Places and References: Extracting information from Classics publications
Matteo Romanello (Deutsches Archäologisches Institut / École polytechnique fédérale de Lausanne)
Digital Classicist London & Institute of Classical Studies seminar 2016
Friday June 10th at 16:30, in room 234, Senate House, Malet Street, London WC1E 7HU
This seminar aims to give a gentle introduction to Named Entity Recognition (NER), one of the crucial steps towards the automatic extraction of information from unstructured texts. Since the notion of named entity is largely domain-dependent, the seminar focuses on capturing named entities of interest to classicists (e.g. ancient people, places and texts) from publications in this field. It is divided into two parts: the first introduces the methods and tools used to perform and evaluate NER; the second presents an example of domain-specific NER, namely a system to extract canonical references.
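For readers who would like a concrete picture of the two parts before the session, here is a minimal Python sketch. It is not the system presented in the seminar: the regular expression, the function names `extract_references` and `evaluate`, and the sample sentence are all assumptions made for illustration, and a hand-written pattern stands in for the trained models that NER normally relies on. It shows how references such as "Hom. Il. 1.477-479" might be captured as spans and then scored against a hand-annotated gold standard using exact-match precision, recall and F1.

```python
import re

# Hypothetical toy pattern (an assumption, not the seminar's method):
# abbreviated author, abbreviated work, then a dotted passage locator
# with an optional range, e.g. "Hom. Il. 1.477-479".
CANONICAL_REF = re.compile(
    r"\b[A-Z][a-z]{1,5}\.\s[A-Z][a-z]{1,5}\.\s\d+(?:\.\d+)*(?:-\d+(?:\.\d+)*)?"
)


def extract_references(text):
    """Return the set of (start, end, surface form) spans matched in text."""
    return {(m.start(), m.end(), m.group()) for m in CANONICAL_REF.finditer(text)}


def evaluate(predicted, gold):
    """Exact-match precision, recall and F1 between two sets of spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Invented sample sentence; the gold spans are annotated by hand.
    sample = "See Hom. Il. 1.477-479 and Verg. Aen. 6.851; cf. Thuc. 2.35."
    gold = set()
    for ref in ("Hom. Il. 1.477-479", "Verg. Aen. 6.851", "Thuc. 2.35"):
        start = sample.index(ref)
        gold.add((start, start + len(ref), ref))
    print(evaluate(extract_references(sample), gold))
```

In this sample the toy pattern requires both an author and a work abbreviation, so it misses "Thuc. 2.35", where the work title is omitted. That kind of gap lowers recall while leaving precision untouched, which is the sort of trade-off the evaluation measures are meant to expose.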
Recommended readings:
Wikipedia, s.v. Named-entity recognition
S. Bird, E. Klein and E. Loper, Natural Language Processing with Python, O’Reilly, 2009. Available at: http://www.nltk.org/book/ (only ch. 7, sections 1 and 5)
Romanello, Matteo (2016). 'Extracting Citation Networks from Publications in Classics.' (pre-print), http://dx.doi.org/10.5281/zenodo.46328 (only pp. 1-7)
Livecast on the Digital Classicist London YouTube channel.
NB: this session will be followed by a launch event for the collected volume Digital Classics Outside the Echo-Chamber at 18:00 in the second floor foyer.
ALL WELCOME