Mine the GAP: Finding ancient places in the Google Books corpus

Elton Barker (Open University) & Leif Isaksen (Southampton)

Institute of Classical Studies Digital Seminar 2011

Friday July 15th at 16:30, in the Court Room, Senate House, Malet Street, London WC1E 7HU

Google has so far digitized over 12 million books in over 300 languages, most of which were previously available only in major libraries. The sheer volume of data now available is both exciting and bewildering.

Google Ancient Places (GAP), a Google Digital Humanities Award recipient, is mining the Google Books corpus for classical material that has a geographic and historical basis. Access to much antiquarian literature has traditionally been limited to scholars at prestigious institutions: facilitating access to large text repositories like Google Books will help open up disciplines such as History, Classics and Archaeology to anybody with an interest in the subject. Furthermore, references to ancient literature are often brief or fragmentary; aggregating short extracts can be of great value, and information on locating the full text is helpful for more traditional scholars.

Current services are extremely powerful in their extent but have a high rate of false-positive and false-negative matches due to the problems of toponymic homonyms and synonyms (different places that share names, and single places with multiple names). We believe that leveraging services such as GeoNames and Pleiades, along with metadata such as the location of other places in the text, should reduce such inaccuracy. By identifying spatial clustering at chapter, text and corpus scales we will be able to significantly reduce misidentifications in a fully automated process.
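
As an illustration only, the idea of using the other places in a text to resolve an ambiguous name can be sketched in a few lines of Python. This is not GAP's actual method: the function names, the data layout and the approximate coordinates are all assumptions made for the example, and no GeoNames or Pleiades service is called. Names with a single gazetteer candidate act as anchors, and each ambiguous name is resolved to the candidate with the smallest mean distance to those anchors.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def disambiguate(candidates):
    """Pick one candidate per place name.

    `candidates` maps a name to a non-empty list of (label, (lat, lon))
    tuples. This layout is hypothetical, not a gazetteer's own format.
    """
    # Names with exactly one candidate are treated as reliable anchors.
    anchors = [opts[0][1] for opts in candidates.values() if len(opts) == 1]
    resolved = {}
    for name, opts in candidates.items():
        if len(opts) == 1 or not anchors:
            # Nothing to choose between, or no context to choose with.
            resolved[name] = opts[0][0]
            continue
        # Choose the candidate closest, on average, to the anchors.
        best = min(
            opts,
            key=lambda o: sum(haversine_km(o[1], a) for a in anchors) / len(anchors),
        )
        resolved[name] = best[0]
    return resolved


# Illustrative data with approximate coordinates (assumed for the example).
sample = {
    "Athens": [("Athens (Attica)", (37.98, 23.73))],
    "Delphi": [("Delphi (Phocis)", (38.48, 22.50))],
    "Thebes": [
        ("Thebes (Egypt)", (25.72, 32.61)),
        ("Thebes (Boeotia)", (38.32, 23.32)),
    ],
}

result = disambiguate(sample)
print(result["Thebes"])
```

In a chapter that also mentions Athens and Delphi, the Boeotian Thebes is chosen over the Egyptian one. The same comparison could in principle be repeated at text and corpus scales by widening the set of anchors.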

While the proposed web service will support many applications, this paper will explore the use of GAP in two specific research domains: archaeology (Open Context) and Classics (HESTIA). It will show how GAP can be a utility both for the scholar whose research has a historical or geographical basis and for the tourist who, for instance, wants to download information on an ancient location to their smartphone: a case of literally putting knowledge into people’s hands.

ALL WELCOME

The seminar will be followed by wine and refreshments.

Audio recording of seminar (MP3)

Presentation (PDF)