De-engineering the Semantic Web: Linking Archaeological Data
Leif Isaksen (Southampton)
Digital Classicist/ICS Work in Progress Seminar, Summer 2009
Friday 24th July at 16:30, in room STB3/6, Senate House, Malet Street, London WC1E 7HU
The idea of the Semantic Web has held great promise but seen little uptake by archaeologists due to its perceived complexity. Fortunately, the concept of Linked Data (information structured using a variety of public schemas and data sources) is starting to change this perception. However, successful integration of legacy datasets requires the separation of the instances, terminologies and (frequently implicit) ontologies that constitute them so that each can be dealt with appropriately. Furthermore, it is imperative that the mappings between local and canonical terms be undertaken by those who best understand the data, i.e. the curators themselves. This paper will discuss recent doctoral research seeking to provide practical solutions to this process and give some early examples of its potential benefit to archaeology.
‘Roman Ports in the Western Mediterranean’, a project undertaken by the University of Southampton and the British School at Rome, has brought together a multitude of partners in order to share data about amphora and marble distribution along the Mediterranean littoral. With standardised Semantic Web technologies such as RDF/S and OWL now well established, the principal technical challenge lies in making it a simple and intuitive process for non-technicians to map their own datasets to globally unique canonical Uniform Resource Identifiers (URIs). Initial research has created a guided process by which relational data columns can be mapped to concepts within an ontology, with Natural Language Processing used to facilitate the mapping of local terms to concepts within different concept schemes. Toponyms are also extracted and integrated using the GeoNames web service. Temporal information is to be introduced in the next phase of development. The production of an XML configuration file means that data can be regularly exported from the dataset without further manual intervention.
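As a rough illustration of the mapping step described above, the following Python sketch turns one relational row into N-Triples statements. It is not the project's own tool: every URI, column name and term is a hypothetical placeholder, and a simple normalised dictionary lookup stands in for the Natural Language Processing step.

```python
from urllib.parse import quote

# Hypothetical mapping: local column name -> property URI (placeholder URIs).
COLUMN_TO_PROPERTY = {
    "form": "http://example.org/ontology/hasForm",
    "findspot": "http://example.org/ontology/foundAt",
}

# Hypothetical mapping: normalised local term -> canonical concept URI.
TERM_TO_CONCEPT = {
    "dr. 20": "http://example.org/amphora/Dressel20",
    "dressel 20": "http://example.org/amphora/Dressel20",
}

# Hypothetical base URI under which each record is given an identifier.
BASE = "http://example.org/record/"


def normalise(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def row_to_triples(row, id_column="id"):
    """Return N-Triples lines for one row, using the two mappings above."""
    subject = BASE + quote(str(row[id_column]))
    triples = []
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = row.get(column)
        if value is None or value == "":
            continue  # nothing recorded in this column
        concept = TERM_TO_CONCEPT.get(normalise(value))
        if concept:
            obj = f"<{concept}>"  # local term resolved to a canonical URI
        else:
            # Unmapped terms are kept as plain string literals.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        triples.append(f"<{subject}> <{prop}> {obj} .")
    return triples


example_row = {"id": 7, "form": "Dr. 20", "findspot": "Ostia"}
example_triples = row_to_triples(example_row)
```

Because the two dictionaries are plain data, they could equally be loaded from a saved configuration file, which is what would allow repeated exports without the curator redoing the mapping.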
The final part of the presentation will look at the alignment of parallel ontologies and terminologies (taxonomies, thesauri etc.). The etic nature of archaeological classification dictates that comparisons between entities described using different typologies must be both transparent and easily repeatable using different typological alignments. The Linked Data approach creates both flexibility and dangers in this regard when contrasted with approaches that utilise a single ontology. Some of these will be outlined, and the use of the SKOS vocabulary as a means of dealing with multiple concept schemes will be given specific attention.
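To make the idea of repeatable comparison under different alignments more concrete, here is a minimal Python sketch. It assumes an alignment is simply a list of (concept, relation, concept) statements and treats only skos:exactMatch links as establishing equivalence; the typology URIs are hypothetical placeholders, and this is one possible reading rather than the method presented in the seminar.

```python
from collections import defaultdict, deque

# Property URIs from the SKOS core vocabulary.
SKOS_EXACT = "http://www.w3.org/2004/02/skos/core#exactMatch"
SKOS_CLOSE = "http://www.w3.org/2004/02/skos/core#closeMatch"


def build_index(alignment):
    """Index exactMatch links in both directions (the relation is symmetric)."""
    neighbours = defaultdict(set)
    for a, relation, b in alignment:
        if relation == SKOS_EXACT:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def equivalent(a, b, alignment):
    """True if a and b are joined by a chain of exactMatch links."""
    if a == b:
        return True
    neighbours = build_index(alignment)
    seen = {a}
    queue = deque([a])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical concepts from two different typologies.
TYPE_A1 = "http://example.org/typologyA/Type1"
FORM_B7 = "http://example.org/typologyB/Form7"

# Two alternative alignments of the same pair of concepts.
strict_alignment = [(TYPE_A1, SKOS_EXACT, FORM_B7)]
cautious_alignment = [(TYPE_A1, SKOS_CLOSE, FORM_B7)]

same_under_strict = equivalent(TYPE_A1, FORM_B7, strict_alignment)
same_under_cautious = equivalent(TYPE_A1, FORM_B7, cautious_alignment)
```

Keeping the alignment as a separate argument means the same comparison can be rerun under a different set of mappings, and the mappings themselves remain open to inspection.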
The seminar will be followed by wine and refreshments.