HdtDep: a treebank and search engine for Greek word order study

Alessandro Vatri (Oxford University)

Institute of Classical Studies Digital Seminar 2011

Friday June 24th at 16:30, in Court Room, Senate House, Malet Street, London WC1E 7HU

Corpora of syntactically annotated texts (treebanks) are increasingly being developed and used for the linguistic study of classical languages. Projects such as Perseus’ Ancient Greek and Latin Dependency Treebanks and PROIEL already offer a large number of tagged Ancient Greek and Latin texts, which can be queried through third-party software or built-in search tools. Both projects apply dependency grammar as a theoretical framework for syntactic annotation, in the style of the Prague Dependency Treebank. Dependency theory is a flexible and powerful framework, which allows data to be encoded and retrieved independently of specific syntactic approaches. Moreover, it provides an intuitive way of describing linguistic structures for an audience of classicists lacking training in general linguistics.

In particular, searchable treebanks are valuable tools for the study of word order. As far as Ancient Greek is concerned, classicists have traditionally used Herodotus’ Histories as a corpus for this purpose, on the assumption that this sizeable prose work is the closest representative of ‘natural language’ we can get (K. J. Dover, Greek Word Order, Cambridge 1960: 11; H. Dik, Word Order in Ancient Greek: A Pragmatic Account of Word Order Variation in Herodotus, Amsterdam 1995: 4–5; S. J. Bakker, The Noun Phrase in Ancient Greek. A Functional Analysis of the Order and Articulation of NP Constituents in Herodotus, Leiden – Boston 2009: 2–3). Dependency annotation of Herodotus is being carried out by the PROIEL project (whereas Perseus’ annotated corpus consists only of poetry), but it is far from complete. Moreover, the PROIEL website does not allow searching for specific dependency patterns (only a full-sentence view is available).

In the present paper, I would like to describe a small project I have started with the aim of filling this gap and providing general classicists with an intuitive study tool for word order in Ancient Greek. The HdtDep project consists of a corpus containing the first book of Herodotus, which has been annotated following an adapted and simplified version of Igor Mel'čuk's dependency theory (I. Mel'čuk, Dependency Syntax: Theory and Practice, Albany 1988; I. Mel'čuk, ‘Dependency in Natural Language’, in A. Polguère & I. Mel'čuk, Dependency in Linguistic Description, Amsterdam – Philadelphia 2009: 1–110; A. Vatri 2011 [in preparation]). Compared to the Perseus treebanks, HdtDep encodes only the information that seemed strictly necessary for word order studies (neither morphological data nor syntactic relationship types are encoded), but provides a powerful and user-friendly search engine, which allows searching for precise dependency patterns involving specific grammatical categories or lexemes in exact sequences through a graphical interface. A demo version of this tool, limited to the first chapter of the first book, is available at http://www.crs.rm.it/hdtdep/demo.asp.
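
To give a rough idea of what a search for a dependency pattern in a given linear order involves, here is a minimal Python sketch. It is not HdtDep's actual data format or query engine: the `Token` class, the `find_pairs` function, the category labels and the three-word annotation are all hypothetical and hand-made for illustration. In line with the description above, each word carries only a lexeme, a grammatical category and a pointer to its head, with no morphology and no relation labels.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    """One word of a sentence (hypothetical model, not HdtDep's own)."""
    form: str             # word as it appears in the text
    lemma: str            # lexeme
    category: str         # grammatical category, e.g. "noun"
    head: Optional[int]   # index of the governing word; None for the root


def find_pairs(sentence: List[Token], head_category: str,
               dep_category: str, dep_precedes_head: bool) -> List[Tuple[int, int]]:
    """Return (dependent, head) index pairs that match both categories
    and the requested linear order."""
    matches = []
    for i, tok in enumerate(sentence):
        if tok.head is None or tok.category != dep_category:
            continue
        if sentence[tok.head].category != head_category:
            continue
        # Word order test: does the dependent come before its head?
        if (i < tok.head) == dep_precedes_head:
            matches.append((i, tok.head))
    return matches


# Illustrative annotation of a three-word phrase (assumed, not taken from the corpus).
sentence = [
    Token("ἱστορίης", "ἱστορίη", "noun", 1),
    Token("ἀπόδεξις", "ἀπόδεξις", "noun", None),
    Token("ἥδε", "ὅδε", "pronoun", 1),
]

# Nouns depending on a noun and placed before it.
preposed = find_pairs(sentence, "noun", "noun", dep_precedes_head=True)
# Pronouns depending on a noun and placed after it.
postposed = find_pairs(sentence, "noun", "pronoun", dep_precedes_head=False)
```

A query restricted to a specific lexeme would add a comparison on `lemma` in the same loop; a graphical interface such as the one described above would presumably build queries of this kind from the user's selections.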


Audio recording of seminar (MP3)

Presentation (PDF)