Exploring the productivity of Homeric formulae through Distributional Semantics

Martina Astrid Rodda and Barbara McGillivray (Alan Turing Institute)

Digital Classicist London seminar 2019

Friday June 14th at 16:30, in room G34, Senate House, Malet Street, London WC1E 7HU

Livecast on the Digital Classicist London YouTube channel.

The language of archaic Greek epic is overwhelmingly composed of formulae, linguistic structures which occur repeatedly with a high degree of fixedness. Examples include complex phrases combining a verb with open slots that can accommodate additional information (e.g. ekhon en khersi, ‘holding in one’s hands,’ which takes a direct object).

These formulae behave similarly to multi-word expressions in natural languages (Kiparsky 1976), in that they allow for limited syntactic variation. Describing the exact scope and mechanisms of formulaic variation is notably difficult; computational methods applied to large-scale digital collections can shed new light on this complex phenomenon.

Recent studies (Bozzone 2014; Antović – Cánovas 2016) have emphasised the parallels between formulae and linguistic constructions, i.e. form-meaning pairs. Research on constructional productivity aims to identify the factors that cause a construction to be more productive than others (Barðdal 2008). Similarly, a suitable model of formulaic behaviour should allow us to evaluate the driving factors in the productivity of a formula.

While Bozzone’s work has drawn attention to the role of syntactic flexibility in driving formulaic change, the contribution of semantics remains entirely unexplored. Perek (2016) proposed a diachronic corpus-based model of productivity in English as driven by semantics: constructions that are more semantically flexible, while still maintaining a certain level of semantic coherence, are more productive.

In our paper, we present the first computational model which uses Distributional Semantics to assess the role of semantic coverage in driving formulaic productivity in ancient Greek epic. We use the Diorisis Ancient Greek corpus (Vatri – McGillivray 2018) to build a vector space model where every word is represented by a vector which encodes information about its linguistic contexts. By comparing the semantic spaces of archaic Greek epic and later poetry, we can detect meaningful trends of diachronic development in formulaic usage and investigate their causes.
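As a rough illustration of what a vector space model of this kind involves, the Python sketch below builds simple co-occurrence count vectors and compares them with cosine similarity. It is a minimal sketch under stated assumptions, not the model presented in the paper: the function names, the window size and the toy lemma sequences are invented for illustration and are not drawn from the Diorisis corpus or from any actual Homeric line.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: invented sequences of transliterated lemmas,
# used only to make the example runnable. Not real corpus text.
toy_corpus = [
    ["ekho", "en", "kheir", "doru"],
    ["ekho", "en", "kheir", "skeptron"],
    ["ballo", "doru", "makros"],
    ["phero", "skeptron", "khruseos"],
]

vectors = cooccurrence_vectors(toy_corpus, window=2)

# Words that fill the same slot share contexts, so their vectors are closer.
sim_slot_fillers = cosine(vectors["doru"], vectors["skeptron"])
sim_unrelated = cosine(vectors["doru"], vectors["khruseos"])
print(sim_slot_fillers, sim_unrelated)
```

In this toy setting, the two words that occur in the open slot of the same phrase come out as more similar to each other than to a word with which they share no contexts. A realistic model would typically work on lemmatised corpus text and apply association weighting and dimensionality reduction, which this sketch leaves out.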


