Fragmentary Texts and Digital Collections of Fragmentary Authors

Monica Berti (Roma Tor Vergata) and Marco Büchler (Leipzig)

Digital Classicist and Institute of Classical Studies Seminar 2010

Friday July 30th at 16:30, in room STB9, Senate House, Malet Street, London WC1E 7HU

The term fragment is applicable to a wide range of ancient evidence, which includes archaeological ruins, epigraphical and papyrological documents, and many other pieces of the material record. By “fragmentary texts” we mean not only material remains of ancient writings, but also quotations of lost texts preserved through other texts. A huge number of quotations of lost texts has been gathered in print collections, enabling scholars to reconstruct lost works and depict the personality of fragmentary authors.

Information technologies and hypertextual models permit the expression of every element of print conventions, and so provide a cyberinfrastructure for new digital collections of ancient sources. Representing textual fragments first involves focusing on the complex relation between the fragment and its source of transmission, given that a quotation is only a shadow of the original text. Consequently, encoding fragments is ultimately the result of interpreting them: it requires a language for representing every one of their textual features, and it creates meta-information through accurate and elaborate semantic markup. Editing fragments therefore means producing meta-editions that differ from printed ones, because they consist not only of isolated quotations but also of pointers to the original contexts from which the fragments have been extracted.
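
The idea of a fragment as a pointer into its transmitting text, rather than an isolated quotation, can be illustrated with a small sketch. This is only an assumption about how such a record might look, not the encoding used by the speakers: the class `Fragment`, its field names, and the helper functions are hypothetical, and the sample text is a placeholder rather than a real source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record: a fragment stored as a span of its witness text."""
    lost_author: str            # author of the lost work
    lost_work: Optional[str]    # title of the lost work, if it is named
    witness_ref: str            # citation of the transmitting passage
    start: int                  # character offset where the quotation begins
    end: int                    # character offset just past the quotation
    kind: str                   # editorial judgement, e.g. "verbatim" or "paraphrase"


def quoted_text(witness_text: str, frag: Fragment) -> str:
    """Return only the words attributed to the lost author."""
    return witness_text[frag.start:frag.end]


def with_context(witness_text: str, frag: Fragment, window: int = 20) -> str:
    """Return the quotation together with some surrounding witness text."""
    lo = max(0, frag.start - window)
    hi = min(len(witness_text), frag.end + window)
    return witness_text[lo:hi]


# Placeholder witness passage (not a real source).
WITNESS = "As X says in his first book, QUOTED WORDS, and so the matter stands."
_start = WITNESS.index("QUOTED WORDS")
EXAMPLE = Fragment(
    lost_author="X",
    lost_work=None,
    witness_ref="Witness 1.1 (placeholder)",
    start=_start,
    end=_start + len("QUOTED WORDS"),
    kind="verbatim",
)
```

Because the record stores offsets instead of a copy of the words, the quotation can always be read back inside the passage that preserves it, which is the distinction the paragraph above draws between a meta-edition and a printed collection.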

Moreover, the automatic and unsupervised detection of fragmentary authors is one of the most challenging tasks in the field of Natural Language Processing. Even if computational models built on the knowledge and skills of classicists, and based on observations in texts, can be trained faster, their overall quality will not match that of classicists' work in the next few years. For this reason we divide the task of collecting fragmentary authors into four working areas that support the work of classicists:

  • Associations between author and work names: This kind of association graph supports tasks such as finding all authors who have written works with the same or similar names.
  • Extraction of fragments of an author: Based on different patterns, text fragments are aligned to a fragmentary author whenever this author or his work is mentioned in the text.
  • Finding new quotations and parallel texts: Given such extracted fragments, additional quotations and parallel texts are determined.
  • Expansion of the fragment set: Using all the extracted fragments, their quotations and their parallel texts allows us to determine the semantic space or spaces of an author, in order to find new fragment candidates within the same space.
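
The first working area can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the system presented at the seminar: the function names are hypothetical, the author and title pairs are a small hand-picked sample, and string similarity from Python's standard `difflib` module stands in for whatever similarity measure is actually used.

```python
import difflib
from collections import defaultdict

# Illustrative (author, work title) pairs; a real graph would be harvested
# from the source texts rather than typed in by hand.
PAIRS = [
    ("Hellanicus", "Atthis"),
    ("Androtion", "Atthis"),
    ("Philochorus", "Atthis"),
    ("Ctesias", "Persica"),
    ("Dinon", "Persica"),
    ("Ctesias", "Indica"),
]


def build_graph(pairs):
    """Map each case-folded work title to the set of authors linked to it."""
    graph = defaultdict(set)
    for author, title in pairs:
        graph[title.casefold()].add(author)
    return graph


def authors_sharing_titles(graph):
    """Return titles attested for more than one author, with sorted authors."""
    return {
        title: sorted(authors)
        for title, authors in graph.items()
        if len(authors) > 1
    }


def similar_titles(graph, title, cutoff=0.8):
    """Return other titles in the graph whose spelling is close to `title`.

    Assumption: plain character similarity is a stand-in for a proper
    treatment of variant titles, transliterations, and inflected forms.
    """
    key = title.casefold()
    candidates = [t for t in graph if t != key]
    return difflib.get_close_matches(key, candidates, n=5, cutoff=cutoff)
```

Calling `authors_sharing_titles(build_graph(PAIRS))` groups the three authors of an Atthis and the two authors of a Persica, while `similar_titles` covers the "similar names" case for titles that differ only slightly in spelling.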

During the Digital Classicist seminar, two of these four working areas (whichever have made the best progress by the time of the presentation) will be explained in detail. More generally, it will be shown how the objective and quantitative methods of computer scientists can be combined with the qualitative, in-depth working methods of classicists in this entirely unfunded collaboration, to the benefit of both communities.


The seminar will be followed by wine and refreshments.

Audio recording of seminar (MP3). Please note that, due to a software error, approximately two minutes of this seminar were not recorded. This gap, about 50 minutes in, is represented by a three-second silence in the recording.

Presentation (PDF)