The publication received an honorable mention.

Humanities researchers often need to study heterogeneous digitized archives from different sources. But how can they deal with this heterogeneity, both in terms of structure and semantics? What digital tools can they use in order to integrate resources and study them as a whole? And what if they are unfamiliar with the methods and tools available? To address such needs, the DARIAH-EU and CLARIN research infrastructures already support researchers in exploiting digital tools. Specific use-case research scenarios have also been developed, with the PARTHENOS SSK being a successful example.

In this paper we describe our related (ongoing) experience from the development of the Greek research infrastructure APOLLONIS, where, among other activities, we have focused on identifying and supporting the workflows that researchers need to follow to perform specific research studies while jointly accessing disparate archives. Using the decade of the 1940s as a use case, a turbulent period in Greek history due to its significant events (WWII, Occupation, Resistance, Liberation, Civil War), we have assembled digitized historical archives coming from different providers and shedding light on different historical aspects of these events. From the acquisition of the resources to the desired final outcome, we record the workflows of the whole research study, including the initial curation of the digitized archives, their ingestion, the joint indexing of the data, the generation of semantic graph representations and, finally, their publication and searching.

After acquiring the heterogeneous source materials, we investigate their structure and contents in detail, in order to map the different archive metadata onto a common metadata schema, thus enabling joint indexing and establishing semantic relations among the contents of the archives. The next step is data cleaning, where messy records are cleaned and normalized. Natural Language Processing methods are then exploited to extract additional information contained in the archival records or in free-text metadata fields, such as persons, places, armed units, dates, and topics, which enriches the initial datasets.

The outcome is encoded in XML using the common schema and ingested into a common repository through an aggregator implemented using the MoRE system. A joint index based on a set of basic criteria is generated and maintained, thus ensuring joint access to all archival records regardless of their source. In addition, an RDF representation is generated from the encoded archival data, enabling their publication in the form of a semantic graph and supporting interesting complex queries. This is based on a specifically designed extension of CIDOC CRM and a compiled list of research queries of varying complexity, encoded in SPARQL.

Preliminary tests of the entire workflow and the tools used in all steps yielded very encouraging results. Our immediate plans include full-scale ingestion and indexing of the material from a number of archives, producing the corresponding semantic graph, and streamlining the incorporation of new archives. Simplified sketches illustrating the main processing steps of the workflow are given below.
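The metadata-mapping step can be pictured as a set of per-provider field renamings onto a shared schema. The sketch below is a minimal illustration in Python; the provider names, source fields, and target schema fields are all hypothetical and stand in for the richer APOLLONIS schema.

```python
# A minimal sketch of mapping provider-specific metadata onto a common
# schema. All provider and field names here are illustrative, not the
# actual APOLLONIS schema.

# Per-provider field mappings: source field -> common schema field.
FIELD_MAPS = {
    "provider_a": {"titlos": "title", "imerominia": "date", "topos": "place"},
    "provider_b": {"Title": "title", "Date": "date", "Location": "place"},
}

def to_common_schema(record: dict, provider: str) -> dict:
    """Rename a raw record's fields according to its provider's map."""
    mapping = FIELD_MAPS[provider]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

raw = {"Title": "Report on the liberation of Athens", "Date": "12/10/1944"}
print(to_common_schema(raw, "provider_b"))
# {'title': 'Report on the liberation of Athens', 'date': '12/10/1944'}
```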
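The data-cleaning step can likewise be illustrated by a small normalizer for messy date fields. The accepted input patterns and the ISO 8601 target form are assumptions made for the sketch, based on formats commonly found in digitized archives.

```python
import re
from datetime import date

# Hypothetical normalizer for messy date fields; the recognized input
# patterns are assumptions, not the project's actual cleaning rules.
PATTERNS = [
    # day/month/year, e.g. "12/10/1944"
    (re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$"),
     lambda m: date(int(m[3]), int(m[2]), int(m[1]))),
    # bare year, e.g. "1944" -> first day of the year
    (re.compile(r"^(\d{4})$"),
     lambda m: date(int(m[1]), 1, 1)),
]

def normalize_date(raw: str) -> str | None:
    """Map a messy date string to ISO 8601, or None if unrecognized."""
    raw = raw.strip()
    for pattern, build in PATTERNS:
        m = pattern.match(raw)
        if m:
            return build(m).isoformat()
    return None

print(normalize_date("12/10/1944"))  # 1944-10-12
print(normalize_date(" 1944 "))      # 1944-01-01
```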
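For the NLP extraction step, a minimal named-entity recognition sketch using spaCy's off-the-shelf Greek pipeline is shown next. The choice of the el_core_news_sm model is an assumption; domain-specific entities such as armed units would in practice require custom models or gazetteers beyond what a stock pipeline detects.

```python
# Entity extraction from free-text metadata fields with spaCy's Greek
# pipeline. Requires: pip install spacy
#                     python -m spacy download el_core_news_sm
import spacy

nlp = spacy.load("el_core_news_sm")  # model choice is an assumption

def extract_entities(text: str) -> list[tuple[str, str]]:
    """Return (surface form, entity label) pairs found in the text."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]

# Hypothetical free-text metadata field from an archival record.
print(extract_entities("Ο ΕΛΑΣ εισήλθε στην Αθήνα τον Οκτώβριο του 1944."))
```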
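The joint index can be thought of as an inverted index keyed by the basic criteria, pointing to records from all providers. The toy sketch below uses place and date as stand-ins for the project's actual criteria.

```python
from collections import defaultdict

# A toy joint index over common-schema records from several providers;
# the indexed criteria (place, date) stand in for the actual set of
# basic criteria used in the project.
records = [
    {"id": "a-17", "source": "provider_a", "place": "Athens", "date": "1944-10-12"},
    {"id": "b-03", "source": "provider_b", "place": "Athens", "date": "1944-12-03"},
]

index: dict[tuple[str, str], list[str]] = defaultdict(list)
for rec in records:
    for field in ("place", "date"):
        index[(field, rec[field])].append(rec["id"])

# Joint access: all records mentioning Athens, regardless of source.
print(index[("place", "Athens")])  # ['a-17', 'b-03']
```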
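Generating the RDF representation amounts to emitting triples that type the extracted entities with CIDOC CRM classes and link them with CRM properties. The following rdflib sketch is illustrative only: the base URI, the chosen classes and properties, and the modelling do not reflect the project's actual CRM extension.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# Illustrative CIDOC CRM-style triples; URIs and modelling choices are
# assumptions, not the project's designed CRM extension.
CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
EX = Namespace("http://example.org/apollonis/")  # hypothetical base URI

g = Graph()
g.bind("crm", CRM)
g.bind("ex", EX)

event = EX["event/liberation-of-athens-1944"]
place = EX["place/athens"]

g.add((event, RDF.type, CRM.E5_Event))
g.add((place, RDF.type, CRM.E53_Place))
g.add((event, CRM.P7_took_place_at, place))
g.add((place, RDFS.label, Literal("Athens", lang="en")))

print(g.serialize(format="turtle"))
```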
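Finally, here is an example of the kind of research query the compiled SPARQL list might contain ("which events took place at which places?"), run with rdflib over a tiny graph in the illustrative modelling above; the actual research queries operate over the project's CRM extension.

```python
from rdflib import Graph

# Self-contained example: load a tiny illustrative graph, then query it.
g = Graph()
g.parse(data="""
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/apollonis/> .

ex:event1 a crm:E5_Event ; crm:P7_took_place_at ex:athens .
ex:athens a crm:E53_Place ; rdfs:label "Athens"@en .
""", format="turtle")

QUERY = """
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?event ?placeLabel WHERE {
    ?event a crm:E5_Event ;
           crm:P7_took_place_at ?place .
    ?place rdfs:label ?placeLabel .
}
"""
for row in g.query(QUERY):
    print(row.event, row.placeLabel)
```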