Loos, Lukas, University of Heidelberg, Institute of Geography, GIScience, Germany, L.Loos@stud.uni-heidelberg.de
Zipf, Alexander, University of Heidelberg, Institute of Geography, GIScience, Germany, alexander.zipf@geog.uni-heidelberg.de

Introduction

Transcultural (historical) research requires integrating the macro and micro levels of research and analysis, and it emphasizes the dynamic and interactive character of its research objects in their historical and geographical dimensions (‘How Histories make Geographies’ and vice versa) (Appadurai 2011; Döring and Thielmann 2009; White 2010; Knowles et al. 2008; Owens 2007).

The interdisciplinary, interconnected HGIS projects located in Heidelberg share a common vision. They facilitate research in the humanities through, for example, (semi-)automatic, IT-supported processing of information, spatial search in heterogeneous data, visualization of patterns in spatio-temporal data, and mash-ups of data of different kinds from different sources. In this way they encourage researchers to formulate new questions and theses, and eventually to find some answers, by experimenting in a sort of ‘humanities lab’. Only close collaboration between geographers, historians and computer scientists can ensure that new digital tools and methodologies do not remain outside everyday scholarly work in the humanities.

In the RIgeo.net project, historians and geographers work together to exploit the full analytical potential of (historical) geographic information systems ((H)GIS) for historical research. The aim of the project is to develop an infrastructure, ultimately globally connected, for the spatio-temporal analysis of historical sources. Combining historical and geographical data in a ‘GIS toolbox’ provides a kind of ‘laboratory’ in which to analyze, recombine and disaggregate (mapped) information encoded in historical evidence. Here that evidence is the abstracts of the Regesta Imperii (hereinafter: RI; see note 1), combined with a spatial perspective and, for example, rearranged with findings uploaded by the RI researchers.

Data basis

As a starting point for the project we use the RI database, which is available online and highly relevant in the European context, and we cooperate with the RI research team, namely Prof. Dr. Paul-Joachim Heinig. The RI are an inventory of 125,000 abstracts, mostly in German, of documents of all ‘German’ emperors from Charlemagne to Maximilian I. The comprehensive RI data allows continuous evaluation at both the technical and the content level. With a view to sustainability and maximum utility, the implemented procedures are designed to be applicable to other, similar data sets.

Project Objective

The objective is to develop spatio-temporal thesauri of places and to geocode both the places of issue of the documents and the places mentioned in them. One aim here is to assist research on itinerant kingship.

A central question is how to visualize (un-)certainty. One should bear in mind the universal warning that ‘all visualizations of information are abstractions, which provide useful approximations of the real world. […] Visualizations reduces the cognitive weight on the analyst or learner when the quantity of information, both quantitative and qualitative, is great, a problem is complex, and alternative solutions are numerous and surpass the capabilities of human reason’ (Owens 2007).

Depending on the source material, different challenges must be met so that the user can judge the degree of confidence in the visualized data (and so that the visualization does not suggest a higher precision than the historical source justifies).

The main difficulties are:

  • Accuracy and precision of the spatial information: the location is often not one clearly defined place but, for example, a diffuse area or a vague offset from a named place (‘near Heidelberg’), and it may be uncertain because of the credibility of the source itself (e.g. a forged medieval document) (Hill 2006).
  • Accuracy and precision of the temporal ranges (the ‘temporal footprint’): beginning and end dates are always fuzzy to some degree. The resolution of dates in historical sources varies widely; for example, the RI information on when Friedrich II stayed in a place ranges from an exact day (16.3.1217) to a bare year (1217) without any further information (Hill 2006).
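
One way to keep the varying temporal resolution visible in the data is to store every date as an interval together with its resolution. The following is a minimal sketch, not the project's implementation: the class `TemporalFootprint` and the function `parse_footprint` are hypothetical names, only the date formats `D.M.YYYY`, `M.YYYY` and `YYYY` are assumed, and dates are treated as proleptic Gregorian although medieval sources use the Julian calendar.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemporalFootprint:
    """Closed date interval that makes the resolution of a source date explicit."""
    earliest: date
    latest: date
    resolution: str  # "day", "month" or "year"

    @property
    def uncertainty_days(self) -> int:
        """Width of the interval in days (0 for an exact day)."""
        return (self.latest - self.earliest).days


def parse_footprint(text: str) -> TemporalFootprint:
    """Parse 'D.M.YYYY', 'M.YYYY' or 'YYYY' into an interval.

    Assumption: proleptic Gregorian dates, as used by Python's datetime.
    """
    parts = [int(p) for p in text.strip().split(".")]
    if len(parts) == 3:
        day, month, year = parts
        exact = date(year, month, day)
        return TemporalFootprint(exact, exact, "day")
    if len(parts) == 2:
        month, year = parts
        last_day = calendar.monthrange(year, month)[1]
        return TemporalFootprint(
            date(year, month, 1), date(year, month, last_day), "month"
        )
    if len(parts) == 1:
        (year,) = parts
        return TemporalFootprint(date(year, 1, 1), date(year, 12, 31), "year")
    raise ValueError(f"unsupported date format: {text!r}")
```

With this representation, `parse_footprint("16.3.1217")` yields a one-day interval, while `parse_footprint("1217")` spans the whole year, so a visualization can scale its rendering of certainty by `uncertainty_days`.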

Implementation

As a first approach we implemented a system consisting of an extract, transform and load (ETL) pipeline and a PostgreSQL/PostGIS database with a simple star schema (see Figure 1). The dimension tables store place names, person names, dates and geographic coordinates. Places and dates of issue are extracted from the XML documents of the RI through XPath/XQuery queries and stored in the database. In the next step we use the GeoTWAIN (note 2) and Nominatim (note 3) web services to geocode the place names. From the geocoded places we calculate an average traveling distance, which is used to derive the probable whereabouts of unknown places or places that no longer exist. This narrows down the area for a manual search and also makes it possible to disambiguate and geocode cases where the web services returned more than one result.
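
The extraction and loading steps can be illustrated with a small sketch. It rests on several assumptions: the XML layout (`regest`, `issued`, and the `date` and `place` attributes) is invented for illustration and is not the actual RI schema, the table names are hypothetical, and an in-memory SQLite database stands in for PostgreSQL/PostGIS so that the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real RI documents may be structured differently.
SAMPLE_XML = """
<regesta>
  <regest id="ri-0001"><issued date="1217-03-16" place="Heidelberg"/></regest>
  <regest id="ri-0002"><issued date="1217" place="Speyer"/></regest>
  <regest id="ri-0003"><issued date="1217-04-02" place="Heidelberg"/></regest>
</regesta>
"""

# Minimal star schema: two dimension tables and one fact table.
SCHEMA = """
CREATE TABLE dim_place (place_id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                        lat REAL, lon REAL);
CREATE TABLE dim_date  (date_id INTEGER PRIMARY KEY, value TEXT UNIQUE);
CREATE TABLE fact_issue (
    regest_id TEXT PRIMARY KEY,
    place_id  INTEGER REFERENCES dim_place(place_id),
    date_id   INTEGER REFERENCES dim_date(date_id)
);
"""


def extract(xml_text):
    """Yield (regest id, place of issue, date of issue) using XPath-style paths."""
    root = ET.fromstring(xml_text)
    for regest in root.findall("./regest"):
        issued = regest.find("issued")
        yield regest.get("id"), issued.get("place"), issued.get("date")


def dimension_key(conn, table, column, value):
    """Insert value into a dimension table if it is new and return its key."""
    # table and column are fixed identifiers from this module, never user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT rowid FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    return row[0]


def load(conn, records):
    """Write one fact row per regest, linked to its place and date dimensions."""
    for regest_id, place, issued in records:
        place_id = dimension_key(conn, "dim_place", "name", place)
        date_id = dimension_key(conn, "dim_date", "value", issued)
        conn.execute(
            "INSERT INTO fact_issue VALUES (?, ?, ?)", (regest_id, place_id, date_id)
        )
    conn.commit()


def build_database(xml_text):
    """Run the extract and load steps against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load(conn, extract(xml_text))
    return conn
```

The coordinate columns in `dim_place` stay empty at this stage; in the workflow described above they would be filled by the geocoding step.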

Figure 1: Workflow and system implementation (Source: compiled by the author)
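
The distance-based narrowing described under Implementation might be sketched as follows. This is an illustration under stated assumptions rather than the project's method: distances are great-circle distances, stays are indexed by a simple day number, the `slack` factor is a hypothetical parameter, and the coordinates are approximate values chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def average_daily_distance(stays):
    """Mean km per day over consecutive geocoded stays.

    stays: list of (day_number, (lat, lon)), sorted by day_number.
    """
    total_km, total_days = 0.0, 0
    for (day0, pos0), (day1, pos1) in zip(stays, stays[1:]):
        if day1 > day0:
            total_km += haversine_km(pos0, pos1)
            total_days += day1 - day0
    return total_km / total_days if total_days else 0.0


def plausible_candidates(candidates, previous_stay, day, km_per_day, slack=1.5):
    """Keep geocoder candidates reachable from the previous known stay.

    candidates: list of (label, (lat, lon)). slack widens the search radius
    because the average is only a rough guide (hypothetical parameter).
    """
    prev_day, prev_pos = previous_stay
    radius_km = km_per_day * (day - prev_day) * slack
    return [(label, pos) for label, pos in candidates
            if haversine_km(prev_pos, pos) <= radius_km]


# Approximate coordinates, for illustration only.
SPEYER = (49.32, 8.43)
FRANKFURT_CANDIDATES = [
    ("Frankfurt am Main", (50.11, 8.68)),
    ("Frankfurt (Oder)", (52.34, 14.55)),
]
```

For a document issued in ‘Frankfurt’ three days after a stay in Speyer, calling `plausible_candidates(FRANKFURT_CANDIDATES, (1, SPEYER), 4, km_per_day=30.0)` keeps only the candidate within the reachable radius. The same radius can delimit the area for a manual search when the geocoder returns no result at all.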

Future Work

One focus of future work lies in text mining and Geographic Information Retrieval (Feldman 2007; Leidner 2007). It can be assumed that the language used in the documents correlates strongly with their age. By connecting the different RI datasets, the system makes it possible to annotate the place names and person names mentioned in the documents automatically. The annotated documents can serve as training and test data for a supervised machine learning approach that finds regularities in the documents. Because of the sequential nature of the data, probabilistic sequence models such as Hidden Markov Models (Baum and Petrie 1966) or Conditional Random Fields (Lafferty et al. 2001) will be considered for learning.
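
To make the intended use of the annotated documents concrete, the following sketch trains a first-order Hidden Markov Model from tagged token sequences by counting and decodes new text with the Viterbi algorithm. Everything specific in it is an assumption: the tag set (`PER`, `LOC`, `O`), the toy training sentences, the add-one smoothing and the function names are invented for illustration and do not come from the RI data.

```python
import math
from collections import Counter, defaultdict

# Invented toy annotations; real training data would come from the RI abstracts.
TRAIN = [
    [("Herzog", "O"), ("Friedrich", "PER"), ("urkundet", "O"), ("in", "O"), ("Speyer", "LOC")],
    [("Kaiser", "O"), ("Otto", "PER"), ("urkundet", "O"), ("in", "O"), ("Worms", "LOC")],
    [("Herzog", "O"), ("Otto", "PER"), ("weilt", "O"), ("in", "O"), ("Speyer", "LOC")],
]


def train_hmm(tagged_sentences, smoothing=1.0):
    """Estimate smoothed log-probabilities from sentences of (token, tag) pairs."""
    tags = sorted({tag for sent in tagged_sentences for _, tag in sent})
    vocab = {tok for sent in tagged_sentences for tok, _ in sent} | {"<unk>"}
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sentences:
        if not sent:
            continue
        start[sent[0][1]] += 1
        for tok, tag in sent:
            emit[tag][tok] += 1
        for (_, prev), (_, cur) in zip(sent, sent[1:]):
            trans[prev][cur] += 1

    def log_dist(counts, support):
        total = sum(counts.values()) + smoothing * len(support)
        return {x: math.log((counts[x] + smoothing) / total) for x in support}

    return {
        "tags": tags,
        "vocab": vocab,
        "start": log_dist(start, tags),
        "trans": {t: log_dist(trans[t], tags) for t in tags},
        "emit": {t: log_dist(emit[t], vocab) for t in tags},
    }


def viterbi(model, tokens):
    """Return the most probable tag sequence for tokens under the model."""
    if not tokens:
        return []
    tags = model["tags"]
    obs = [t if t in model["vocab"] else "<unk>" for t in tokens]
    score = {t: model["start"][t] + model["emit"][t][obs[0]] for t in tags}
    back = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for cur in tags:
            best_prev = max(tags, key=lambda p: score[p] + model["trans"][p][cur])
            new_score[cur] = (score[best_prev] + model["trans"][best_prev][cur]
                              + model["emit"][cur][o])
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    path = [max(tags, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

A call such as `viterbi(train_hmm(TRAIN), ["Kaiser", "Friedrich", "urkundet", "in", "Worms"])` tags a sentence that does not occur in the training data. A Conditional Random Field would replace the counted probabilities with feature weights learned discriminatively, which is not shown here.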

References

Appadurai, A. (2011). How Histories make Geographies. Transcultural Studies 1. http://archiv.ub.uni-heidelberg.de/ojs/index.php/transcultural/article/view/6129 (accessed 28.10.2011).

Baum, L. E., and T. Petrie (1966). Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics 37(6): 1554-1563.

Döring, J., and T. Thielmann, eds. (2009). Spatial Turn: Das Raumparadigma in den Kultur- und Sozialwissenschaften. 2nd ed. Bielefeld: Transcript.

Feldman, R. (2007). The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge: Cambridge UP.

Hill, L. L. (2006). Georeferencing: The Geographic Associations of Information. Cambridge, MA: MIT Press, pp. 85-88.

Knowles, A. K., A. Hillier, and R. Balstad (2008). Conclusion: An Agenda for Historical GIS. In: A. K. Knowles and A. Hillier (eds.), Placing History. How Maps, Spatial Data, and GIS are Changing Historical Scholarship. Redlands, CA.: ESRI Press, pp. 267-274.

Lafferty, J., A. McCallum, and F. Pereira (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML, Proceedings of International Conference on Machine Learning, pp. 282-289.

Leidner, J. (2007). Toponym Resolution in Text: Annotation, Evaluation and Applications of Spatial Grounding of Place Names. Ph.D thesis, School of Informatics, University of Edinburgh, Scotland.

Owens, J. B. (2007). Toward a Geographically-Integrated, Connected World History: Employing Geographic Information Systems (GIS). History Compass 5(6): 2014-2040.

White, R. (2010). What is Spatial History? Spatial History Lab: Working paper. http://www.stanford.edu/group/spatialhistory/cgi-bin/site/pub.php?id=29 (accessed 29.12.2011).

Notes

1. http://www.regesta-imperii.de (accessed 20.01.2012).

2. http://geotwain.uni-hd.de (accessed 01.11.2011).

3. http://wiki.openstreetmap.org/wiki/Nominatim (accessed 01.11.2011).