
Varfolomeyev, Aleksey, Petrozavodsk State University, Russian Federation, avarf@psu.karelia.ru
Ivanovs, Aleksandrs, Daugavpils University, Latvia, aleksandrs.ivanovs@du.lv

The term ‘semantic publication’ denotes an electronic text publication provided with additional information layers that represent knowledge about the text in a formalized way suitable for automatic processing. In modern Web environments, semantic publications have become a topical issue, especially in digital libraries and electronic journals (Baruzzo et al. 2009; Shotton et al. 2009; de Waard 2010). Their advantages are twofold. Firstly, semantic publications provide better facilities for searching for information, since they bring together the search strategies used by humans and by computers. For instance, if an ontology states that a certain term has a synonym, a search for the term can cover its synonym at the same time, and the synonym need not be mentioned in the query. Secondly, since formalized knowledge can generate new knowledge, semantic publications can serve as a knowledge base from which hypotheses for further research are derived by means of automatic inference.
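The synonym-aware search described above can be sketched as follows. This is a minimal illustration only: the synonym table stands in for an ontology, and the table entries, the sample documents, and the function names `expand_query` and `search` are all assumptions introduced for the example.

```python
# Hypothetical synonym table standing in for an ontology (illustrative entries only).
SYNONYMS = {
    "charter": {"missive"},
}

# Hypothetical mini-collection of document descriptions.
DOCUMENTS = {
    "doc1": "A missive of the Archbishop of Riga",
    "doc2": "A charter concerning Smolensk",
    "doc3": "An inventory of the archive",
}

def expand_query(term, synonyms):
    """Return the term together with every synonym recorded for it."""
    expanded = {term}
    for key, values in synonyms.items():
        # Treat the synonym relation as symmetric.
        if term == key or term in values:
            expanded.add(key)
            expanded |= values
    return expanded

def search(term, documents, synonyms):
    """Return ids of documents that contain the term or any of its synonyms."""
    terms = expand_query(term.lower(), synonyms)
    return [doc_id for doc_id, text in documents.items()
            if any(t in text.lower() for t in terms)]

print(search("charter", DOCUMENTS, SYNONYMS))
```

The query names only one term, yet documents using the synonym are found as well, because the expansion is driven by the formalized knowledge rather than by the user.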

Semantic publications of historical records appear to have a further advantage beyond these general ones: the semantic layers in such publications reflect researchers’ interpretations, which can be verified by means of formalized, computer-based procedures.

In order to reveal the advantages of semantic editions in research on medieval documentary records, this paper presents a prototype of the semantic publication of the 13th century Old Russian charter corpus, which forms a constituent part of the vast collection of medieval and early modern records ‘Moscowitica–Ruthenica’ kept in the Latvian State Historical Archives, a structural unit of the Latvian National Archives (Riga, Latvia). This collection of documents provides historians with firsthand information about the relations of Old Russian and Byelorussian lands and towns (Smolensk, Novgorod, Pskov, Polotsk, etc.), as well as Lithuania (later Poland-Lithuania), with Riga, Livonia, the Hanseatic League and some German towns from the late 12th to the early 17th century. ‘Moscowitica–Ruthenica’, the historical name of this document collection, was first mentioned in archival inventories dating back to the 1630s. Although ‘Moscowitica–Ruthenica’ no longer exists as a department of the Latvian State Historical Archives, its documents constitute a natural complex of historical records, which should be studied as a whole (Ivanovs & Varfolomeyev 2005).

The prototype of the semantic publication uses five interconnected charters (Charters nos. 1, 3a, 4, 5, and 6; see Ivanov & Kuznetsov 2009). In fact, twelve charters reveal the course of relations between Riga and Smolensk in the 13th – first half of the 14th century; however, since the presentation has its limits, the basis of the prototype has been reduced. At the centre of the semantic network represented in the prototype is the Missive of Archbishop of Riga Johann II to Fedor Rostislavich, Prince of Smolensk, blaming the inhabitants of Vitebsk for an unjustified complaint against Rigans (Charter no. 6, 1285–1287).

In order to provide the texts with additional descriptive metadata, information about the persons, sites, documents, etc. mentioned in the charters is identified and linked with corresponding data extracted from specialized ontologies. In recent years, a great number of ontologies have been created, including those intended for historical and source studies, e.g. the CIDOC CRM ontology. This ontology contains classes and relations that can be used to describe historical persons, sites, and events related to museum objects (Doerr 2003). On its basis, a number of specialized ontologies for describing particular historical aspects have been elaborated, for example those created within the Pearl Harbor Project in the USA (Ide & Woolner 2007) and the CultureSampo Project in Finland (Ahonen & Hyvönen 2009). Unfortunately, such ontologies cannot meet all the requirements of a semantic publication of charter corpora, since they do not reflect the specific character of written historical records. Therefore, the authors of the paper propose a document-oriented approach to the creation of ontologies (in contrast with the event-oriented approaches adopted in the above-mentioned ontologies).

The semantic publication of the charters comprises two kinds of semantic links. First, there are links between historically and thematically interconnected charters (these interconnections emerged when the charters were drawn up, in the course of documenting relations between Smolensk and Riga in the 13th century). Thus, diverse links between the information reflected in the charters within this complex of historical records can be revealed. Second, there are links with other historical records that do not belong to the complex ‘Moscowitica–Ruthenica’. In this case, information extracted from the complex of charters ‘Moscowitica–Ruthenica’ (it can be called ‘internal information’) is linked with ‘external’ information, provided either directly (by other historical records) or indirectly (by research papers, specialized ontologies, etc.).

Within the semantic publication, relations between its objects are described using triplets: ‘a charter is written by a person’, ‘a charter is sent to a person’, ‘a charter mentions a person’. As is common in ontologies, inverse relations can also be introduced, e.g. ‘a person is mentioned in a charter’. It should be noted that this publication is partly based on hypothetical data; the hypothetical nature of some relations may be expressed through specific wording (‘probably refers to’ instead of ‘refers to’).
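The triplet model with inverse and hypothetical relations can be sketched as below. This is only one possible representation, not the one used in the prototype: the `Statement` class, the `hypothetical` flag, and the inverse name ‘is referred to by’ are assumptions; only ‘mentions’ / ‘is mentioned in’ and the wording ‘probably refers to’ come from the text.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    """A subject-predicate-object triplet with a flag for hypothetical relations."""
    subject: str
    predicate: str
    obj: str
    hypothetical: bool = False

    def label(self):
        # Hypothetical relations are worded 'probably ...', as in the text.
        hedge = "probably " if self.hypothetical else ""
        return f"{self.subject} {hedge}{self.predicate} {self.obj}"

# Inverse predicate names; 'is referred to by' is an assumed name.
INVERSES = {
    "mentions": "is mentioned in",
    "refers to": "is referred to by",
}

def inverse(statement):
    """Swap subject and object and use the inverse predicate; the flag is kept."""
    return Statement(statement.obj, INVERSES[statement.predicate],
                     statement.subject, statement.hypothetical)

mention = Statement("Charter 6", "mentions", "Helmich")
reference = Statement("Charter 6", "refers to", "Charter 4", hypothetical=True)

print(mention.label())
print(inverse(mention).label())
print(reference.label())
```

Keeping the hypothetical status as a separate flag, rather than as part of the predicate name, means that the inverse of a hypothetical relation stays hypothetical without any string manipulation.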

However, producing semantic publications on the basis of ontologies recorded with Semantic Web technologies (RDF or OWL) is time-consuming. The tools provided by semantic Wiki-systems may facilitate this process. For instance, Semantic MediaWiki (Krötzsch et al. 2007) offers specialized, rather simple markup tools that can be used to mark different objects (place-names, persons’ names, etc.) in the texts of the charters and to supply the texts with meta-information. The principal feature of this system is the use of typed hyperlinks between pages: the pages constitute the objects of the semantic network, while the hyperlinks denote relations between the objects.

The text of Charter no. 6 is presented on the site http://histdocs.referata.com. The text has been translated from Old Russian into English and published using Semantic MediaWiki. In the published text, hyperlinks to other pages of this semantic network have been marked. These pages contain texts of the charters of the complex ‘Moscowitica–Ruthenica’, as well as data related to historical persons, places, etc. Below the text, the facts related to the given charter are listed (e.g. ‘Charter 6 mentions Helmich’, ‘Charter 6 probably refers to Charter 4’, etc.). These facts are linked with the text, and this linkage is based on the researcher’s interpretation of the document.

It should be noted that the facts about the objects of the semantic network, which are recorded by means of Semantic MediaWiki tools, can be automatically transformed into RDF triplets. Therefore, Wiki-systems can be used for the production of semantic publications of charters and other written documents.
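To illustrate how typed links can yield triplets, the sketch below extracts Semantic MediaWiki annotations of the form `[[property::value]]` from wikitext and serializes them in an N-Triples-like form. It is not Semantic MediaWiki's own RDF export: the sample wikitext, the `http://example.org/wiki/` namespace, and the function names are assumptions made for the example, and properties are simply mapped into the same namespace as pages.

```python
import re
from urllib.parse import quote

# Matches [[Property::Value]] and [[Property::Value|display text]].
ANNOTATION = re.compile(r"\[\[([^:\[\]|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

BASE = "http://example.org/wiki/"  # placeholder namespace, not the real site

def extract_triples(page_title, wikitext):
    """Turn each typed link on a page into a (page, property, value) triplet."""
    return [(page_title, prop.strip(), value.strip())
            for prop, value in ANNOTATION.findall(wikitext)]

def to_ntriples(triples, base=BASE):
    """Serialize triplets as N-Triples lines, treating every name as a resource."""
    def uri(name):
        return "<" + base + quote(name.replace(" ", "_")) + ">"
    return "\n".join(f"{uri(s)} {uri(p)} {uri(o)} ." for s, p, o in triples)

# Illustrative wikitext only; it does not reproduce the published page.
SAMPLE = ("The complaint concerns [[mentions::Helmich]]; "
          "see also [[probably refers to::Charter 4|the earlier charter]].")

TRIPLES = extract_triples("Charter 6", SAMPLE)
print(to_ntriples(TRIPLES))
```

The editor annotates the text once in wiki markup, and the machine-readable statements follow from the same annotations, which is what makes the Wiki-based workflow less laborious than writing RDF by hand.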

However, Wiki-systems also have some shortcomings. For example, non-standard fonts cannot be used in the transcription of the texts; the texts of the charters cannot be linked with raster images of the documents; and the texts cannot be marked up on the basis of XML markup standards (e.g. in accordance with the TEI or CEI markup schemes). Therefore, a specialized Wiki-system for editing charters should be developed. In the presentation of the paper, some possible solutions to the problems mentioned above are examined.

References

Ahonen, E., and E. Hyvönen (2009). Publishing Historical Texts on the Semantic Web – A Case Study. Proceedings of the Third IEEE International Conference on Semantic Computing (ICSC2009). Berkeley, pp. 167-73.

Baruzzo, A., et al. (2009). Toward Semantic Digital Libraries: Exploiting Web2.0 and Semantic Services in Cultural Heritage. Journal of Digital Information 10(6). http://journals.tdl.org/jodi/article/viewArticle/688/576 (accessed 14 March 2012).

de Waard, A. (2010). From Proteins to Fairytales: Directions in Semantic Publishing. IEEE Intelligent Systems 25(2): 83-88.

Doerr, M. (2003). The CIDOC Conceptual Reference Module: An Ontological Approach to Semantic Interoperability of Metadata. AI Magazine 24(3): 75-92.

Ide, N., and D. Woolner (2007). Historical Ontologies. In K. Ahmad, C. Brewster and M. Stevenson (eds.), Words and Intelligence II: Essays in Honor of Yorick Wilks. Dordrecht: Springer, pp. 137-52.

Ivanov, A., and A. Kuznetsov (2009). Smolensko-rizhskie akty, XIIIv. – pervaia polovina XIVv.: Dokumenty kompleksa Moscowitica–Ruthenica ob otnosheniiakh Smolenska i Rigi [Treaties between Smolensk and Riga: 13th – First Half of the 14th Century: Documents of the Complex Moscowitica–Ruthenica about Relations between Smolensk and Riga]. Riga.

Ivanovs, A., and A. Varfolomeyev (2005). Editing and Exploratory Analysis of Medieval Documents by Means of XML Technologies. In Humanities, Computers and Cultural Heritage. Amsterdam: KNAW, pp. 155-60.

Krötzsch, M., et al. (2007). Semantic Wikipedia. Journal of Web Semantics 5(4): 251-61.

Shotton, D., et al. (2009). Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Computational Biology 5(4): e1000361. http://dx.doi.org/10.1371/journal.pcbi.1000361 (accessed 14 March 2012).