Detecting and consistently marking the participants that play a role in a folktale can considerably help in automatically assigning to characters the typical functions described, for example, by Propp (1968). Likewise, we expect that Proppian narrative functions can be detected and marked up in text more reliably if the main participants in the story have been accurately recognized beforehand. Lendvai et al. (2010) address the issue of semi-automatically assigning Proppian character and action types to text segments, mainly on the basis of linguistic analysis.
In this poster/demo article, we describe a complementary approach. It relies first on a knowledge base, in the form of an ontology formalizing family relationships, which is populated by iteratively applying the ontology components to a linguistically annotated tale. Different natural language expressions referring to a single character are marked in the iteratively updated knowledge base using the OWL
The current focus on the family ontology is motivated by the fact that family relationships play a central role in many tales. Modeling other participants in tales is much more difficult, since their behavior very often does not correspond to that of ‘normal’ entities (for example, a speaking ‘river of milk’ that acts as a ‘helper’ in the tale). Nevertheless, our approach also allows such entities to be detected as characters.
A problematic issue in processing folktales is the detection and corresponding annotation of co-referring expressions. Folktales are special in this respect, since people (or characters) are relatively rarely mentioned by name; they are mostly introduced by their function (‘the King’), their family status (‘the father’) or their mere existence (‘there lived a woman’). This phenomenon, together with the very vague spatio-temporal context given in the text, makes the recognition of co-referring expressions on the basis of linguistic features alone quite difficult. This is one reason why we developed the family ontology: to support knowledge-based reference resolution of entities detected in text. On the basis of this semantic resource, each entity of a tale that has been associated with a particular biological or family status can be stored as a specific individual in the knowledge base.
For the purpose of knowledge-based reference resolution, we equipped the class hierarchy with a set of inference rules, which act in a complementary fashion to the Pellet reasoner built into Protégé. Instances of Man and Woman are encoded as instances of the class Parents, and are therefore identified with instances of the classes Father and Mother if the text provides enough evidence about their marital or biological status.
Similarly, distinct instances of the class Children are also classified as instances of the class Siblings, using heuristics similar to those for the class Parents. In this way, the inference rules can complete the family relationships extracted from the text and make them available for the incremental analysis of the text.
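The Parents and Siblings heuristics can be illustrated with a small forward-chaining sketch. It is not the authors' implementation and does not use Pellet or Protégé; the `Individual` class, the `has_family_evidence` flag and the function name `apply_family_rules` are hypothetical, and the "enough evidence" given by the text is reduced to a single boolean.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Individual:
    """A candidate character together with the ontology classes it belongs to."""
    ident: str
    classes: set[str] = field(default_factory=set)
    # Hypothetical flag: True if the text gives evidence of marital/biological status.
    has_family_evidence: bool = False


def apply_family_rules(individuals: list[Individual]) -> dict[str, set[str]]:
    """Apply the Parents/Siblings heuristics; return sibling links per individual."""
    for ind in individuals:
        if {"Man", "Woman"} & ind.classes:
            ind.classes.add("Parents")
            # Only with textual evidence is a parent refined to Father or Mother.
            if ind.has_family_evidence:
                ind.classes.add("Father" if "Man" in ind.classes else "Mother")
    children = [ind for ind in individuals if "Children" in ind.classes]
    siblings: dict[str, set[str]] = {}
    if len(children) > 1:
        for child in children:
            child.classes.add("Siblings")
            siblings[child.ident] = {c.ident for c in children if c is not child}
    return siblings
```

Given a man with textual evidence, a woman without it, a daughter and a son, this sketch classifies both adults as Parents, only the man as Father, and links the two children as siblings.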
Every class and relation encoded in the ontology is associated with natural language labels in four languages: English, German, Russian and Bulgarian.
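A possible in-memory representation of a fragment of such a class hierarchy with multilingual labels is sketched below. The class `FamilyMember`, the exact subclass links and the helper names are assumptions made for illustration; the actual ontology is encoded in OWL and may be organized differently.

```python
# Hypothetical fragment of a family ontology: class -> direct superclass.
SUBCLASS_OF = {
    "Father": "Parents",
    "Mother": "Parents",
    "Daughter": "Children",
    "Son": "Children",
    "Parents": "FamilyMember",
    "Children": "FamilyMember",
}

# Natural-language labels per class, keyed by ISO 639-1 language code.
LABELS = {
    "Father": {"en": "father", "de": "Vater", "ru": "отец", "bg": "баща"},
    "Mother": {"en": "mother", "de": "Mutter", "ru": "мать", "bg": "майка"},
    "Daughter": {"en": "daughter", "de": "Tochter", "ru": "дочь", "bg": "дъщеря"},
    "Son": {"en": "son", "de": "Sohn", "ru": "сын", "bg": "син"},
}


def superclasses(cls: str) -> list:
    """Return the chain of superclasses of `cls`, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def classes_for_label(word: str, lang: str) -> list:
    """Return the classes whose label in `lang` matches `word` (case-insensitive)."""
    word = word.casefold()
    return [c for c, labels in LABELS.items()
            if labels.get(lang, "").casefold() == word]
```

Keeping the labels separate from the hierarchy means the same class can be matched from text in any of the four languages, while inference works on language-independent class names.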
We process the whole tale with NooJ and mark in particular all nominal phrases (NPs), whether simple (‘The mother’), coordinated (‘an old man and an old woman’) or recursive (‘a river of milk flowing in banks of pudding’).
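NooJ performs this analysis with its own grammars. As a rough, hypothetical stand-in, the sketch below distinguishes the three NP types using two surface cues: a coordinating "and" marks a coordinated NP, and a preposition after the first token marks a recursive NP. The preposition list is an assumption and covers only English.

```python
# Hypothetical, English-only preposition list used as a recursion cue.
PREPOSITIONS = {"of", "in", "on", "with", "from", "at"}


def np_structure(np: str) -> str:
    """Classify an NP as 'coordinated', 'recursive' or 'simple' (naive heuristic)."""
    tokens = np.lower().split()
    if "and" in tokens:
        return "coordinated"
    # A preposition after the first token suggests an embedded NP (e.g. 'of milk').
    if any(tok in PREPOSITIONS for tok in tokens[1:]):
        return "recursive"
    return "simple"
```

Such surface cues break down quickly (any NP containing "and" is treated as coordinated), which is why a grammar-based tool is used for the real annotation.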
Our textual analysis further specifies whether an NP is indefinite or definite on the basis of its determiner (‘a woman’ is indefinite, ‘the mother’ is definite), at least for languages that use such determiners, like English, German, etc. 2000) gives a good overview of the past discussions.
For the special case of tales, Herman (2000) provides examples supporting this view of indefinite nominal phrases, relating them to the introduction of characters. Our current work with NooJ implements some of the views described by Herman. The first step of our iterative approach to text analysis thus results in the linguistic annotation of the folktale in terms of indefinite and definite NPs.
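The definiteness step can be approximated by inspecting the first token of each NP. This is a minimal sketch for English and German articles only; the determiner lists and the function names `definiteness` and `annotate` are assumptions. NPs in languages without articles, such as Russian, or NPs introduced by other determiners are left undecided (`None`).

```python
from typing import Optional

# Hypothetical determiner lists (English and German articles only).
INDEFINITE = {
    "en": {"a", "an"},
    "de": {"ein", "eine", "einen", "einem", "einer", "eines"},
}
DEFINITE = {
    "en": {"the"},
    "de": {"der", "die", "das", "den", "dem", "des"},
}


def definiteness(np: str, lang: str = "en") -> Optional[str]:
    """Return 'indefinite', 'definite' or None, based on the NP's first token."""
    tokens = np.split()
    if not tokens:
        return None
    det = tokens[0].lower()
    if det in INDEFINITE.get(lang, set()):
        return "indefinite"
    if det in DEFINITE.get(lang, set()):
        return "definite"
    return None  # no article or another determiner: left undecided


def annotate(nps: list, lang: str = "en") -> list:
    """Pair each NP with its definiteness label."""
    return [(np, definiteness(np, lang)) for np in nps]
```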
The next iteration then applies the knowledge base to the indefinite NPs in the text. The main elements of the indefinite NPs, the nouns, are extracted and compared with the labels of the classes in the ontology. For example, the noun ‘daughter’ within an indefinite NP in the tale matches the label of the class Daughter in the family ontology. As a consequence, this noun is stored in the knowledge base as a potential character of the tale and receives the ID ‘ch3’ (since the program has already identified ‘man’ and ‘wife’ as the first potential characters occurring in the text), which marks it as an individual of the class Daughter.
This procedure is applied to all indefinite NPs occurring in the tale.
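The matching and ID assignment can be sketched as follows, assuming a hypothetical English label table, the last token of the NP as a naive head noun, and the NP without its determiner as its core element. As in the example above, the third NP receives the ID ‘ch3’. We assume here that nouns without a matching class label are still stored, as classless candidates, so that the later filtering step can decide about them.

```python
from typing import Dict, List, Optional

# Hypothetical English class labels: label -> ontology class.
CLASS_LABELS = {
    "man": "Man",
    "woman": "Woman",
    "wife": "Wife",
    "daughter": "Daughter",
    "son": "Son",
}


def populate(indefinite_nps: List[str]) -> Dict[str, Dict[str, Optional[str]]]:
    """Store each indefinite NP as a candidate character with a 'chN' ID."""
    kb: Dict[str, Dict[str, Optional[str]]] = {}
    for np in indefinite_nps:
        tokens = np.lower().split()
        if len(tokens) < 2:  # expect determiner + noun; skip anything shorter
            continue
        core = " ".join(tokens[1:])          # core element: NP without determiner
        cls = CLASS_LABELS.get(tokens[-1])   # naive head noun: last token
        kb[f"ch{len(kb) + 1}"] = {"core": core, "class": cls}  # cls may be None
    return kb
```

For the input `["an old man", "a wife", "a daughter", "a handkerchief"]`, ‘daughter’ becomes ‘ch3’ of class Daughter, while ‘handkerchief’ is stored as ‘ch4’ without a class.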
We then apply the inference rules described above in Section 2 to the candidate characters stored in the knowledge base. To give a simple example: ‘ch3’ (‘daughter’) is automatically encoded in the ontology as an instance of the classes Girl and Sister, and the relationship to the brother is also automatically inferred. These inferences can be drawn partly because, after the first iteration, it turned out that the tale mentions only one young female person and only one young male person. This iteration thus also offers a kind of consolidation of the results of the preceding ontology population procedure.
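The consolidation can be sketched as a rule that fires only when the knowledge base contains exactly one young female and exactly one young male candidate. Treating Daughter/Girl as "young female" and Son/Boy as "young male", as well as the property names `hasBrother` and `hasSister`, are assumptions for illustration; the actual rules operate on the OWL ontology.

```python
from typing import Dict, List, Set, Tuple

# Assumed grouping of classes into "young female" / "young male" persons.
YOUNG_FEMALE = {"Daughter", "Girl"}
YOUNG_MALE = {"Son", "Boy"}


def consolidate(kb: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """If exactly one young female and one young male exist, infer sister/brother."""
    females = [i for i, classes in kb.items() if classes & YOUNG_FEMALE]
    males = [i for i, classes in kb.items() if classes & YOUNG_MALE]
    if len(females) != 1 or len(males) != 1:
        return []  # uniqueness condition not met: infer nothing
    sister, brother = females[0], males[0]
    kb[sister] |= {"Girl", "Sister"}
    kb[brother].add("Brother")
    return [(sister, "hasBrother", brother), (brother, "hasSister", sister)]
```

The uniqueness check is what makes the inference safe in a small, closed story world: with two daughters, the rule would not know which one is the sister of the boy, so it adds nothing.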
Figure 2 in the Appendix shows that our approach manages to map ‘ch3’ (resulting from the indefinite NP ‘their daughter’) to occurrences of the strings ‘girl’ and ‘sister’ in definite NPs elsewhere in the text. This step certainly benefits from the results of applying the inference rules described in Section 4. We further apply a filtering procedure: candidate characters that are mentioned only once in the text (for example, because they are not matched to the content of any definite NP, or are not involved as agent in an event) are deleted from the knowledge base. On this basis we can eliminate the string ‘a handkerchief’ (an indefinite NP) from the list of potential characters, but keep the string ‘an apple tree’ and consolidate its core element ‘apple tree’ as a character of the tale, since it also occurs in a definite NP and is involved in an agentive action (speaking).
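The mapping and filtering steps might look roughly like the sketch below. It assumes that lower-cased class names can stand in for English class labels, that a candidate matches a definite NP when the NP's core equals the candidate's core element or one of its class labels, and that "mentioned only once" is operationalized as "no matching definite NP and no agentive role". The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def core_of(np: str) -> str:
    """Core element of an NP: everything after the determiner (naive)."""
    return " ".join(np.lower().split()[1:])


def map_definite(kb: Dict[str, dict], definite_nps: List[str]) -> Dict[str, dict]:
    """Attach to each candidate the definite NPs whose core matches it."""
    for entry in kb.values():
        # Lower-cased class names stand in for English class labels (assumption).
        names = {entry["core"]} | {c.lower() for c in entry["classes"]}
        entry["mentions"] = [np for np in definite_nps if core_of(np) in names]
    return kb


def filter_candidates(kb: Dict[str, dict]) -> Dict[str, dict]:
    """Keep candidates matched to a definite NP or acting as agent in an event."""
    return {i: e for i, e in kb.items() if e["mentions"] or e["agent"]}
```

With ‘ch3’ (classes Daughter, Girl and Sister), a handkerchief and an agentive apple tree as candidates, and the definite NPs ‘the girl’, ‘the sister’ and ‘the apple tree’, only the handkerchief is filtered out. Note that matching via inferred classes such as Girl and Sister only works after the inference rules have run.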
We demonstrate the potential benefits of combining an ontology, inference rules and textual analysis for identifying characters in the relatively small (and closed) world of a folktale. While the first results of our ongoing work are promising, we still need to apply the approach to more tales and to other languages, and to evaluate it. For this purpose, we plan to use the UMIREC Corpus.
Acknowledgements
The work reported in this paper has been partly supported by the R&D project ‘Monnet’, which is co-funded by the European Union under Grant No. 248458.
Appendix
References
Geist, L. (2008). Specificity as referential anchoring: Evidence from Russian. Proceedings of SuB12, Oslo: ILOS, 151-164.
Herman, D. (2000). Pragmatic constraints on narrative processing: Actants and anaphora resolution in a corpus of North Carolina ghost stories. Journal of Pragmatics 32(7): 959-1001.
Lendvai, P., T. Váradi, S. Darányi, and T. Declerck (2010). Assignment of Character and Action Types in Folk Tales. Proceedings of the NooJ 2010 Conference.
McCrae, J., L. Aguado-de-Cea, P. Buitelaar, P. Cimiano, T. Declerck, A. Gomez-Perez, J. Gracia, L. Hollink, E. Montiel-Ponsoda, and D. Spohr (2012). Interchanging lexical resources on the Semantic Web. Journal on Language Resources and Evaluation (in press).
Propp, V. J. (1968). Morphology of the Folktale. Austin: University of Texas Press.
Silberztein, M. (2003). NooJ Manual. http://www.nooj4nlp.net.
Thompson, S. (1955). Motif-index of folk-literature: A classification of narrative elements in folktales, ballads, myths, fables, medieval romances, exempla, fabliaux, jest-books, and local legends. Revised and enlarged edition. Bloomington: Indiana University Press, 1955-58.
Tuffield, M. M., D. E. Millard, and N. R. Shadbolt (2006). Ontological Approaches to Modelling Narrative. 2nd AKT DTA Symposium, January 2006, AKT, Aberdeen University.
Uther, H.-J. (2004). The Types of International Folktales: A Classification and Bibliography. Based on the system of Antti Aarne and Stith Thompson. FF Communications no. 284-286. Helsinki: Suomalainen Tiedeakatemia.
Zöllner-Weber, A. (2008). Noctua literaria: A computer-aided approach for the formal description of literary characters using an ontology. Ph.D. Thesis, Bielefeld University.