
Shaw, Ryan Benjamin, University of North Carolina at Chapel Hill, USA

Historical insight is only achieved when the contours of our view of the past are as clear as possible.

(Frank Ankersmit)

Each history provides a unique view of the past by picking out a path through the field of possible events (Veyne 1984: 36). And because histories respond to earlier histories, their paths intersect. At these intersections lie events which become taken for granted as ‘sites to be visited’ on ‘a pre-arranged itinerary marking out the recommended scenic route (and the beaten track) from one major point of interest to the next’ (Rigney 1990: 37). Some of these itineraries are then reified as periods.

If we accept these itineraries too readily, we risk missing opportunities for new insights. Historical insight results not just from the accumulation of new facts about the past, but from the development of new views upon the facts we already know. The more stories we have about some piece of the past, ‘the deeper our insight into it will be […] because only the presence of other stories enables us to draw the contours and to recognize the specificity of the view of the past presented in each one’ (Ankersmit 1983: 219).

In 1983, Frank Ankersmit theorized a procedure for drawing these contours by clustering stories that contain overlapping sets of propositions. By taking a set of stories about a person, place, or period, transforming each story into a list of propositions, and aligning them in such a way that the lists could be compared, Ankersmit posited that ‘certain classificatory patterns [would] automatically appear’ (Ankersmit 1983: 145).

The Contours of the Past project is testing Ankersmit’s theory by building tools for comparing and contrasting histories of the civil rights movement. The historiography of the civil rights movement exemplifies how periods coalesce around dominant stories. The first generation of civil rights scholars ‘conceived of the civil rights struggle as primarily a political movement that secured legislative and judicial triumphs’ (Lawson 1991). The chronology of this movement began with Brown v. Board of Education of Topeka in 1954 and ended with the Voting Rights Act of 1965. Due in part to intense media coverage, by the end of the twentieth century this chronology had become a well-beaten path not only among scholars but also in popular understanding.

In the past decade, a new generation of scholars has sought to broaden this itinerary, telling ‘the story of a “long civil rights movement” that took root in the liberal and radical milieu of the late 1930s, was intimately tied to the “rise and fall of the New Deal Order,” accelerated during World War II, [and] stretched far beyond the South’ (Hall 2005). The concept of the Long Civil Rights Movement (LCRM) is about more than simply replacing one story with a new one. By widening the scope of the civil rights movement, the LCRM opens space for a greater diversity of stories, enabling greater insight but making it more difficult to grasp the movement as a whole. We aim to show how computational analysis might complement the work of scholars evolving new periodizations and perspectives, by providing tools for comprehending narrative patterns in a more complex whole.

We are applying two text analysis techniques, event parsing and narrative clustering, to two corpora: eighty-seven books made available by the UNC Press through their Publishing the LCRM project, and transcripts of approximately 350 interviews conducted by the Southern Oral History Program as part of their LCRM initiative.1

Event parsing involves identifying sentences that communicate some event, for example a strike, a protest, a bombing, or a legislative act. Specifically, it involves identifying frames, conceptual structures that describe particular events along with their participants and settings.2 Typically event parsing has been used for detection and tracking of topics in news media, automated question answering, text summarization, and the production of structured data from unstructured text.
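As a rough illustration of the first step, identifying which sentences evoke an event frame, the sketch below tags sentences using a small hand-made trigger lexicon. The lexicon, the frame names, and the function `parse_events` are hypothetical and chosen only for illustration; a real frame-semantic parser would also disambiguate word senses and assign participants to roles, neither of which is attempted here.

```python
import re

# Hypothetical trigger lexicon: word -> frame name. These entries are
# invented for illustration and are not taken from any real frame inventory.
TRIGGERS = {
    "strike": "Work_stoppage",
    "walkout": "Work_stoppage",
    "protest": "Protest",
    "marched": "Protest",
    "bombing": "Attack",
    "bombed": "Attack",
    "enacted": "Legislation",
    "passed": "Legislation",
}

def parse_events(text):
    """Return (sentence, frames) pairs for sentences that evoke a frame."""
    events = []
    # Naive sentence split: break after terminal punctuation plus whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[a-z]+", sentence.lower())
        frames = sorted({TRIGGERS[w] for w in words if w in TRIGGERS})
        if frames:
            events.append((sentence, frames))
    return events

sample = (
    "Workers at the mill went on strike. "
    "The weather was mild that spring. "
    "Congress passed the Voting Rights Act in 1965."
)

for sentence, frames in parse_events(sample):
    print(frames, "<-", sentence)
```

A lookup of this kind cannot tell a labor strike from a military strike, which is why the frame identification task is treated as a form of word sense disambiguation.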

We apply event parsing to different ends. We do not treat events as ‘facts’ that can be consumed independently of the histories from which they were ‘extracted.’ Historical knowledge inheres within the narrative form, so the ‘extraction’ metaphor is a poor fit for tools that aim to enhance access to historical knowledge (Shaw 2010). Instead, we use the results of event parsing as features for comparing stories through narrative clustering: treating stories as ‘bags of events’ and applying statistical techniques for grouping together similar ‘bags.’
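One way to picture the ‘bags of events’ idea is the sketch below, which represents each story as a count of event labels and groups stories whose bags are similar. The choice of cosine similarity with single-link thresholding is our assumption for the sake of a short example, not a description of the statistical techniques used in the project, and the story names, event labels, and counts are invented.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bags of events (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(bags, threshold=0.5):
    """Single-link grouping: stories end up together when a chain of
    pairwise similarities at or above the threshold connects them."""
    names = list(bags)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links up to the representative of n's group.
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(bags[a], bags[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Invented bags: event label -> number of sentences evoking that event.
bags = {
    "history_a": Counter({"Brown_v_Board": 3, "Voting_Rights_Act": 2}),
    "history_b": Counter({"Brown_v_Board": 1, "Voting_Rights_Act": 4}),
    "history_c": Counter({"Wartime_strike": 5, "New_Deal_program": 2}),
    "history_d": Counter({"Wartime_strike": 2, "New_Deal_program": 3,
                          "Voting_Rights_Act": 1}),
}

print(cluster(bags))
```

With these invented counts, the two histories centered on the 1954 to 1965 chronology fall into one group and the two that dwell on earlier events into another, even though `history_d` shares one event with the first group.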

We envision two forms of comparison. First, along the lines of Ankersmit’s original proposal, we can highlight the specificity of a given history by showing which events it recounts that are not recounted by similar histories. In conjunction with information about the history such as when, where, and by whom it was produced, such comparisons could provide a powerful means of assessing the depth and scope of a given collection of histories.
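The first form of comparison can be sketched as a set difference: the events a history recounts minus those recounted by any of its neighbors. The function name `distinctive_events` and the event labels are hypothetical, and the sketch assumes that events parsed from different texts have already been matched to a shared label, which is the hard part in practice.

```python
def distinctive_events(target, histories):
    """Events recounted by `target` that no other history in the group recounts."""
    others = set().union(
        *(events for name, events in histories.items() if name != target)
    )
    return histories[target] - others

# Invented group of similar histories, each reduced to a set of event labels.
histories = {
    "history_a": {"Brown_v_Board", "Local_boycott", "Voting_Rights_Act"},
    "history_b": {"Brown_v_Board", "Voting_Rights_Act"},
    "history_c": {"Brown_v_Board", "Voting_Rights_Act", "Wartime_strike"},
}

for name in histories:
    print(name, sorted(distinctive_events(name, histories)))
```

An empty result, as for `history_b` here, would suggest a history that stays entirely on the beaten track of its group.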

Second, given a group of histories recounting overlapping sets of events, we can compare their ‘speeds.’ Roland Barthes noted that an event that takes up dozens of pages in one history may be covered by just one in another – a phenomenon he called acceleration.3 Because we propose to bring together histories that recount the same events, and because those events are parsed from the actual texts of those histories, we can potentially compare how different histories speed up and slow down time.
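A simple way to make ‘speed’ comparable, sketched below, is to measure how large a share of each history's text is devoted to each shared event and then take the ratio of those shares. Normalizing by the total length of the event passages is our assumption, made so that a long monograph and a short survey can be compared; the function names and the word counts are invented for illustration.

```python
def event_shares(passage_lengths):
    """Fraction of a history's event text devoted to each event.
    Assumes every event has a positive word count."""
    total = sum(passage_lengths.values())
    return {event: n / total for event, n in passage_lengths.items()}

def relative_pace(a, b):
    """For events recounted in both histories, the ratio of the share of text
    in `a` to the share in `b`. A value above 1 means `a` lingers on the event
    (narrates it more slowly) relative to `b`."""
    shares_a, shares_b = event_shares(a), event_shares(b)
    return {e: shares_a[e] / shares_b[e] for e in shares_a.keys() & shares_b.keys()}

# Invented word counts for the passages recounting each event.
monograph = {"Brown_v_Board": 6000, "Voting_Rights_Act": 2000}
survey = {"Brown_v_Board": 500, "Voting_Rights_Act": 1500}

for event, ratio in sorted(relative_pace(monograph, survey).items()):
    print(f"{event}: {ratio:.2f}")
```

In this invented case the monograph slows down for one event and accelerates through the other relative to the survey, which is the kind of contrast the comparison is meant to surface.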

Contours of the Past is comparable to recent projects applying topic modeling to historical sources such as diaries and newspaper archives (Blevins 2011; Nelson). Topic modeling is a statistical technique for discovering independent topics in some collection of documents, where a ‘topic’ is defined as a group of words that tend to appear in the same documents (Blei, Ng and Jordan 2003). Topic modeling has found favor among digital humanists for quickly identifying themes in a large collection of documents without having to specify some set of themes ahead of time.

However, topic modeling is usually applied directly to the words used in historical documents, so its results mainly reflect patterns of word usage. This is exactly what is desired for investigations of diction and style, but it may not be the best approach for identifying common patterns of historical narration. An oral history containing a firsthand account may use language that is very different from that found in a scholarly monograph, even if both sources are describing the ‘same’ event. Yet two sentences may evoke the same semantic frame even if they do not have any words in common. Thus we can potentially find common patterns of historical narration across different kinds of narrative source by applying clustering techniques, not at the surface level of language (the specific words used), but at the level of frame-semantic representation.
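The contrast between the two levels can be shown with a toy example. The sketch below compares two invented sentences, one in the register of an oral history and one in the register of a monograph, first by the words they share and then by the frames they evoke. The mapping `FRAME_OF` is a hypothetical stand-in for a frame-semantic parser, and Jaccard overlap is simply a convenient similarity measure for the illustration.

```python
import re

# Hypothetical word -> frame mapping standing in for a real parser.
FRAME_OF = {
    "arrested": "Arrest",
    "jailed": "Arrest",
    "marched": "Protest",
    "demonstrated": "Protest",
}

def words(sentence):
    """Lowercased word types in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def frames(sentence):
    """Frames evoked by the sentence according to the toy mapping."""
    return {FRAME_OF[w] for w in words(sentence) if w in FRAME_OF}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0

oral_history = "They jailed us overnight"
monograph = "Police arrested dozens of demonstrators"

print("word overlap: ", jaccard(words(oral_history), words(monograph)))
print("frame overlap:", jaccard(frames(oral_history), frames(monograph)))
```

The two sentences share no words, so any method working only on surface vocabulary sees them as unrelated, while at the frame level they are identical.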


Agirre, E., and P. G. Edmonds (2006). Word Sense Disambiguation: Algorithms and Applications. Dordrecht: Springer.

Ankersmit, F. R. (1983). Narrative Logic: A Semantic Analysis of the Historian’s Language. The Hague: M. Nijhoff.

Barthes, R. (1981). The Discourse of History. Translated by S. Bann. In E. S. Shaffer (ed.), Comparative Criticism: A Yearbook 3. Cambridge: Cambridge UP.

Blei, D. M., A. Y. Ng, and M. I. Jordan (2003). Latent Dirichlet Allocation. The Journal of Machine Learning Research 3: 993.

Blevins, C. (2011). Topic Modeling Historical Sources: Analyzing the Diary of Martha Ballard. Proceedings of Digital Humanities 2011. Stanford, CA, 19-22 June 2011.

Gildea, D., and D. Jurafsky (2002). Automatic Labeling of Semantic Roles. Computational Linguistics 28(3): 245.

Hall, J. D. (2005). The Long Civil Rights Movement and the Political Uses of the Past. Journal of American History 91(4): 1235.

Lawson, S. F. (1991). Freedom Then, Freedom Now: The Historiography of the Civil Rights Movement. The American Historical Review 96(2): 456.

Nelson, R. K. Mining the Dispatch. Digital Scholarship Lab, University of Richmond.

Rigney, A. (1990). The Rhetoric of Historical Representation: Three Narrative Histories of the French Revolution. Cambridge: Cambridge UP.

Shaw, R. (2010). From Facts to Judgments: Theorizing History for Information Science. Bulletin of the American Society for Information Science and Technology 36(2): 13.

Veyne, P. (1984). Writing History: Essay on Epistemology. Translated by M. Moore-Rinvolucri. Middletown, Connecticut: Wesleyan UP.


1. See the Publishing the LCRM project and the LCRM Initiative.

2. More specifically, it involves two tasks: first identifying the frames invoked by particular words in a text (a form of word sense disambiguation) and then assigning entities to the various roles in each frame, known as semantic role labeling. For an overview of the former, see Agirre and Edmonds (2006). For the latter, see Gildea and Jurafsky (2002).

3. Barthes (1981: 9) hypothesized that ‘the nearer we are to the historian’s own time […] the slower the history becomes.’