
Earhart, Amy, Texas A&M University, USA

This paper examines the state of the current digital humanities canon, provides a historical overview of the decline of early digitally recovered texts designed to expand the literary canon, and offers suggestions for ways that the field might work toward expansion of the digital canon. The early wave of small recovery projects has slowed and, even more troubling, the extant projects have begun to disappear, as is apparent from looking through the vast number of projects that are, if we are lucky, ghosts on the Wayback Machine. Alan Liu’s Voice of the Shuttle provides a good measure of the huge number of lost early recovery projects. A quick perusal of ‘The Minority Studies’ section reveals that of the six sites listed in ‘General Resources in Minority Literature,’ half cannot be located with a careful search, suggesting that they have been removed. The same trend is found with other projects listed on the site. While only 50% of the projects in the ‘General Resources in Chicano/Latino Literature’ section are still online, other areas, such as Asian American literature, have a higher percentage of active projects. Some projects, such as Judith Fetterley’s 19th Century Women’s Bibliography, are the living dead and have not been updated for years. Similarities exist among the extinct projects. Most were produced in the late 1990s by single scholars at institutions that did not have an etext center or a digital humanities center, never attracted external support, and never upgraded to SGML or TEI. The canon problems are driven by the dual issues of preservation and the stagnation of digital recovery efforts.

We should find it troubling that the digital canon is losing the very texts that mirrored the revised literary canon of the 1980s. If we lose a large volume of these texts, and traditional texts such as Whitman, Rossetti, and Shakespeare are the highlighted digital literary texts, we will be returning to a New Critical canon that is incompatible with current understandings of literature. The early digital recovery work I am discussing is not easily available elsewhere. Few of these texts are in print. A few more are available in for-profit databases or on microfilm, but most are available only with a return to the one or two libraries that own the original physical copy of the book, journal, or newspaper.

Some have posited that a structural problem in the emergence of digital humanities contributes to the selection of materials for digitization and preservation. Martha Nell Smith contends that digital humanities developed as a space to which practitioners hoped to flee from the shifts in the profession that arose out of the cultural studies movement. Smith highlights the digital humanities’ retreat into modes of analytics and objective approaches as ‘safe’ alternatives to the messy fluidities found in literary studies. If Smith is correct, then there should be no surprise that recovery of messy lost texts has not been a priority for the field. Others, such as Kenneth Price, view the funding mechanism as a contributing factor to the increasingly traditional canon of digitized texts. Price notes that the criteria of impact, crucial to the granting decision, favor the canonical text. The National Endowment for the Humanities (NEH) awarded 141 start-up grants from 2007 through 2010. Of those grants, only twenty-nine were focused on diverse communities and sixteen on the preservation or recovery of diverse community texts. It is striking to examine the authors and historical figures individually cited in the list of funding: Shakespeare, Petrarch, Melville, and Jefferson. While there are grants to support work on indigenous populations, African and African American materials, and Asian American materials, in addition to other populations, the funding of named great men of history deserves scrutiny.
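To put the grant counts above in proportion, the short sketch below computes each as a share of the 141 start-up grants. The constant names and the `share` helper are illustrative only, and the text does not say whether the sixteen recovery grants are counted among the twenty-nine, so each figure is simply expressed against the total.

```python
# Figures as reported in the paragraph above (NEH start-up grants, 2007-2010).
TOTAL_GRANTS = 141
DIVERSE_COMMUNITY_GRANTS = 29
RECOVERY_GRANTS = 16


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print("Diverse communities:", share(DIVERSE_COMMUNITY_GRANTS, TOTAL_GRANTS), "%")
    print("Preservation or recovery:", share(RECOVERY_GRANTS, TOTAL_GRANTS), "%")
```

On these numbers, roughly one grant in five addressed diverse communities, and about one in nine addressed the preservation or recovery of their texts.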

In addition, the turn to increased standardization (TEI) and big data troubles our efforts at small-scale recovery projects, as DIY scholars outside the DH community have difficulty gaining access to the technical skills required for small projects, leading to a decline in small-scale digital recovery projects. The resulting poor literary data sets affect digital humanities efforts to experiment with visualization and data mining techniques. For example, the UC Berkeley WordSeer tool is remarkably useful, but the data set used to test the tool and the conclusions drawn about the texts are problematic. The team chose to examine the slave narratives housed at Documenting the American South. They ran an analysis on the set to see if the literary conventions of the texts corresponded with critic James Olney’s claim that autobiographical slave narratives follow a list of tropes such as ‘I was born’ or ‘Cruel slavemaster.’ However, the chosen data set was not appropriate for the research question. The 300 narratives labeled slave narratives at this site are actually a mixed bag of first-person narratives that are fictional and non-fictional, black-authored and white-authored, pre- and post-Civil War, some autobiographical, some biographical, some anti-slavery and some pro-slavery, or at least apologetic. Given this mix, it is hard to believe that Olney’s criteria, which he says are to be applied to autobiography, could be tested against this data set. Literary data mining is still in its infancy, but, to date, the most successful literary results are those that utilize smaller, curated data sets, such as Tanya Clement’s fascinating Stein project.

Dan Cohen writes, ‘Instead of worrying about long-term preservation, most of us should focus on acquiring the materials in jeopardy in the first place and on shorter-term preservation horizons, 5 to 10 years, through well-known and effective techniques such as frequent backups stored in multiple locations and transferring files regularly to new storage media, such as from aging floppy discs to DVD-ROMs. If we do not have the artifacts to begin with, we will never be able to transfer them to one of the more permanent digital archives being created by the technologists.’ I want to spin out the two crucial points that I take from Dan’s argument: 1) we must continue to focus on acquiring artifacts, and 2) we must work with short-term preservation strategies to stop immediate loss.
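The short-term technique named in the quotation, backups stored in multiple locations, can be sketched in a few lines. This is a minimal illustration, not a preservation workflow endorsed by Cohen or by any archive: the function names `sha256_of` and `back_up` are hypothetical, and the destinations are assumed to be directories on separate storage (for example a second drive and a mounted network share).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination directory and verify every copy.

    Raises OSError if a copy's checksum differs from the original's.
    """
    expected = sha256_of(source)
    copies = []
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps where the file system allows it.
        copy = Path(shutil.copy2(source, directory / source.name))
        if sha256_of(copy) != expected:
            raise OSError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies
```

Comparing checksums after each copy is one simple way to confirm that a backup is an exact duplicate of the recovered text rather than a silently damaged file.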

If Matt Kirschenbaum is correct and preservation is not a technical problem but a social problem, then it follows that the digital humanities community should be able to address the lack of diversity in the digital canon by attention to acquisition and preservation of particular types of texts. We need a renewed effort in digitizing texts that occurs in tandem with experimental approaches in data mining, visualization, and geospatial representations. Recent critiques have tended to separate the two, a mistake that will damage both efforts. We can create infrastructure to support scholars interested in digital recovery. It is crucial to leverage existing structures to support new, small-scale scholarship and to continue outreach and training for scholars who are not interested in becoming digital humanists but are interested in digital recovery.

Preservation of existing digital recovery projects needs to begin immediately. We need a targeted triage focused on digital recovery projects that allow access to materials that are not accessible in print or other digital form. Using existing structures of support, like NEH’s Access and Preservation Grant and Brown’s newly launched TAPAS, we might build an infrastructure to ingest selected digital recovery projects. While it is unlikely that we can preserve the entirety of each project, we might, as Dan Cohen has argued, preserve simply. We may not preserve the interface, but if we can preserve the recovery core – the text – then we will not lose the valuable work that the early digital pioneers completed.

We need to examine the canon that we, as digital humanists, are constructing, a canon that skews toward traditional texts and excludes crucial work by women, people of color, and the GLBTQ community. We need to reinvigorate the spirit of previous scholars who believed that textual recovery was crucial to their work, who saw the digital as a way to enact changes in the canon. If, as Jerome McGann suggests, ‘the entirety of our cultural inheritance will be transformed and reedited in digital forms’ (72), then we must ensure that our representation of culture does not exclude work by diverse peoples.


Clement, T. E. (2008). ‘A Thing Not Beginning and Not Ending’: Using Digital Tools to Distant-read Gertrude Stein’s The Making of Americans. Literary and Linguistic Computing 23(3): 361-381.

Cohen, D. J. (2005). The Future of Preserving the Past. CRM: The Journal of Heritage Stewardship 2(2): viewpoint.

Cole, J. L. (2004). Newly Recovered Works by Onoto Watanna (Winnifred Eaton): A Prospectus and Checklist. Legacy: A Journal of American Women Writers 21(2): 229-234.

Cole, J. L. Winnifred Eaton Digital Archive. Formerly

Liu, A. Voice of the Shuttle

McGann, J. (2005). Culture and Technology: The Way we Live Now, What is to be Done? New Literary History 36(1): 71-82.

Olney, J. (1984). ‘I Was Born’: Slave Narratives, Their Status as Autobiography and as Literature. Callaloo (20): 46-73.

Price, K. M. (2009). Digital Scholarship, Economics, and the American Literary Canon. Literature Compass 6(2): 274-290.

Smith, M. N. (2007). The Human Touch: Software of the Highest Order: Revisiting Editing as Interpretation. Textual Cultures 2(1): 1-15.

University of Virginia Library. Electronic Text Center ttp://

WordSeer. University of California, Berkeley