
Alexander, Marc, University of Glasgow, UK
Kay, Christian, University of Glasgow, UK

This poster examines the phenomenon of sound symbolism, or phonaesthesia, in the history of English. Using digital humanities techniques applied to the database of the Historical Thesaurus of English (Kay et al. 2009; hereafter abbreviated to HT), it presents numerical evidence for the existence of five particular sound-clusters which carry specific meanings in English, and contrasts these with traditional analyses carried out manually in previous years by other scholars. The poster presents some word-initial phonaestheme clusters together with examples of concepts realised by each cluster and figures showing how many words in each conceptual category are phonaesthetically influenced.


The general topic under study is the claim that certain sound-clusters in English convey independent meanings, and that these clusters influence the meaning of those words that contain them (see, inter alia, Hinton et al. 1994; Kay & Wotherspoon 2002; Reay 1991, 2006). For example, the cluster <sw-> is often said to occur, in an imitative fashion, at the start of words indicating movement through air (swoop, sweep, swoosh, swash, swat), so that new words starting with that cluster will be affected by its meaning. As Michael Samuels states, the ‘validity of a phonestheme is, in the first instance, contextual only: if it fits the meaning of the word in which it occurs, it reinforces the meaning, and conversely, the more words in which this occurs, the more its own meaning is strengthened’ (Samuels 1972: 46).

This phenomenon is commonly discussed in lexical semantics and lexicography, and while there are theoretical arguments for and against its strength of effect in English, it is ripe for an empirical and digital investigation. Previous studies have relied on dictionary data, or on an analyst’s own introspection. A new style of investigation has recently been made possible, however, by the completion of the HT, from which we take our data.

The Data

The HT, published in 2009, is the world’s largest thesaurus and the most complete thesaurus of English, arranging into hierarchical semantic categories all 800,000 recorded meanings expressed in the language from Anglo-Saxon times to the present. These meanings are placed in very fine-grained semantic categories, which specify each word’s sense precisely alongside attestation dates for that meaning. As an example, the word broadsword is recorded as being a type of sword and so is within that particular category (a category which sits seven layers of hierarchy down in the HT taxonomy). Moving upwards, all the words for swords, knives, daggers, etc. sit within the larger category of side-arms, itself within the category of a sharp weapon (adjacent to club/stick and other blunt weapons), which is a sub-type of weapon, which is a form of military equipment, used in the enactment of armed hostility, a phenomenon arising from society, one of the three top-level categories of the HT. All the recorded words in English are arranged in this way, permitting a fine-grained look at the relationship between word form and meaning. This makes the HT database ideal for a study of this type.


The underlying HT database (see Kay & Chase 1987; Wotherspoon 1992, 2010), held at the University of Glasgow, is therefore a massive computational resource for analysing the recorded words of English. This poster presents data derived from a Python program, written by the authors, which searches through the HT’s word forms and categorises all word-initial consonant clusters according to their HT concept category, recording the size of each category and how many of its words begin with each consonant cluster. A combination of statistical filtering and manual analysis then resulted in a large set of English initial phonaesthemes; similar work can be undertaken in future on word-final phonaesthemes.
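The kind of tally described above can be illustrated with a minimal sketch. It is not the authors' program: the (category, word) input format, the function names, and the purely orthographic definition of a consonant cluster are all assumptions made for illustration, and the sample entries are a toy list rather than HT data.

```python
from collections import Counter, defaultdict

def initial_cluster(word):
    """Return the run of consonant letters at the start of a word.

    Assumption: clusters are identified orthographically. The letters
    a, e, i, o, u end the cluster, as does a non-initial y.
    """
    cluster = []
    for i, ch in enumerate(word.lower()):
        if not ch.isalpha() or ch in "aeiou" or (ch == "y" and i > 0):
            break
        cluster.append(ch)
    return "".join(cluster)

def cluster_shares(entries, min_cluster_len=2):
    """Tally initial clusters per category.

    entries: iterable of (category, word) pairs (hypothetical format).
    Returns {category: {cluster: (count, category_size, share)}}.
    Note that the maximal cluster is used, so 'string' counts under
    'str', not 'st'.
    """
    words_by_category = defaultdict(list)
    for category, word in entries:
        words_by_category[category].append(word)

    table = {}
    for category, words in words_by_category.items():
        total = len(words)
        counts = Counter(initial_cluster(w) for w in words)
        table[category] = {
            cluster: (n, total, n / total)
            for cluster, n in counts.items()
            if len(cluster) >= min_cluster_len
        }
    return table

# Toy input for illustration only; not taken from the HT database.
toy_entries = [
    ("twisting movement", "writhing"),
    ("twisting movement", "wrenching"),
    ("twisting movement", "wriggle"),
    ("twisting movement", "twisting"),
    ("movement through air", "swoop"),
    ("movement through air", "sweep"),
]

for category, clusters in cluster_shares(toy_entries).items():
    for cluster, (n, total, share) in clusters.items():
        print(f"{category}: <{cluster}-> {n}/{total} ({share:.0%})")
```

Each row pairs a cluster with a category, the number of words in that category beginning with the cluster, and the category's total size, which are the quantities the percentages reported on this poster depend on.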

This set of data was then ranked in order of the putative strength of the phonaesthetic linkage (that is, the statistical preponderance in categories of a significant size). Five particular clusters were then identified as being of sufficient significance to be likely candidates as phonaesthemes:

<wr->, <gr->, <sl->, <st-> and <fl->.
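The ranking step can be sketched as follows, assuming that 'strength' is approximated by a cluster's share of a category and that 'significant size' is a simple minimum word count. Both the threshold value and the row format are hypothetical, and the sample rows are invented; the statistical filter actually used is not specified here.

```python
def rank_candidates(rows, min_category_size=100):
    """Rank cluster/category pairs by the cluster's share of the category.

    rows: iterable of (cluster, category, cluster_words, category_size).
    min_category_size is a hypothetical cut-off for 'significant size'.
    Returns (cluster, category, share, category_size) tuples, highest
    share first.
    """
    kept = [
        (cluster, category, n / size, size)
        for cluster, category, n, size in rows
        if size >= min_category_size
    ]
    return sorted(kept, key=lambda row: row[2], reverse=True)

# Invented rows for illustration only; not HT figures.
toy_rows = [
    ("wr", "category A", 30, 120),
    ("sl", "category B", 5, 10),    # high share, but the category is too small
    ("fl", "category C", 20, 200),
]

for cluster, category, share, size in rank_candidates(toy_rows):
    print(f"<{cluster}-> in {category}: {share:.0%} of {size} words")
```

Filtering on category size before ranking keeps a cluster that dominates a very small category from outranking one with a smaller share of a much larger category.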

The poster goes on to give examples of each throughout time, alongside figures of how many words make up the relevant semantic categories to which the clusters belong. For example, the <wr-> cluster is particularly associated with uncomfortable movement. It accounts for 48% of the 145 words in the HT meaning a twisting movement (writhing, wrenching, wresting, wringing, wreathing, wrying, writher, wriggle, wrinkle, etc.) and 15% of the 163 words meaning wrestling (including wrestling, wraxling, wrestle, wristle, and the dialectal warsle). It also appears in other related categories (13% of anger and 13% of misery, with words related to wrath and wretchedness), and in a significant metaphorical extension of the twisting sense above (15.1% of distortion or perversion of meaning).


By providing evidence of the sort outlined above, this poster describes data derived from applying digital humanities techniques to a new dataset for the study of English. It gives evidence which permits an empirical approach to old questions in linguistic theory, allowing us to move towards an analysis of phonaesthesia which focuses on what percentage of a given concept is realised by words beginning with a particular sound-cluster, and how this relates to similar occurrences in neighbouring semantic concepts.

Beyond the present poster, the diachronic dimension of the HT also allows us to plan future DH work examining the historical development of these patterns, and to link them to historical corpora, something which space does not allow on this poster. Such research could draw on additional datasets to investigate those instances across the history of English where phonaesthemes ‘grow from minor coincidental identification between a few roots to much larger patterns’ (Samuels 1972: 47).


Hinton, L., J. Nichols, and J. J. Ohala, eds. (1994). Sound Symbolism. Cambridge: Cambridge UP.

Kay, C., and T. J. P. Chase (1987). Constructing a Thesaurus Database. Literary and Linguistic Computing 2(3): 161-163.

Kay, C., and I. Wotherspoon (2002). Wreak, wrack, rack, and (w)ruin: the History of Some Confused Spellings. In T. Fanego, B. Mendez-Naya, and E. Seoane (eds.), Sounds, Words, Texts and Change: Papers from 11 ICEHL. Amsterdam: Benjamins, pp. 129-143.

Kay, C., J. Roberts, M. Samuels, and I. Wotherspoon (2009). Historical Thesaurus of the Oxford English Dictionary. Oxford: Oxford UP.

Reay, I. E. (1991). A Lexical Analysis of Metaphor and Phonaestheme. Ph.D. thesis: University of Glasgow.

Reay, I. E. (2006). Sound symbolism. In K. Brown (ed.), Encyclopedia of Language and Linguistics. Oxford: Elsevier, vol. 11, pp. 531-539.

Samuels, M. (1972). Linguistic Evolution. Cambridge: Cambridge UP.

Wotherspoon, I. (1992). Historical Thesaurus Database Using Ingres. Literary and Linguistic Computing 7(4): 218-225.

Wotherspoon, I. (2010). The Making of The Historical Thesaurus of the Oxford English Dictionary. In M. Adams (ed.), ‘Cunning passages, contrived corridors’: Unexpected Essays in the History of Lexicography. Monza: Polimetrica, pp. 271-287.