The Corpus and the dictionary entry
By systematically analysing the corpus data we can make discoveries which are then used to update and improve dictionary entries, in order to produce the most accurate description of the language possible.
New words are the most obvious manifestation of language change. But we are also looking for more subtle changes in language – new meanings of existing words, for example, or changes in spelling and hyphenation over a longer period of time, or even grammatical changes.
Here are a few examples showing how the Oxford English Corpus has been used to identify new uses and meanings in the language, and to change dictionary entries as a result.
Until recently edgy was a word with a single meaning; a typical definition would have looked like this one, taken from the 10th edition of the Concise Oxford English Dictionary (1999):
edgy adj. tense, nervous, or irritable.
But almost any sample of lines from the Oxford English Corpus gives plenty of evidence of a second meaning as well:
Here was a clear case of a new sense of an existing word, and the dictionary entry needed to be updated to reflect this change. Here's the entry from the 11th edition of the Concise Oxford English Dictionary, 2004:
edgy adj. 1 tense, nervous, or irritable. 2 informal avant-garde and unconventional.
When Oxford lexicographers first started working with corpus data in the 1990s, the two words big and box had no special relationship with each other, and big was no more or less likely to occur with box than large or small. As a compound, big box was most commonly found in structures such as 'big box of ...'. The most significant collocate for 'big box' in our 1990s evidence was 'chocolates'.
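As a rough illustration of how a frequent pattern such as 'big box of ...' can be spotted, the following Python sketch counts the word that comes straight after a phrase. The function name and the three sample sentences are invented for this illustration; they are not lines from the Oxford English Corpus, and this is not the software used by Oxford lexicographers.

```python
import re
from collections import Counter


def following_words(texts, phrase):
    """Count the word that immediately follows `phrase` in each text."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\s+(\w+)", re.IGNORECASE)
    counts = Counter()
    for text in texts:
        for match in pattern.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts


# Invented sentences standing in for corpus lines (an assumption, not real data).
sample = [
    "She gave him a big box of chocolates for his birthday.",
    "We found a big box of old photographs in the attic.",
    "He bought a big box of chocolates at the station.",
]

counts = following_words(sample, "big box of")
print(counts.most_common(1))
```

On a real corpus the same count would be run over many thousands of lines, and raw frequency would normally be combined with a statistical measure of significance.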
We see quite a different picture in the early 21st century, as this sample shows:
Most examples are found in North American English; the use is rare in British English. So this new dictionary entry was created (for the Concise Oxford English Dictionary, 11th edition revised, 2006):
big box n. N.Amer. informal a very large store which sells goods at discount prices, especially one specializing in a particular type of merchandise.
Using a corpus in modern lexicography is not only about tracking change. It's worth remembering that corpus lexicography is still a relatively new art. For hundreds of years, including most of the 20th century, lexicographers worked without enough evidence: certainly nothing comparable with corpus data and sometimes with no evidence at all except their own intuition. Even when evidence of usage was available, dictionary editors had no means of filtering or sorting large amounts of data efficiently and reliably. That was only possible once technological advances in the late 20th century allowed computers to manipulate and process very large texts.
A huge part of the benefit of corpus lexicography, therefore, is in uncovering facts about the language which are not new, but which have simply not been noticed before. Take a look at the following examples for the verb cause and the adjective vivacious.
The verb cause is common in English (the 99th most common verb in the Oxford English Corpus as a whole, with 192,899 occurrences) and it's likely to be part of every native speaker's active vocabulary.
Try this exercise: first, think of a few sentences containing the verb cause.
You might come up with examples like these, which were made up by people when they were asked to do the same exercise:
The car went out of control and caused an accident.
The interruption in service was caused by unexpected shutdown.
The virus caused an epidemic.
The meaning seems to be quite clear. Here is a typical definition, found in many dictionaries:
cause v. be the cause of; make happen.
However, looking at the corpus evidence reveals something else about cause, which is not mentioned in the definition.
What do you notice about the object (death, damage, chaos, disturbance, and so on) of the verb in each case? The words and phrases differ, but they all have one thing in common: each is a 'bad' or negative thing, as judged from the speaker's or writer's perspective. So the corpus tells us that things that are caused are generally harmful things such as accidents and disease.
As native speakers, we 'know' this intuitively from the way we use the word day to day and from the way – as demonstrated above – we readily produce 'artificial' examples. All that is left is to confirm our findings from the tiny sample examined above by checking them against the corpus as a whole. To do this, we use the Sketch Engine software to produce a 'word sketch', or collocational profile, for cause across all 192,899 examples:
The full picture for the objects of 'cause' confirms that harmful things are caused, while the list of subjects in the second column further confirms that harmful things are caused by harmful agents (virus, negligence, infection, vandal, etc.).
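To give a feel for what a collocational profile involves, here is a minimal Python sketch that ranks the words appearing just after a node word. It uses the logDice score, 14 + log2(2 * f_xy / (f_x + f_y)), a common association measure for collocations. This article does not say how the word sketch is calculated, so the choice of measure, the simple word window (in place of real grammatical analysis), the function names, and the invented sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_dice(pair_freq, node_freq, collocate_freq):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * pair_freq / (node_freq + collocate_freq))


def collocate_scores(sentences, node, window=1):
    """Score each word found within `window` tokens to the right of `node`."""
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        word_freq.update(tokens)
        for i, token in enumerate(tokens):
            if token == node:
                for other in tokens[i + 1 : i + 1 + window]:
                    pair_freq[other] += 1
    return {
        word: log_dice(freq, word_freq[node], word_freq[word])
        for word, freq in pair_freq.items()
    }


# Invented sentences standing in for corpus lines (an assumption, not real data).
sample = [
    "the storm caused damage to the roof",
    "the virus caused damage to the lungs",
    "the delay caused chaos at the airport",
]

scores = collocate_scores(sample, "caused")
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word}\t{score:.2f}")
```

A window of one word is a crude stand-in for 'object of the verb'; a real profile would rely on grammatical relations and would group the different forms of the verb (cause, causes, caused) together.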
Finally, the dictionary entry can be written to reflect this more accurate view of the verb that we now have:
cause v. make (something, especially something bad) happen; be the cause of.
It's often said that English has a very rich vocabulary, with many synonyms and words to express similar ideas. Yet it is also often said that there are no 'true synonyms'. Every word listed in the same thesaurus entry, for example, will be distinguished from all the others by some difference – however small – in meaning or use. Corpus lexicography is a tool for drawing out distinctions between 'near synonyms'. By examining typical collocates we can establish how the profile for one word differs from another. We then show the results of this analysis in the dictionary entry.
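The comparison of profiles described above can be sketched in a few lines of Python: build a list of the nouns each adjective modifies, then look at what appears in one list but not the other. The function names and the five sample phrases are invented for this illustration and are not corpus data.

```python
from collections import Counter


def modified_nouns(phrases, adjective):
    """Count the word that directly follows `adjective` in each phrase."""
    counts = Counter()
    for phrase in phrases:
        tokens = phrase.lower().split()
        for i, token in enumerate(tokens[:-1]):
            if token == adjective:
                counts[tokens[i + 1]] += 1
    return counts


def distinctive(profile_a, profile_b):
    """Collocates seen with the first word but never with the second."""
    return sorted(set(profile_a) - set(profile_b))


# Invented phrases standing in for corpus lines (an assumption, not real data).
sample = [
    "a vivacious girl",
    "a vivacious woman",
    "a lively girl",
    "a lively debate",
    "a lively town",
]

vivacious = modified_nouns(sample, "vivacious")
lively = modified_nouns(sample, "lively")
print(distinctive(vivacious, lively))
print(distinctive(lively, vivacious))
```

With real data the contrast is rarely all-or-nothing, so a lexicographer would compare how strongly each collocate is associated with each word, not just whether it occurs.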
The word vivacious is listed in a typical thesaurus entry alongside lively, animated, sprightly, spirited, and so on. Here are the definitions from a recent dictionary for these words:
full of high spirits and animation
full of vitality; lively and active
displaying animation, vigour, or liveliness
spirited or lively
What is noticeable is how similar the definitions are to each other (not to mention circular). They do not show how the words differ.
So what is it about vivacious that makes it different from the others? Vivacious occurs 680 times in the Oxford English Corpus. It occurs across all varieties of English and is well represented in the 'news' and 'fiction' components of the corpus. A profile of the collocates reveals what it is about this word that distinguishes it:
Typically, only certain types of noun are modified by the adjective vivacious. It seems that women, and especially young women, are vivacious, whereas men and boys are not (nor animals, apparently). Furthermore, the use of the word vivacious conveys something more general about a woman's attractiveness, hence the frequent collocation with other adjectives such as 'beautiful', 'young', and 'blonde'. The brief dictionary entry (from the 2nd edition of the Oxford Dictionary of English) tries to encapsulate some of the detail of this picture:
vivacious adj. (especially of a woman) attractively lively and animated.
Corpus lexicography is still in its infancy. The more we use the corpus, the more information we will find about words and about how language is really used. And using such evidence systematically in dictionary compilation means that we can produce better and more accurate dictionary entries.