The Oxford English Corpus
Oxford Dictionaries are continually monitoring and researching how language is evolving. The Oxford English Corpus is central to this process, and provides real evidence on which to base our language research.
What is a corpus?
A corpus is a collection of texts of written (or spoken) language stored in electronic form. It provides evidence of how language is used in real situations, which allows our editors to write accurate and meaningful entries. The Oxford English Corpus ensures that we can track and record the very latest developments in the language. By analysing the corpus with specialist software, we can see words in context, find out how new words and senses are emerging, and spot other trends in usage, spelling, world English, and so on.
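To give a rough idea of what "seeing words in context" means, here is a minimal keyword-in-context (concordance) sketch in Python. It is an illustration only, not the software used with the Oxford English Corpus: the function name `kwic`, the `width` parameter, and the sample sentences are all hypothetical.

```python
import re

def kwic(texts, keyword, width=30):
    """Return one keyword-in-context line per match of `keyword`.

    `texts` is any iterable of strings (hypothetical sample data here);
    `width` is the number of characters of context shown on each side.
    """
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample sentences, not taken from any real corpus.
samples = [
    "She will tweet the results tomorrow.",
    "The first Tweet of the day went viral.",
]
for line in kwic(samples, "tweet", width=20):
    print(line)
```

Lining the keyword up in a single column makes it easier to scan the neighbouring words and notice when a word is being used in a new sense.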
The Oxford English Corpus is based mainly on material collected from pages on the World Wide Web (some printed texts, such as academic journals, have been used to supplement certain subject areas). It represents all types of English, from literary novels and specialist journals to everyday newspapers and magazines, and from Hansard to the language of blogs, emails, and social media. And, as English is a global language, the Oxford English Corpus contains language from all parts of the world – not only from the UK and the United States but also from Ireland, Australia, New Zealand, the Caribbean, Canada, India, Singapore, and South Africa.
The extensive use of web pages has allowed us to build a corpus of unprecedented scale and variety – the corpus contains nearly 2.5 billion words of real 21st-century English, with new text being continuously collected.
As the corpus develops and more text is added, it becomes possible to trace language change over time: words becoming more or less common, features spreading from one region to another, and the emergence of new meanings.
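One simple way to picture a word "becoming more or less common" is to compare its relative frequency from year to year. The Python sketch below is an assumption-laden illustration, not a description of how the Oxford English Corpus is actually analysed: the function name `frequency_per_million`, the naive tokenizer, and the (year, text) input format are all hypothetical.

```python
import re
from collections import Counter

def frequency_per_million(docs, word):
    """Return {year: occurrences of `word` per million tokens}.

    `docs` is an iterable of (year, text) pairs (a hypothetical format).
    Tokenization is deliberately naive: runs of letters and apostrophes.
    """
    totals = Counter()  # total tokens seen per year
    hits = Counter()    # occurrences of the target word per year
    target = word.lower()
    for year, text in docs:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Normalise by corpus size so years with more text are comparable.
    return {
        year: hits[year] / totals[year] * 1_000_000
        for year in sorted(totals)
        if totals[year] > 0
    }

# Invented sample data, not real corpus figures.
docs = [
    (2005, "the blog is new"),
    (2005, "a blog a day"),
    (2010, "no one reads it now"),
]
print(frequency_per_million(docs, "blog"))
```

Normalising to a per-million rate matters because the amount of text collected differs from year to year; raw counts alone would mostly reflect how much text was gathered rather than how usage changed.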
Oxford University Press grants research access to the Corpus for academic projects that can demonstrate a strong practical need for this data. To apply for research access to the Corpus, please fill in and return the application form.