Word frequency data

Corpus of Contemporary American English



The 100,000-word list is the largest carefully corrected, frequency-based word list of English available anywhere. To check its accuracy, take a look at a sample of 5,000 words from the list (every twentieth word, from 1 to 100,000). We believe that no other word list comes close in terms of size and accuracy.

For each of the 100,000 words, the spreadsheet shows more than fifty pieces of frequency data (see a sample: xlsx, xls). These include the word's frequency in COCA, the BNC, SOAP, and COHA, as well as in portions of each corpus (e.g. spoken, fiction, magazines, newspapers, and academic). The spreadsheet also contains links that let you see, with one click, each of the 100,000 words in the online corpora.

And because the data is in an Excel spreadsheet, you can sort by, filter on, and compare any of these 50+ columns. For example, you could select only nouns or verbs, or only words containing certain substrings (e.g. particular prefixes or suffixes). You could also sort by frequency in BNC Academic or COCA Magazines, by the ratio of COCA to BNC frequency (e.g. to find "American" or "British" words), or by the ratio of COCA to COHA frequency (e.g. to find "new" words in COCA). See some samples of the different kinds of searches that you can do.
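The same kinds of filtering and ratio sorting can also be done outside Excel. The following is only a minimal sketch in Python, not a description of the actual spreadsheet: the column names (`word`, `pos`, `coca`, `bnc`), the part-of-speech codes, the corpus sizes, and every count in `ROWS` are hypothetical placeholders invented for illustration.

```python
# Hypothetical rows: the column names and counts are invented for
# illustration and are NOT taken from the actual word list.
ROWS = [
    {"word": "sidewalk", "pos": "n", "coca": 900, "bnc": 10},
    {"word": "pavement", "pos": "n", "coca": 300, "bnc": 120},
    {"word": "realize", "pos": "v", "coca": 5000, "bnc": 40},
    {"word": "unhappy", "pos": "j", "coca": 400, "bnc": 90},
]

# Placeholder corpus sizes in words (assumptions; substitute the real sizes).
COCA_SIZE = 500_000_000
BNC_SIZE = 100_000_000


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def select(rows, pos=None, prefix="", suffix=""):
    """Keep rows matching a part of speech and/or a prefix or suffix."""
    return [
        r for r in rows
        if (pos is None or r["pos"] == pos)
        and r["word"].startswith(prefix)
        and r["word"].endswith(suffix)
    ]


def rank_by_ratio(rows, col_a, size_a, col_b, size_b, smoothing=0.5):
    """Sort rows by normalized frequency in corpus A relative to corpus B.

    A small constant is added to each count so that a word missing from
    corpus B does not cause a division by zero.
    """
    def ratio(r):
        a = per_million(r[col_a] + smoothing, size_a)
        b = per_million(r[col_b] + smoothing, size_b)
        return a / b

    return sorted(rows, key=ratio, reverse=True)


nouns = select(ROWS, pos="n")
un_words = select(ROWS, prefix="un")
american_first = rank_by_ratio(ROWS, "coca", COCA_SIZE, "bnc", BNC_SIZE)

print([r["word"] for r in nouns])
print([r["word"] for r in un_words])
print([r["word"] for r in american_first])
```

Normalizing to a per-million rate before taking the ratio matters because the corpora differ in size; comparing raw counts would make almost every word look more frequent in the larger corpus.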
